Guide
Merge Two Excel Files on a Common Column in Python
Merge two Excel files on a shared column in Python with pandas: load both workbooks, normalize the key, join with pd.merge, handle different names and duplicate keys, export.
To merge two Excel files on a shared column in Python, read each workbook into a DataFrame with pd.read_excel(), then join them with pd.merge() on the common key. The recipe below is fully runnable: the first block writes both source files so the reads have something to open.
For the wider set of join patterns, see Merging and Joining Excel DataFrames.
Create the two sample workbooks
import pandas as pd
sales = pd.DataFrame({
"product_sku": ["A-100", "B-200", "C-300", "A-100"],
"region": ["North", "South", "West", "East"],
"units": [10, 5, 8, 3],
})
sales.to_excel("sales_Q3.xlsx", index=False)
catalog = pd.DataFrame({
"product_sku": ["A-100", "B-200", "C-300"],
"product_name": ["Widget", "Gadget", "Gizmo"],
"unit_price": [19.99, 49.50, 8.75],
})
catalog.to_excel("product_catalog.xlsx", index=False)
Merge on the common column
Load both files, then join. A left join keeps every row of the primary table and attaches matching catalog columns; suffixes disambiguates any non-key columns that share a name:
df_primary = pd.read_excel("sales_Q3.xlsx", engine="openpyxl")
df_lookup = pd.read_excel("product_catalog.xlsx", engine="openpyxl")
merged = pd.merge(
df_primary,
df_lookup,
on="product_sku",
how="left",
suffixes=("_sales", "_catalog"),
)
merged.to_excel("merged_sales_report.xlsx", index=False, engine="openpyxl")
print(merged)
Normalize the key before joining
merge matches keys exactly, so casing, trailing whitespace, or a dtype difference (object vs int64) silently produce empty matches. Normalize both sides first:
for d in (df_primary, df_lookup):
d["product_sku"] = d["product_sku"].astype(str).str.strip().str.upper()
print(df_primary["product_sku"].tolist())
Choosing the join type
how | Keeps |
|---|---|
inner | Only keys present in both files |
left | All rows from the first file |
right | All rows from the second file |
outer | All keys from either file |
When the column names differ
If the key has a different name in each file, use left_on/right_on, then drop the redundant column:
alt_lookup = df_lookup.rename(columns={"product_sku": "ProductCode"})
merged_alt = pd.merge(
df_primary, alt_lookup,
left_on="product_sku", right_on="ProductCode", how="inner",
).drop(columns=["ProductCode"])
print(merged_alt.columns.tolist())
When duplicate keys multiply rows
If the lookup has more than one row per key, a join repeats the primary rows. Deduplicate the lookup before merging, or pass validate="m:1" to fail loudly if it isn't unique:
df_lookup_unique = df_lookup.drop_duplicates(subset=["product_sku"], keep="last")
safe = pd.merge(df_primary, df_lookup_unique, on="product_sku",
how="left", validate="m:1")
print(f"{len(safe)} rows (primary has {len(df_primary)})")
Validate the merge
After joining, confirm you didn't lose or gain rows unexpectedly and report unmatched keys. With a left join the row count should not drop below the primary's:
assert len(merged) >= len(df_primary), "Unexpected row loss during merge"
missing = merged["unit_price"].isna().sum()
print(f"{missing} unmatched keys" if missing else "All keys matched")
Where to go next
- For more join patterns — schema reconciliation, match auditing, cardinality validation — see Merging and Joining Excel DataFrames.
- An outer or right join can introduce
NaN; resolve it with Handling Missing Data in Excel Reports. - Summarize the merged result with Creating Pivot Tables from Excel Data.