Guide

Advanced Data Transformation And CleaningDeep dive

Create a Pivot Table from Excel with Pandas

Create a pivot table from an Excel file with pandas: read the workbook, aggregate with pivot_table using index, columns, values and aggfunc, then export the result.

To create a pivot table from an Excel file with pandas, read the workbook with pd.read_excel(), summarize it with pd.pivot_table(), and write the result back with to_excel(). The recipe below is fully runnable: the first block writes a sample workbook so the read has something to open.

Create a sample workbook

Python
import pandas as pd

df_seed = pd.DataFrame({
    "Region": ["North", "North", "South", "South", "West", "West"],
    "Sales_Rep": ["Ana", "Ben", "Cara", "Dan", "Eve", "Finn"],
    "Month": ["Jan", "Feb", "Jan", "Feb", "Jan", "Feb"],
    "Revenue": [12000, 9000, 8000, 7500, 6000, 5500],
    "Units": [120, 90, 80, 75, 60, 55],
})
df_seed.to_excel("source_data.xlsx", sheet_name="Q1", index=False)

Build and export the pivot

Python
# 1. Load the workbook
df = pd.read_excel("source_data.xlsx", engine="openpyxl")

# 2. Build the pivot table
pivot = pd.pivot_table(
    df,
    values=["Revenue", "Units"],
    index=["Region", "Sales_Rep"],
    columns="Month",
    aggfunc={"Revenue": "sum", "Units": "mean"},
    fill_value=0,
    margins=True,
    margins_name="Grand Total",
)

# 3. Export
pivot.to_excel("report_pivot.xlsx", sheet_name="Q1_Summary")
print(pivot)
  • values — the metric columns to aggregate.
  • index — the column(s) that become the pivot's rows.
  • columns — the column whose values spread across the pivot's columns.
  • aggfunc — how to combine rows that fall in the same cell. A dict applies a different function per metric.
  • fill_value — what to put in cells with no matching rows (here 0 instead of NaN).
  • margins=True — adds a totals row and column labeled by margins_name.

Map Excel pivot fields to pandas

Excel pivot UIpandas argument
Rowsindex
Columnscolumns
Valuesvalues
Summarize Values Byaggfunc
Grand Totalsmargins=True
Empty cell replacementfill_value
Report Filterdf.query(...) / df.loc[...] before pivoting

Filter before you pivot

There is no separate "filter" argument — slice the DataFrame first, then pivot the subset:

Python
north_south = df[df["Region"].isin(["North", "South"])]
filtered_pivot = pd.pivot_table(
    north_south, values="Revenue", index="Region",
    columns="Month", aggfunc="sum", fill_value=0,
)
print(filtered_pivot)

Apply Excel number formatting on export

to_excel() writes raw values without cell formats. Use the xlsxwriter engine to format a currency column in the output file:

Python
flat = filtered_pivot.reset_index()
with pd.ExcelWriter("formatted_pivot.xlsx", engine="xlsxwriter") as writer:
    flat.to_excel(writer, sheet_name="Report", index=False)
    workbook = writer.book
    worksheet = writer.sheets["Report"]
    money_fmt = workbook.add_format({"num_format": "$#,##0.00"})
    # Columns B onward hold the revenue figures
    worksheet.set_column(1, len(flat.columns) - 1, 14, money_fmt)
print("Wrote formatted_pivot.xlsx")

Troubleshooting

ValueError: Index contains duplicate entries, cannot reshape — this comes from DataFrame.pivot(), not pivot_table(). pivot() cannot collapse duplicate index/column pairs; pivot_table() can, because it aggregates. Use pivot_table with an explicit aggfunc, or deduplicate first: df = df.drop_duplicates(subset=["Region", "Month"]).

ModuleNotFoundError: No module named 'openpyxl' — pandas needs an engine for .xlsx. Install it: pip install openpyxl xlsxwriter.

Header lookups fail (KeyError) — Excel exports often add trailing whitespace. Normalize headers before pivoting:

Python
df.columns = df.columns.str.strip().str.replace(r"\s+", "_", regex=True)
print(df.columns.tolist())

Categorical columns show empty combinations — when index/columns is a categorical dtype, pandas 3.0 defaults to observed=True, showing only combinations present in the data. Pass observed=False to include every category level.

Where to go next