Create a Pivot Table from Excel with Pandas

Create a pivot table from an Excel file with pandas: read the workbook, aggregate with pivot_table using index, columns, values and aggfunc, then export the result.

To create a pivot table from an Excel file with pandas, read the workbook with pd.read_excel(), summarize it with pd.pivot_table(), and write the result back with to_excel(). This is the code-first equivalent of dropping fields into the Rows, Columns and Values wells of an Excel PivotTable — but it runs unattended, so the same rollup can regenerate every time the source data changes. It is one recipe in the broader workflow of building pivot tables from Excel data, and the code below is fully runnable: the first block writes a sample workbook so the read has something to open.

Prerequisites

Python 3.9+ with pandas installed (pip install pandas).
An Excel engine. pandas reads .xlsx through openpyxl and can write formatted output through xlsxwriter. Install both up front: pip install openpyxl xlsxwriter.
A source workbook with a header row and at least one categorical column to group by plus one numeric column to aggregate. The first code block below generates one, so you can run everything without your own file.
Familiarity with a DataFrame — how to read one, select columns, and slice rows. If your source needs tidying first, run it through cleaning Excel data with pandas before you pivot.

Step 1 — Create a sample workbook

Python

import pandas as pd

df_seed = pd.DataFrame({
    "Region": ["North", "North", "South", "South", "West", "West"],
    "Sales_Rep": ["Ana", "Ben", "Cara", "Dan", "Eve", "Finn"],
    "Month": ["Jan", "Feb", "Jan", "Feb", "Jan", "Feb"],
    "Revenue": [12000, 9000, 8000, 7500, 6000, 5500],
    "Units": [120, 90, 80, 75, 60, 55],
})
df_seed.to_excel("source_data.xlsx", sheet_name="Q1", index=False)

Step 2 — Build and export the pivot

Python

# 1. Load the workbook
df = pd.read_excel("source_data.xlsx", engine="openpyxl")

# 2. Build the pivot table
pivot = pd.pivot_table(
    df,
    values=["Revenue", "Units"],
    index=["Region", "Sales_Rep"],
    columns="Month",
    aggfunc={"Revenue": "sum", "Units": "mean"},
    fill_value=0,
    margins=True,
    margins_name="Grand Total",
)

# 3. Export
pivot.to_excel("report_pivot.xlsx", sheet_name="Q1_Summary")
print(pivot)

Each argument maps to one decision you would otherwise make in the PivotTable field list:

values — the metric columns to aggregate.
index — the column(s) that become the pivot's rows.
columns — the column whose values spread across the pivot's columns.
aggfunc — how to combine rows that fall in the same cell. A dict applies a different function per metric.
fill_value — what to put in cells with no matching rows (here 0 instead of NaN).
margins=True — adds a totals row and column labeled by margins_name.
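Behind those arguments, a pivot table is a group-by on the row and column keys followed by an unstack that spreads the column key across the grid. The sketch below builds the same single-metric table both ways on an in-memory copy of the sample data, which can help when reasoning about what each argument controls. It covers only the simple case (one metric, one aggfunc, no margins) and is an illustration, not a description of every step pandas takes internally.

```python
import pandas as pd

df = pd.DataFrame({
    "Region": ["North", "North", "South", "South", "West", "West"],
    "Month": ["Jan", "Feb", "Jan", "Feb", "Jan", "Feb"],
    "Revenue": [12000, 9000, 8000, 7500, 6000, 5500],
})

# The pivot written with pivot_table: rows=Region, columns=Month, metric=Revenue
via_pivot = pd.pivot_table(df, values="Revenue", index="Region",
                           columns="Month", aggfunc="sum", fill_value=0)

# The same cells built by hand: group on the row and column keys,
# aggregate the metric, then move the column key into the columns
via_groupby = (
    df.groupby(["Region", "Month"])["Revenue"]
      .sum()
      .unstack("Month", fill_value=0)
)

print(via_pivot)
print(via_groupby)
```

Both results list Feb before Jan because pivot_table sorts its keys by default and the month labels sort as plain strings.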

Map Excel pivot fields to pandas

If you already build PivotTables in the Excel UI, this table is the whole translation:

Excel pivot UI	pandas argument
Rows	`index`
Columns	`columns`
Values	`values`
Summarize Values By	`aggfunc`
Grand Totals	`margins=True`
Empty cell replacement	`fill_value`
Report Filter	`df.query(...)` / `df.loc[...]` before pivoting

Step 3 — Filter before you pivot

There is no separate "filter" argument — the Excel Report Filter has no direct equivalent. Slice the DataFrame first, then pivot the subset:

Python

north_south = df[df["Region"].isin(["North", "South"])]
filtered_pivot = pd.pivot_table(
    north_south, values="Revenue", index="Region",
    columns="Month", aggfunc="sum", fill_value=0,
)
print(filtered_pivot)

Filtering in pandas keeps the reduction explicit and testable, and the rows you exclude never enter the aggregation. If the data you want to pivot is spread across several workbooks, combine them first (see Merging and Joining Excel DataFrames), then pivot the single combined frame.
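When the workbooks share one column layout, for example one export per month, stacking them with pd.concat is the simplest combination; a merge on a key column is the tool when they hold different columns about the same entities. In the sketch below, jan and feb are hypothetical frames standing in for two pd.read_excel results, so it runs without any files.

```python
import pandas as pd

# Hypothetical stand-ins for two workbooks loaded with pd.read_excel;
# both are assumed to share the same column layout.
jan = pd.DataFrame({
    "Region": ["North", "South", "West"],
    "Month": ["Jan", "Jan", "Jan"],
    "Revenue": [12000, 8000, 6000],
})
feb = pd.DataFrame({
    "Region": ["North", "South", "West"],
    "Month": ["Feb", "Feb", "Feb"],
    "Revenue": [9000, 7500, 5500],
})

# Stack the sources into one frame, then pivot once
combined = pd.concat([jan, feb], ignore_index=True)
combined_pivot = pd.pivot_table(
    combined, values="Revenue", index="Region",
    columns="Month", aggfunc="sum", fill_value=0,
)
print(combined_pivot)
```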

Step 4 — Apply Excel number formatting on export

to_excel() writes raw values without cell formats. Use the xlsxwriter engine to format a currency column in the output file:

Python

flat = filtered_pivot.reset_index()
with pd.ExcelWriter("formatted_pivot.xlsx", engine="xlsxwriter") as writer:
    flat.to_excel(writer, sheet_name="Report", index=False)
    workbook = writer.book
    worksheet = writer.sheets["Report"]
    money_fmt = workbook.add_format({"num_format": "$#,##0.00"})
    # Columns B onward hold the revenue figures
    worksheet.set_column(1, len(flat.columns) - 1, 14, money_fmt)
print("Wrote formatted_pivot.xlsx")

This covers simple number formatting: the set_column call applies the currency format to every column from B through the last one. When a report needs a styled header row, frozen panes, and a highlighted totals line, follow the dedicated recipe, Export a Pandas Pivot Table to Excel (Formatted).
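The Step 2 pivot cannot go through the Step 4 block unchanged: its columns are two-level labels such as ("Revenue", "Jan"), and pandas does not support writing MultiIndex columns with index=False. One workaround, sketched below on an in-memory copy of the sample data, is to join each label pair into a single string before resetting the index. The underscore naming scheme is an arbitrary choice.

```python
import pandas as pd

df = pd.DataFrame({
    "Region": ["North", "North", "South", "South", "West", "West"],
    "Month": ["Jan", "Feb", "Jan", "Feb", "Jan", "Feb"],
    "Revenue": [12000, 9000, 8000, 7500, 6000, 5500],
    "Units": [120, 90, 80, 75, 60, 55],
})

pivot = pd.pivot_table(
    df, values=["Revenue", "Units"], index="Region", columns="Month",
    aggfunc={"Revenue": "sum", "Units": "mean"}, fill_value=0,
    margins=True, margins_name="Grand Total",
)

# pivot.columns is a two-level MultiIndex such as ("Revenue", "Jan").
# Join each pair into one label, e.g. "Revenue_Jan", so the frame has
# plain column names and can be written with index=False.
flat = pivot.copy()
flat.columns = ["_".join(map(str, pair)) for pair in flat.columns]
flat = flat.reset_index()
print(flat.columns.tolist())
print(flat)
```

Before reusing the Step 4 formatting code on flat, narrow the set_column range to the Revenue_ columns, since the Units figures are counts rather than money.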

Common pitfalls & gotchas

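ValueError: Index contains duplicate entries, cannot reshape. This comes from DataFrame.pivot(), not pivot_table(): pivot() cannot collapse duplicate index/column pairs, while pivot_table() can because it aggregates. Use pivot_table with an explicit aggfunc, or deduplicate first with df = df.drop_duplicates(subset=["Region", "Month"]), listing the same columns you pass as index and columns. Deduplicate only when the repeated rows are genuinely redundant, because drop_duplicates keeps the first row of each pair and discards the rest.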
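The sketch below reproduces the error on a tiny hypothetical frame in which North/Jan appears twice, then shows both ways out. It also makes the cost of deduplication visible: the 3000 row disappears from the deduplicated pivot, while pivot_table keeps it in the sum.

```python
import pandas as pd

# Two rows share the same Region/Month pair (North, Jan)
df = pd.DataFrame({
    "Region": ["North", "North", "South"],
    "Month": ["Jan", "Jan", "Jan"],
    "Revenue": [12000, 3000, 8000],
})

pivot_failed = False
try:
    df.pivot(index="Region", columns="Month", values="Revenue")
except ValueError as exc:
    pivot_failed = True
    print("pivot() failed:", exc)

# Option 1: pivot_table aggregates the duplicate pair (12000 + 3000)
summed = df.pivot_table(index="Region", columns="Month",
                        values="Revenue", aggfunc="sum")

# Option 2: deduplicate on the index/columns keys, then pivot;
# the second North/Jan row (3000) is dropped
deduped = df.drop_duplicates(subset=["Region", "Month"])
deduped_pivot = deduped.pivot(index="Region", columns="Month", values="Revenue")

print(summed)
print(deduped_pivot)
```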

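ImportError: Missing optional dependency 'openpyxl'. pandas needs an engine to read and write .xlsx files, and the traceback typically also shows the underlying ModuleNotFoundError: No module named 'openpyxl'. Install the engines used in this guide: pip install openpyxl xlsxwriter.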

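Header lookups fail (KeyError). Excel exports often carry leading or trailing whitespace in headers, so values="Revenue" misses a column actually named "Revenue ". Normalize headers before pivoting. The line below trims the ends and replaces internal runs of whitespace with underscores, so "Sales Rep" becomes "Sales_Rep"; if casing also varies, add a step such as .str.title():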

Python

df.columns = df.columns.str.strip().str.replace(r"\s+", "_", regex=True)
print(df.columns.tolist())

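Categorical keys show or hide empty combinations depending on the pandas version. When an index or columns key has a categorical dtype, observed controls whether category levels with no rows appear. Before pandas 3.0 the default was observed=False, so every level appears (pandas 2.2 warns that this default will change); pandas 3.0 defaults to observed=True, showing only combinations present in the data. Pass observed explicitly so the output does not change when you upgrade.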
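A sketch of the difference, assuming a hypothetical "East" level that is declared as a category but has no rows. Both calls pass observed explicitly, so the output should be the same on either side of the pandas 3.0 default change.

```python
import pandas as pd

df = pd.DataFrame({
    "Region": ["North", "North", "South", "South", "West", "West"],
    "Month": ["Jan", "Feb", "Jan", "Feb", "Jan", "Feb"],
    "Revenue": [12000, 9000, 8000, 7500, 6000, 5500],
})
# Hypothetical: "East" is a declared category level with no rows in the data
df["Region"] = pd.Categorical(df["Region"],
                              categories=["North", "South", "West", "East"])

# Only the combinations that actually occur
only_present = df.pivot_table(values="Revenue", index="Region", columns="Month",
                              aggfunc="sum", observed=True)
# Every declared category level, including the empty "East"
every_level = df.pivot_table(values="Revenue", index="Region", columns="Month",
                             aggfunc="sum", observed=False)

print(only_present)
print(every_level)
```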

Totals are silently absent — a pivot without margins=True looks complete but has no grand total. If stakeholders expect a bottom-line row, set margins=True explicitly; it is off by default.

Performance and scale notes

Read once, pivot many. read_excel is the slow part — parsing the XML inside an .xlsx is far more expensive than the aggregation. Load the workbook into a DataFrame a single time and build every pivot from that in-memory frame rather than re-reading the file per report.
Only keep the columns you group or aggregate on. Pass usecols=["Region", "Sales_Rep", "Month", "Revenue", "Units"] to read_excel so unused columns are dropped from the resulting DataFrame. On wide exports this can cut the memory footprint substantially; how much read time it saves depends on the engine, because with openpyxl pandas still reads every cell of the sheet before selecting columns.
Convert repeated keys to category. Changing repeated string columns with df["Region"] = df["Region"].astype("category") shrinks memory and can speed up the group-by inside pivot_table on large frames; remember the observed behavior described above.
pivot_table runs a group-by over the source rows and then unstacks the result into the grid (plus extra group-bys for the totals when margins=True), so for typical reports the cost is driven mainly by the number of source rows rather than the size of the output grid. Filtering rows out before aggregating (Step 3) is therefore the most effective way to speed up a heavy pivot.
Very large sources belong in chunks or a database. If a workbook is big enough that read_excel strains memory, aggregate a coarser summary per file or move the data into SQLite/DuckDB and let it do the group-by, then pivot the small result.
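The first three notes can be combined roughly as below. The frame is built in memory as a stand-in for a single pd.read_excel call with usecols, so the sketch runs without an Excel engine; the report names in the reports dict are hypothetical.

```python
import pandas as pd

# Stand-in for ONE read, e.g. pd.read_excel("source_data.xlsx", usecols=[...]);
# built in memory here so the sketch runs without an Excel engine.
df = pd.DataFrame({
    "Region": ["North", "North", "South", "South", "West", "West"],
    "Sales_Rep": ["Ana", "Ben", "Cara", "Dan", "Eve", "Finn"],
    "Month": ["Jan", "Feb", "Jan", "Feb", "Jan", "Feb"],
    "Revenue": [12000, 9000, 8000, 7500, 6000, 5500],
    "Units": [120, 90, 80, 75, 60, 55],
})

# Repeated string keys become categoricals: less memory, often cheaper grouping
for key in ["Region", "Sales_Rep", "Month"]:
    df[key] = df[key].astype("category")

# Every report is built from the same in-memory frame; observed is passed
# explicitly because the keys are now categorical
reports = {
    "revenue_by_region": df.pivot_table(values="Revenue", index="Region",
                                        columns="Month", aggfunc="sum",
                                        observed=True),
    "units_by_rep": df.pivot_table(values="Units", index="Sales_Rep",
                                   aggfunc="sum", observed=True),
}
for name, table in reports.items():
    print(name)
    print(table)
```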

Choose aggfunc deliberately

pivot_table defaults to mean, which is rarely what a revenue report wants and produces a plausible, wrong number rather than an error. Naming the function explicitly — aggfunc="sum" for money, "nunique" for distinct orders, "count" for line items — makes the report's meaning visible in the code. A pivot whose aggregation is implicit is one that will eventually be misread by whoever maintains it next.
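On the sample figures the difference is concrete: North has two rows (12000 and 9000), so the default reports 10500 where a revenue total should read 21000. The sketch uses a reduced, in-memory copy of the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "Region": ["North", "North", "South", "South", "West", "West"],
    "Revenue": [12000, 9000, 8000, 7500, 6000, 5500],
})

# No aggfunc given: pivot_table falls back to its default, "mean"
implicit = df.pivot_table(values="Revenue", index="Region")
# The aggregation a revenue report usually intends, stated in the code
explicit = df.pivot_table(values="Revenue", index="Region", aggfunc="sum")

print(implicit)
print(explicit)
```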

Watch the fill value

fill_value=0 makes a pivot tidy and can misrepresent it: zero means "no sales" and blank means "no data", and a pivot that shows zeros for months that were never reported invites a wrong conclusion. Choose per report, and say which you chose.
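The sketch below, on a hypothetical cut of the sample where each rep reports a single month, shows the same grid both ways and counts the cells that have no source rows. That count is a quick way to see how much the fill choice affects a given report.

```python
import pandas as pd

df = pd.DataFrame({
    "Sales_Rep": ["Ana", "Ben", "Cara", "Dan"],
    "Month": ["Jan", "Feb", "Jan", "Feb"],
    "Revenue": [12000, 9000, 8000, 7500],
})

# Without fill_value, a rep with no rows for a month shows NaN ("no data")
gaps_visible = df.pivot_table(values="Revenue", index="Sales_Rep",
                              columns="Month", aggfunc="sum")
# With fill_value=0 the same cell reads as "zero sales"
gaps_zeroed = df.pivot_table(values="Revenue", index="Sales_Rep",
                             columns="Month", aggfunc="sum", fill_value=0)

print(gaps_visible)
print(gaps_zeroed)
print("Cells with no source rows:", int(gaps_visible.isna().sum().sum()))
```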

Fail where the cause is

The most useful place for a check is as close as possible to the thing that can go wrong: the sheet name at the read, the column list before the transform, the row count before the write, the file size before delivery. Each of those turns a confusing downstream error into a message naming the actual problem. Checks placed late still catch the failure, but they describe a symptom — and a symptom three stages from its cause is what makes a simple mistake take an afternoon.
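A minimal sketch of checks placed where each problem first becomes visible: the column list before the pivot and the row count after the filter. The helper names (check_columns, check_rows, build_report) and the REQUIRED list are hypothetical, and the bad frame imitates a raw export with a trailing space in one header.

```python
import pandas as pd

REQUIRED = ["Region", "Month", "Revenue"]  # hypothetical: columns this report needs


def check_columns(df: pd.DataFrame, required: list[str]) -> None:
    """Fail before the transform, naming the headers that are missing."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"Missing columns {missing}; found {df.columns.tolist()}")


def check_rows(df: pd.DataFrame, stage: str) -> None:
    """Fail as soon as a stage leaves nothing to aggregate."""
    if df.empty:
        raise ValueError(f"No rows left after {stage}")


def build_report(df: pd.DataFrame) -> pd.DataFrame:
    check_columns(df, REQUIRED)
    subset = df[df["Region"].isin(["North", "South"])]
    check_rows(subset, "the Region filter")
    return subset.pivot_table(values="Revenue", index="Region",
                              columns="Month", aggfunc="sum", fill_value=0)


good = pd.DataFrame({
    "Region": ["North", "South", "West"],
    "Month": ["Jan", "Jan", "Jan"],
    "Revenue": [12000, 8000, 6000],
})
report = build_report(good)
print(report)

# Same data with a trailing space in one header, as raw exports sometimes have
bad = good.rename(columns={"Region": "Region "})
try:
    build_report(bad)
    error_message = ""
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

The failure message names the missing header and lists what was actually found, which points straight at the stray space instead of surfacing later as a KeyError inside pivot_table.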

Conclusion

Three functions carry the whole task: read_excel loads the source, pivot_table does the aggregation with index, columns, values and aggfunc, and to_excel (optionally through ExcelWriter) writes the result. Filter rows before you pivot, normalize headers so lookups do not fail, and set margins=True when a totals line is expected. Because it is plain code, the same recipe reruns on next month's export without a single manual click.

Frequently asked questions

When should I use pivot_table() instead of pivot()? Use pivot_table() whenever a row/column combination can repeat, because it aggregates duplicates via aggfunc. pivot() only reshapes and raises ValueError: Index contains duplicate entries if any index/column pair occurs more than once.

Can I apply a different aggregation to each value column? Yes. Pass aggfunc a dictionary keyed by column, like aggfunc={"Revenue": "sum", "Units": "mean"}. Each metric is then summarized with its own function in a single call.

Why are my totals rows and columns missing? You need margins=True; without it pandas omits the grand totals entirely. Set margins_name to control the label that marks the totals row and column.

How do I filter the source like an Excel Report Filter? There is no filter argument — slice the DataFrame before pivoting, e.g. df[df["Region"].isin(["North", "South"])], then pass that subset to pivot_table().

Why does my output have no cell formatting? to_excel() writes raw values only. Use pd.ExcelWriter with engine="xlsxwriter", then apply a format via worksheet.set_column() with a num_format to style currency or other columns.

Up to the parent workflow: Creating Pivot Tables from Excel Data — the full ingest, clean, filter and export pipeline.
Export a Pandas Pivot Table to Excel (Formatted) — styled headers, number formats and a highlighted totals row.
Merging and Joining Excel DataFrames — combine multiple sources before you aggregate.
Cleaning Excel Data with Pandas — tidy headers and values so the pivot's lookups never fail.
Back to the toolkit: Advanced Data Transformation and Cleaning.