Guide
Creating Pivot Tables from Excel Data with Pandas
Build pivot tables from Excel data in Python: ingest and clean a workbook, aggregate with pandas pivot_table, add totals, filter, and export a formatted report.
A pivot table summarizes raw rows into a grid: one dimension down the rows, another across the columns, and an aggregated metric in the cells. pandas.pivot_table() does the same job as Excel's pivot engine, but in code you can version, schedule, and rerun. This page walks the full pipeline — ingest, clean, aggregate, filter, export — with runnable snippets. Each block builds on the previous one and shares the same namespace, so you can paste them in order.
For the upstream cleaning steps these examples assume, see Advanced Data Transformation and Cleaning.
Install the dependencies
pandas delegates Excel I/O to an engine. openpyxl reads and writes .xlsx; xlsxwriter is an alternative writer with rich cell formatting.
pip install pandas openpyxl xlsxwriter
Create a sample workbook
So every example below runs as-is, generate a small sales workbook. The headers deliberately include mixed casing and a stray space to mimic a real export:
import pandas as pd
sales = pd.DataFrame({
"Region ": ["North", "North", "South", "South", "West", "West", "North", "South"],
"Quarter": ["Q1", "Q2", "Q1", "Q2", "Q1", "Q2", "Q1", "Q2"],
"Product_Category": ["Hardware", "Software", "Hardware", "Software",
"Hardware", "Software", "Software", "Hardware"],
"Revenue": [12000, 8500, 9800, 7200, 6400, 5100, 4300, 11200],
"Units_Sold": [120, 40, 95, 36, 60, 22, 18, 110],
})
sales.to_excel("sales_data.xlsx", sheet_name="Raw_Data", index=False)
Step 1: Ingest and clean the source
Excel exports arrive with trailing whitespace in headers and the occasional fully blank row. Normalize the column names and drop empty rows and columns before aggregating, so a header like "Region " doesn't break later lookups:
df = pd.read_excel("sales_data.xlsx", sheet_name="Raw_Data", engine="openpyxl")
# Standardize headers: strip, lowercase, underscore-separate
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")
# Drop rows/columns that are entirely empty
df = df.dropna(how="all").dropna(axis=1, how="all")
# Make sure metrics are numeric (coerce stray text to NaN)
for col in ["revenue", "units_sold"]:
df[col] = pd.to_numeric(df[col], errors="coerce")
print(df.columns.tolist())
For deeper cleaning — currency symbols, mixed types, duplicate handling — see Cleaning Excel Data with Pandas.
Step 2: Build the pivot table
pivot_table takes the clean DataFrame, the metrics to summarize (values), the row grouping (index), the column grouping (columns), and how to aggregate (aggfunc). Here revenue is summed and units are averaged:
pivot = pd.pivot_table(
df,
values=["revenue", "units_sold"],
index=["region", "quarter"],
columns="product_category",
aggfunc={"revenue": "sum", "units_sold": "mean"},
fill_value=0,
)
print(pivot)
The result has a hierarchical row index (region then quarter) and a column layout split by product_category. fill_value=0 replaces the NaN cells that appear where a region/quarter has no rows in a category.
Step 3: Add grand totals
margins=True appends a totals row and column; margins_name labels them. The totals respect each metric's own aggfunc:
pivot_totals = pd.pivot_table(
df,
values="revenue",
index="region",
columns="product_category",
aggfunc="sum",
fill_value=0,
margins=True,
margins_name="Total",
)
print(pivot_totals)
Step 4: Filter the result
A pivot is just a DataFrame, so you can slice it. To keep only regions whose total revenue clears a threshold, sum across the category columns and mask the rows:
revenue_by_region = pd.pivot_table(
df, values="revenue", index="region",
columns="product_category", aggfunc="sum", fill_value=0,
)
row_totals = revenue_by_region.sum(axis=1)
top_regions = revenue_by_region.loc[row_totals > 15000]
print(top_regions)
Step 5: Export a formatted report
Flatten the multi-level columns to plain strings so Excel shows readable headers, then write with xlsxwriter and style the header row:
export = pivot.copy()
export.columns = ["_".join(map(str, col)).strip() for col in export.columns]
export = export.reset_index()
with pd.ExcelWriter("pivot_report.xlsx", engine="xlsxwriter") as writer:
export.to_excel(writer, sheet_name="Pivot_Report", index=False)
workbook = writer.book
worksheet = writer.sheets["Pivot_Report"]
header_fmt = workbook.add_format({
"bold": True, "bg_color": "#4472C4", "font_color": "white", "border": 1,
})
for col_idx, col_name in enumerate(export.columns):
worksheet.write(0, col_idx, col_name, header_fmt)
worksheet.autofit()
print("Wrote pivot_report.xlsx")
Common errors and fixes
| Error | Cause | Fix |
|---|---|---|
KeyError: 'region' | Header casing/whitespace mismatch | Normalize headers with .str.strip().str.lower(); check df.columns.tolist(). |
DataError: No numeric types to aggregate | Metric column is stored as text | Coerce first: df[col] = pd.to_numeric(df[col], errors="coerce"). |
ValueError: Grouper ... not 1-dimensional | Same column passed to both index and columns | Use distinct columns for each axis. |
| Columns export as tuples | to_excel() renders the MultiIndex literally | Flatten with ["_".join(map(str, c)) for c in pivot.columns] before writing. |
When rows share the same index/columns combination, pivot_table aggregates them with aggfunc rather than erroring — that is the difference from DataFrame.pivot(), which raises ValueError on duplicate entries. Always set aggfunc explicitly so the rollup is intentional.
A note on observed
When index or columns is a categorical dtype, the observed parameter controls whether unused category combinations appear as empty cells. In pandas 3.0 the default is observed=True, so only combinations present in the data are shown. Pass observed=False if you want every category level represented even when it has no rows.
Conclusion
pd.pivot_table() is a single function that replaces the manual click-through of Excel's PivotTable wizard: normalize headers once, call it with the right index, columns, values, and aggfunc, add margins=True for totals, and export. The real power is repeatability — the same script runs against next month's export without opening a spreadsheet.
Where to go next
- For the focused, single-function recipe, see Create Pivot Table from Excel with Pandas.
- To enrich transactions with lookup tables before pivoting, see Merging and Joining Excel DataFrames.
- To handle gaps before aggregating, see Handling Missing Data in Excel Reports.