How to Create a Pivot Table from Excel with Pandas
To create a pivot table from Excel with Pandas, load the workbook with pd.read_excel(), aggregate the dataset via pd.pivot_table(), and export the result using to_excel(). This programmatic workflow replaces manual UI steps, enabling deterministic, version-controlled reporting pipelines.
Core Implementation
import pandas as pd
# 1. Load workbook
df = pd.read_excel("source_data.xlsx", engine="openpyxl")
# 2. Build pivot table
pivot = pd.pivot_table(
    df,
    values=["Revenue", "Units"],
    index=["Region", "Sales_Rep"],
    columns="Month",
    aggfunc={"Revenue": "sum", "Units": "mean"},
    fill_value=0,
    margins=True,
    margins_name="Grand Total",
    # Retains unused categorical levels. Relying on the implicit False default
    # is deprecated since pandas 2.2; it is slated to become True in a future release.
    observed=False,
)
# 3. Export
pivot.to_excel("report_pivot.xlsx", sheet_name="Q3_Summary")
Quick Reference: Parameter Mapping
| Excel UI Feature | Pandas Equivalent |
|---|---|
| Rows | index |
| Columns | columns |
| Values | values |
| Summarize by | aggfunc |
| Show Grand Totals | margins=True |
| Replace Blanks | fill_value |
| Filter Context | df.query() / df.loc[] (apply pre-pivot) |
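
As a rough illustration of the "Filter Context" row, the sketch below applies a filter before pivoting, which mirrors an Excel report filter. The sample rows are hypothetical stand-ins for the workbook contents, not data from the guide.

```python
import pandas as pd

# Hypothetical sample data standing in for the loaded workbook.
df = pd.DataFrame({
    "Region": ["East", "East", "West", "West"],
    "Month": ["Jul", "Aug", "Jul", "Aug"],
    "Revenue": [100.0, 150.0, 200.0, 50.0],
})

# Equivalent of an Excel report filter: restrict rows first, then pivot.
filtered = df.query("Region == 'East'")

pivot = pd.pivot_table(
    filtered,
    values="Revenue",
    index="Region",
    columns="Month",
    aggfunc="sum",
    fill_value=0,
)
print(pivot)
```

Filtering first keeps excluded rows out of every aggregate, including any margins computed later.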
Troubleshooting & Fallbacks
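ValueError: Index contains duplicate entries, cannot reshape
Cause: Raised by DataFrame.pivot() (and unstack()) when an index/columns combination occurs more than once. pivot() only reshapes and cannot aggregate; pd.pivot_table() aggregates duplicates and silently defaults to aggfunc="mean".
Fix: Use pd.pivot_table() and always specify aggfunc so duplicates are combined deliberately. For multi-metric outputs, pass a list or dict. If duplicates indicate dirty data, clean upstream: df = df.drop_duplicates(subset=["Region", "Month"]). This keeps only the first row per Region/Month pair, so apply it only when the repeats are genuine duplicates rather than separate records.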
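
A minimal sketch of the duplicate-entry behavior, using made-up rows: DataFrame.pivot() raises on a repeated Region/Month pair, while pd.pivot_table() with an explicit aggfunc combines the repeats.

```python
import pandas as pd

# Hypothetical data: the East/Jul combination appears twice.
df = pd.DataFrame({
    "Region": ["East", "East", "West"],
    "Month": ["Jul", "Jul", "Jul"],
    "Revenue": [100.0, 50.0, 200.0],
})

# pivot() only reshapes, so the repeated key triggers the ValueError.
try:
    df.pivot(index="Region", columns="Month", values="Revenue")
    pivot_error = None
except ValueError as exc:
    pivot_error = str(exc)

# pivot_table() aggregates the repeated key with the function we name.
summed = pd.pivot_table(
    df, values="Revenue", index="Region", columns="Month", aggfunc="sum"
)
print(pivot_error)
print(summed)
```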
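ModuleNotFoundError: No module named 'openpyxl'
Cause: Pandas delegates .xlsx I/O to external engines that are not installed alongside it. Depending on the pandas version, the same problem may surface as an ImportError reporting a missing optional dependency.
Fix: Install explicitly: pip install openpyxl xlsxwriter. For read-heavy pipelines, pandas 2.2+ also accepts engine="calamine", which requires the python-calamine package (fastexcel is the reader used by Polars, not by pandas). It is often noticeably faster than openpyxl for parsing; benchmark on your own workbooks before relying on a specific speedup.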
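
One possible way to choose a reader engine at runtime is sketched below. The helper name pick_excel_engine is hypothetical, and the sketch assumes the calamine engine is provided by the python-calamine package (importable as python_calamine); it does not check the installed pandas version.

```python
import importlib.util


def pick_excel_engine():
    """Return the first installed reader engine name, or None if neither is present."""
    # Pairs of (pandas engine name, importable module name). Assumption:
    # python-calamine installs a module named python_calamine.
    candidates = [("calamine", "python_calamine"), ("openpyxl", "openpyxl")]
    for engine, module in candidates:
        if importlib.util.find_spec(module) is not None:
            return engine
    return None


engine = pick_excel_engine()
print(engine)
```

The result can be passed as the engine argument of pd.read_excel(); a None result means neither package is installed and the job should fail early with a clear message.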
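MemoryError on workbooks >500MB
Cause: read_excel() loads each sheet entirely into RAM and has no chunksize option.
Fix: Restrict footprint with usecols in read_excel() (nrows and skiprows can limit the row window), or convert repetitive, low-cardinality string columns to categorical dtype: df["Region"] = df["Region"].astype("category"). Categoricals save memory only when values repeat often; a column of mostly unique strings gains nothing. For extreme scale, consider converting the workbook to CSV or Parquet once and processing that with polars or dask.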
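
The effect of the categorical conversion can be checked with memory_usage(). The sketch below uses a synthetic column with four repeating values; the commented usecols call uses a hypothetical file and column names.

```python
import pandas as pd

# Loading only the needed columns would look like this (hypothetical names):
# df = pd.read_excel("source_data.xlsx", usecols=["Region", "Month", "Revenue"])

# Synthetic column: 100,000 rows but only four distinct values.
df = pd.DataFrame({"Region": ["North", "South", "East", "West"] * 25_000})

before = df["Region"].memory_usage(deep=True)
df["Region"] = df["Region"].astype("category")
after = df["Region"].memory_usage(deep=True)

print(f"string column: {before:,} bytes, category column: {after:,} bytes")
```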
Lost Excel Formatting in Output
Cause: Pandas writes raw values, stripping cell formats.
Fix: Use xlsxwriter to apply formats programmatically:
with pd.ExcelWriter("formatted_pivot.xlsx", engine="xlsxwriter") as writer:
    pivot.to_excel(writer, sheet_name="Report")
    workbook = writer.book
    worksheet = writer.sheets["Report"]
    money_fmt = workbook.add_format({"num_format": "$#,##0.00"})
    # Columns A:B hold the Region/Sales_Rep index, so data starts at C.
    # Narrow the range if the Units columns should not use a currency format.
    worksheet.set_column("C:Z", 14, money_fmt)
Automation Best Practices
Scripted pivots require strict schema validation. Excel exports frequently inject trailing whitespace or hidden characters that break index lookups. Normalize headers before execution:
df.columns = df.columns.str.strip().str.replace(r"\s+", "_", regex=True)
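
To show the normalization together with a basic schema check, here is a small sketch. The messy header names and the required-column set are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical headers as they might arrive from an Excel export.
df = pd.DataFrame(columns=["  Region ", "Sales  Rep", "Revenue\t"])

# Trim the edges, then collapse internal whitespace runs to underscores.
df.columns = df.columns.str.strip().str.replace(r"\s+", "_", regex=True)

# Fail fast if any column the pivot depends on is absent.
required = {"Region", "Sales_Rep", "Revenue"}
missing = required - set(df.columns)
if missing:
    raise ValueError(f"Missing required columns: {sorted(missing)}")

print(list(df.columns))
```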
Pandas does not preserve cell-level formulas. If downstream consumers require live Excel calculations, export the pivot as a static table and append formula ranges using openpyxl post-write. This aligns with standard practices for Creating Pivot Tables from Excel Data, where deterministic outputs depend on clean input schemas.
When structuring broader data workflows, treat pivot generation as the final aggregation step. Raw ingestion, type coercion, and missing-value imputation must occur upstream. For complex transformations involving window functions, rolling aggregations, or cross-sheet joins, combine groupby() with transform() or merge() before calling pivot_table(). These techniques fall under Advanced Data Transformation and Cleaning and prevent silent data corruption during automated reporting cycles.
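
As one example of doing row-level work before the pivot, the sketch below derives each row's share of its region's revenue with groupby().transform() and only then pivots. The data and the Region_Share column name are hypothetical.

```python
import pandas as pd

# Hypothetical input rows.
df = pd.DataFrame({
    "Region": ["East", "East", "West", "West"],
    "Month": ["Jul", "Aug", "Jul", "Aug"],
    "Revenue": [100.0, 300.0, 50.0, 150.0],
})

# transform() returns a value per original row, so it can be used directly
# as a divisor to compute each row's share of its region total.
region_total = df.groupby("Region")["Revenue"].transform("sum")
df["Region_Share"] = df["Revenue"] / region_total

# The pivot is the last step and only reshapes the prepared column.
share_pivot = pd.pivot_table(
    df, values="Region_Share", index="Region", columns="Month", aggfunc="sum"
)
print(share_pivot)
```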
Wrap pd.pivot_table() in a try/except block during automation to capture malformed workbooks without halting batch jobs. Log exact aggfunc mismatches or missing columns to standard error for rapid debugging. This pattern ensures reporting pipelines remain resilient across varying Excel export formats and schema drift.
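
A minimal sketch of such a wrapper follows. The function name safe_pivot, the logger name, and the choice of caught exception types are assumptions; a real pipeline may need to catch more or fewer.

```python
import logging
import sys

import pandas as pd

# Send log records to standard error so batch runners can capture them.
logging.basicConfig(stream=sys.stderr, level=logging.INFO)
logger = logging.getLogger("pivot_batch")


def safe_pivot(df, **kwargs):
    """Run pd.pivot_table and return None instead of raising on bad input."""
    try:
        return pd.pivot_table(df, **kwargs)
    except (KeyError, ValueError, TypeError) as exc:
        # KeyError typically signals a missing column; the other two often
        # indicate an aggfunc that does not fit the column's dtype.
        logger.error("pivot_table failed (%s): %s", type(exc).__name__, exc)
        return None


# Hypothetical usage: the second call references a column that does not exist.
df = pd.DataFrame({"Region": ["East", "West"], "Revenue": [100.0, 200.0]})
good = safe_pivot(df, values="Revenue", index="Region", aggfunc="sum")
bad = safe_pivot(df, values="Units", index="Region", aggfunc="sum")
```

Returning None lets the batch loop skip the workbook and continue, while the logged exception type and message identify which file needs attention.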