Guide

Advanced Data Transformation And CleaningDeep dive

Creating Pivot Tables from Excel Data with Pandas

Build pivot tables from Excel data in Python: ingest and clean a workbook, aggregate with pandas pivot_table, add totals, filter, and export a formatted report.

A pivot table summarizes raw rows into a grid: one dimension down the rows, another across the columns, and an aggregated metric in the cells. pandas.pivot_table() does the same job as Excel's pivot engine, but in code you can version, schedule, and rerun. This page walks the full pipeline — ingest, clean, aggregate, filter, export — with runnable snippets. Each block builds on the previous one and shares the same namespace, so you can paste them in order.

For the upstream cleaning steps these examples assume, see Advanced Data Transformation and Cleaning.

Install the dependencies

pandas delegates Excel I/O to an engine. openpyxl reads and writes .xlsx; xlsxwriter is an alternative writer with rich cell formatting.

Bash
pip install pandas openpyxl xlsxwriter

Create a sample workbook

So every example below runs as-is, generate a small sales workbook. The headers deliberately include mixed casing and a stray space to mimic a real export:

Python
import pandas as pd

sales = pd.DataFrame({
    "Region ": ["North", "North", "South", "South", "West", "West", "North", "South"],
    "Quarter": ["Q1", "Q2", "Q1", "Q2", "Q1", "Q2", "Q1", "Q2"],
    "Product_Category": ["Hardware", "Software", "Hardware", "Software",
                          "Hardware", "Software", "Software", "Hardware"],
    "Revenue": [12000, 8500, 9800, 7200, 6400, 5100, 4300, 11200],
    "Units_Sold": [120, 40, 95, 36, 60, 22, 18, 110],
})
sales.to_excel("sales_data.xlsx", sheet_name="Raw_Data", index=False)

Step 1: Ingest and clean the source

Excel exports arrive with trailing whitespace in headers and the occasional fully blank row. Normalize the column names and drop empty rows and columns before aggregating, so a header like "Region " doesn't break later lookups:

Python
df = pd.read_excel("sales_data.xlsx", sheet_name="Raw_Data", engine="openpyxl")

# Standardize headers: strip, lowercase, underscore-separate
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")

# Drop rows/columns that are entirely empty
df = df.dropna(how="all").dropna(axis=1, how="all")

# Make sure metrics are numeric (coerce stray text to NaN)
for col in ["revenue", "units_sold"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

print(df.columns.tolist())

For deeper cleaning — currency symbols, mixed types, duplicate handling — see Cleaning Excel Data with Pandas.

Step 2: Build the pivot table

pivot_table takes the clean DataFrame, the metrics to summarize (values), the row grouping (index), the column grouping (columns), and how to aggregate (aggfunc). Here revenue is summed and units are averaged:

Python
pivot = pd.pivot_table(
    df,
    values=["revenue", "units_sold"],
    index=["region", "quarter"],
    columns="product_category",
    aggfunc={"revenue": "sum", "units_sold": "mean"},
    fill_value=0,
)
print(pivot)

The result has a hierarchical row index (region then quarter) and a column layout split by product_category. fill_value=0 replaces the NaN cells that appear where a region/quarter has no rows in a category.

Step 3: Add grand totals

margins=True appends a totals row and column; margins_name labels them. The totals respect each metric's own aggfunc:

Python
pivot_totals = pd.pivot_table(
    df,
    values="revenue",
    index="region",
    columns="product_category",
    aggfunc="sum",
    fill_value=0,
    margins=True,
    margins_name="Total",
)
print(pivot_totals)

Step 4: Filter the result

A pivot is just a DataFrame, so you can slice it. To keep only regions whose total revenue clears a threshold, sum across the category columns and mask the rows:

Python
revenue_by_region = pd.pivot_table(
    df, values="revenue", index="region",
    columns="product_category", aggfunc="sum", fill_value=0,
)
row_totals = revenue_by_region.sum(axis=1)
top_regions = revenue_by_region.loc[row_totals > 15000]
print(top_regions)

Step 5: Export a formatted report

Flatten the multi-level columns to plain strings so Excel shows readable headers, then write with xlsxwriter and style the header row:

Python
export = pivot.copy()
export.columns = ["_".join(map(str, col)).strip() for col in export.columns]
export = export.reset_index()

with pd.ExcelWriter("pivot_report.xlsx", engine="xlsxwriter") as writer:
    export.to_excel(writer, sheet_name="Pivot_Report", index=False)

    workbook = writer.book
    worksheet = writer.sheets["Pivot_Report"]
    header_fmt = workbook.add_format({
        "bold": True, "bg_color": "#4472C4", "font_color": "white", "border": 1,
    })
    for col_idx, col_name in enumerate(export.columns):
        worksheet.write(0, col_idx, col_name, header_fmt)
    worksheet.autofit()

print("Wrote pivot_report.xlsx")

Common errors and fixes

ErrorCauseFix
KeyError: 'region'Header casing/whitespace mismatchNormalize headers with .str.strip().str.lower(); check df.columns.tolist().
DataError: No numeric types to aggregateMetric column is stored as textCoerce first: df[col] = pd.to_numeric(df[col], errors="coerce").
ValueError: Grouper ... not 1-dimensionalSame column passed to both index and columnsUse distinct columns for each axis.
Columns export as tuplesto_excel() renders the MultiIndex literallyFlatten with ["_".join(map(str, c)) for c in pivot.columns] before writing.

When rows share the same index/columns combination, pivot_table aggregates them with aggfunc rather than erroring — that is the difference from DataFrame.pivot(), which raises ValueError on duplicate entries. Always set aggfunc explicitly so the rollup is intentional.

A note on observed

When index or columns is a categorical dtype, the observed parameter controls whether unused category combinations appear as empty cells. In pandas 3.0 the default is observed=True, so only combinations present in the data are shown. Pass observed=False if you want every category level represented even when it has no rows.

Conclusion

pd.pivot_table() is a single function that replaces the manual click-through of Excel's PivotTable wizard: normalize headers once, call it with the right index, columns, values, and aggfunc, add margins=True for totals, and export. The real power is repeatability — the same script runs against next month's export without opening a spreadsheet.

Where to go next