Guide

DocumentationDeep dive

Getting Started with Python Excel Automation

Learn how to read, transform, and write Excel files with Python using pandas and openpyxl — the libraries, the pipeline pattern, and runnable starter code.

If you build the same spreadsheet by hand every week, Python can do it for you in seconds — and produce the exact same result every time. This guide is the on-ramp: which library to reach for, how a small automation pipeline fits together, and a complete read - transform - write example you can paste and run today. Every code block below executes in order against a sample workbook created on this page, so you can follow along without any data of your own.

Which library should you use?

Python has no single "Excel library." Three tools cover almost every task, and they overlap by design — most real scripts combine pandas with one of the others.

LibraryBest forNeeds Excel installed?Runs on Linux servers?
pandasReading, transforming, and writing tabular dataNoYes
openpyxlCell-level control: styling, formulas, charts, editing existing filesNoYes
xlwingsDriving a live Excel app: running macros, refreshing dataYesNo (Windows/macOS)

A useful rule of thumb:

For unattended jobs on a Linux server or CI runner, stick to pandas and openpyxl: they are pure Python and never launch Excel.

Install the libraries

pandas does not bundle an Excel reader/writer; install an engine alongside it. openpyxl covers .xlsx reading and writing and is all you need to follow this guide.

Bash
pip install pandas openpyxl

The read - transform - write pattern

Nearly every Excel automation script is the same three steps in a row:

  1. Read a source workbook (or CSV, or database) into a pandas DataFrame.
  2. Transform it — filter, compute new columns, aggregate.
  3. Write the result back out to a formatted .xlsx file.

The rest of this page walks that pattern end to end. First, create a sample workbook so the later steps have something to read:

Python
import pandas as pd

orders = pd.DataFrame({
    "Order_ID": [1001, 1002, 1003, 1004, 1005, 1006],
    "Region": ["North", "South", "North", "West", "South", "West"],
    "Quantity": [3, 1, 5, 2, 4, 1],
    "Unit_Price": [19.99, 49.50, 19.99, 8.75, 12.00, 49.50],
})
orders.to_excel("orders.xlsx", sheet_name="Orders", index=False)
print("Wrote orders.xlsx with", len(orders), "rows")

Step 1: Read the source data

read_excel loads a sheet into a DataFrame. With no extra arguments it reads the first sheet and treats row 0 as the header:

Python
df = pd.read_excel("orders.xlsx", sheet_name="Orders")
print(df.head())

Step 2: Transform it

Compute a revenue column, then aggregate to one row per region. These are vectorized operations — no Python loops over rows:

Python
df["Revenue"] = df["Quantity"] * df["Unit_Price"]

summary = (
    df.groupby("Region", as_index=False)
      .agg(Orders=("Order_ID", "count"),
           Units=("Quantity", "sum"),
           Revenue=("Revenue", "sum"))
      .sort_values("Revenue", ascending=False)
)
print(summary)

Step 3: Write the report

Use an ExcelWriter context manager to put both the detail and the summary into one workbook, each on its own sheet. The context manager saves and closes the file for you:

Python
with pd.ExcelWriter("regional_report.xlsx", engine="openpyxl") as writer:
    summary.to_excel(writer, sheet_name="Summary", index=False)
    df.to_excel(writer, sheet_name="Detail", index=False)

print("Saved regional_report.xlsx")

That index=False is worth a habit: without it, pandas writes the DataFrame's row numbers as a leading column, which rarely belongs in a report. See Write Pandas DataFrame to Excel Without Index for the details.

Add formatting with openpyxl

pandas writes plain data. When you want a styled header or fitted column widths, reopen the file with openpyxl and edit cells directly. Here we bold the header row of the summary sheet and set sensible widths:

Python
from openpyxl import load_workbook
from openpyxl.styles import Font, PatternFill

wb = load_workbook("regional_report.xlsx")
ws = wb["Summary"]

header_font = Font(bold=True, color="FFFFFF")
header_fill = PatternFill(start_color="2F5496", end_color="2F5496", fill_type="solid")
for cell in ws[1]:
    cell.font = header_font
    cell.fill = header_fill

# Fit each column to its widest value
for column_cells in ws.columns:
    width = max(len(str(c.value)) for c in column_cells if c.value is not None)
    ws.column_dimensions[column_cells[0].column_letter].width = width + 2

wb.save("regional_report.xlsx")
print("Formatted Summary sheet")

Running scripts unattended

Once a script works by hand, the payoff is running it on a schedule. A few habits keep scheduled jobs reliable:

  • Use absolute paths. A scheduler's working directory is rarely the script's folder, so "orders.xlsx" may not resolve. Build paths from a known base instead:
Python
from pathlib import Path

base_dir = Path.cwd()  # in a real job, point this at your data folder
source = base_dir / "orders.xlsx"
print("Will read:", source)
  • Fail loudly. Check that the source file exists before reading, so a missing input produces a clear message rather than a confusing traceback.
  • Pin your dependencies. Record pandas and openpyxl versions in a requirements.txt so the job behaves the same after a redeploy.

When you are ready to put a script on a timer, Scheduling Python Excel Scripts with Cron walks through the setup.

Common first-time errors

  • ModuleNotFoundError: No module named 'openpyxl' — pandas found no Excel engine. Install one with pip install openpyxl.
  • ValueError: Excel file format cannot be determined — the file is not really an .xlsx (often a renamed CSV or HTML export). Confirm the real format, or pass engine="openpyxl" explicitly.
  • PermissionError: [Errno 13] Permission denied on write — the target file is open in Excel. Close it, or write to a new filename.
  • A leading unnamed column in your output — you forgot index=False on to_excel.

Conclusion

The read - transform - write pattern is a one-time investment: once the sample script runs, you have a template for every report you will ever build. Add absolute paths, log row counts at ingestion and output, and pin your package versions — those three habits are the difference between a script that works once and a scheduled job you can trust.

Where to go next

You now have the full read - transform - write loop running. Build on it with these guides: