Guide
Getting Started with Python Excel Automation
Learn how to read, transform, and write Excel files with Python using pandas and openpyxl — the libraries, the pipeline pattern, and runnable starter code.
If you build the same spreadsheet by hand every week, Python can do it for you in seconds — and produce the exact same result every time. This guide is the on-ramp: which library to reach for, how a small automation pipeline fits together, and a complete read - transform - write example you can paste and run today. Every code block below executes in order against a sample workbook created on this page, so you can follow along without any data of your own.
Which library should you use?
Python has no single "Excel library." Three tools cover almost every task, and they overlap by design — most real scripts combine pandas with one of the others.
| Library | Best for | Needs Excel installed? | Runs on Linux servers? |
|---|---|---|---|
| pandas | Reading, transforming, and writing tabular data | No | Yes |
| openpyxl | Cell-level control: styling, formulas, charts, editing existing files | No | Yes |
| xlwings | Driving a live Excel app: running macros, refreshing data | Yes | No (Windows/macOS) |
A useful rule of thumb:
- Reach for pandas when you think in terms of rows, columns, and aggregations. It delegates the actual
.xlsxparsing to an engine — usuallyopenpyxl. See Reading Excel Files with Pandas and Writing DataFrames to Excel with Pandas. - Reach for openpyxl when you need to touch individual cells — apply formatting, write formulas, or append rows to a workbook that already exists. See Using openpyxl for Excel File Manipulation.
- Reach for xlwings only when a real, installed copy of Excel must stay in the loop — for example to run an existing VBA macro. See Automating Excel with xlwings Basics.
For unattended jobs on a Linux server or CI runner, stick to pandas and openpyxl: they are pure Python and never launch Excel.
Install the libraries
pandas does not bundle an Excel reader/writer; install an engine alongside it. openpyxl covers .xlsx reading and writing and is all you need to follow this guide.
pip install pandas openpyxl
The read - transform - write pattern
Nearly every Excel automation script is the same three steps in a row:
- Read a source workbook (or CSV, or database) into a pandas DataFrame.
- Transform it — filter, compute new columns, aggregate.
- Write the result back out to a formatted
.xlsxfile.
The rest of this page walks that pattern end to end. First, create a sample workbook so the later steps have something to read:
import pandas as pd
orders = pd.DataFrame({
"Order_ID": [1001, 1002, 1003, 1004, 1005, 1006],
"Region": ["North", "South", "North", "West", "South", "West"],
"Quantity": [3, 1, 5, 2, 4, 1],
"Unit_Price": [19.99, 49.50, 19.99, 8.75, 12.00, 49.50],
})
orders.to_excel("orders.xlsx", sheet_name="Orders", index=False)
print("Wrote orders.xlsx with", len(orders), "rows")
Step 1: Read the source data
read_excel loads a sheet into a DataFrame. With no extra arguments it reads the first sheet and treats row 0 as the header:
df = pd.read_excel("orders.xlsx", sheet_name="Orders")
print(df.head())
Step 2: Transform it
Compute a revenue column, then aggregate to one row per region. These are vectorized operations — no Python loops over rows:
df["Revenue"] = df["Quantity"] * df["Unit_Price"]
summary = (
df.groupby("Region", as_index=False)
.agg(Orders=("Order_ID", "count"),
Units=("Quantity", "sum"),
Revenue=("Revenue", "sum"))
.sort_values("Revenue", ascending=False)
)
print(summary)
Step 3: Write the report
Use an ExcelWriter context manager to put both the detail and the summary into one workbook, each on its own sheet. The context manager saves and closes the file for you:
with pd.ExcelWriter("regional_report.xlsx", engine="openpyxl") as writer:
summary.to_excel(writer, sheet_name="Summary", index=False)
df.to_excel(writer, sheet_name="Detail", index=False)
print("Saved regional_report.xlsx")
That index=False is worth a habit: without it, pandas writes the DataFrame's row numbers as a leading column, which rarely belongs in a report. See Write Pandas DataFrame to Excel Without Index for the details.
Add formatting with openpyxl
pandas writes plain data. When you want a styled header or fitted column widths, reopen the file with openpyxl and edit cells directly. Here we bold the header row of the summary sheet and set sensible widths:
from openpyxl import load_workbook
from openpyxl.styles import Font, PatternFill
wb = load_workbook("regional_report.xlsx")
ws = wb["Summary"]
header_font = Font(bold=True, color="FFFFFF")
header_fill = PatternFill(start_color="2F5496", end_color="2F5496", fill_type="solid")
for cell in ws[1]:
cell.font = header_font
cell.fill = header_fill
# Fit each column to its widest value
for column_cells in ws.columns:
width = max(len(str(c.value)) for c in column_cells if c.value is not None)
ws.column_dimensions[column_cells[0].column_letter].width = width + 2
wb.save("regional_report.xlsx")
print("Formatted Summary sheet")
Running scripts unattended
Once a script works by hand, the payoff is running it on a schedule. A few habits keep scheduled jobs reliable:
- Use absolute paths. A scheduler's working directory is rarely the script's folder, so
"orders.xlsx"may not resolve. Build paths from a known base instead:
from pathlib import Path
base_dir = Path.cwd() # in a real job, point this at your data folder
source = base_dir / "orders.xlsx"
print("Will read:", source)
- Fail loudly. Check that the source file exists before reading, so a missing input produces a clear message rather than a confusing traceback.
- Pin your dependencies. Record
pandasandopenpyxlversions in arequirements.txtso the job behaves the same after a redeploy.
When you are ready to put a script on a timer, Scheduling Python Excel Scripts with Cron walks through the setup.
Common first-time errors
ModuleNotFoundError: No module named 'openpyxl'— pandas found no Excel engine. Install one withpip install openpyxl.ValueError: Excel file format cannot be determined— the file is not really an.xlsx(often a renamed CSV or HTML export). Confirm the real format, or passengine="openpyxl"explicitly.PermissionError: [Errno 13] Permission deniedon write — the target file is open in Excel. Close it, or write to a new filename.- A leading unnamed column in your output — you forgot
index=Falseonto_excel.
Conclusion
The read - transform - write pattern is a one-time investment: once the sample script runs, you have a template for every report you will ever build. Add absolute paths, log row counts at ingestion and output, and pin your package versions — those three habits are the difference between a script that works once and a scheduled job you can trust.
Where to go next
You now have the full read - transform - write loop running. Build on it with these guides:
- Reading Excel Files with Pandas — sheet and column targeting, dtype control, and a defensive reader.
- Writing DataFrames to Excel with Pandas — multi-sheet exports, number formats, and appending to existing files.
- Using openpyxl for Excel File Manipulation — when you need cell-level control beyond what pandas offers.
- Working with Multiple Excel Sheets in Python — splitting and combining workbooks.