Guide
Reading Excel Files with Pandas
Load Excel workbooks into pandas DataFrames: pick the engine, target sheets and columns, control dtypes, and read multi-sheet files — with runnable examples.
pandas.read_excel() turns a worksheet into a DataFrame, ready for filtering, aggregation, and export. This page is the map of that function: how the engine works, the handful of parameters you will use constantly, and how to handle multi-sheet workbooks. Every snippet runs in order against a sample file created below, so you can paste and follow along. For a slower, fully narrated walkthrough, see How to Read Excel with Pandas Step by Step.
Install pandas and an engine
pandas does not parse Excel itself — it hands the file to an engine. openpyxl reads modern .xlsx files and is the one you need here. (Legacy .xls requires xlrd==1.2.0; xlrd 2.0 dropped .xls support. Binary .xlsb needs pyxlsb or calamine.)
pip install pandas openpyxl
Create a sample workbook
So the examples below have something to read, write a two-sheet workbook now:
import pandas as pd
sales = pd.DataFrame({
"Date": ["2024-01-05", "2024-01-06", "2024-01-07", "2024-01-08"],
"Transaction_ID": ["T-001", "T-002", "T-003", "T-004"],
"Amount": [120.50, 89.00, 240.75, 15.25],
"Category": ["Hardware", "Software", "Hardware", "Services"],
})
returns = pd.DataFrame({
"Date": ["2024-01-09", "2024-01-10"],
"Transaction_ID": ["T-005", "T-006"],
"Amount": [-30.00, -12.50],
"Category": ["Hardware", "Services"],
})
with pd.ExcelWriter("ledger.xlsx", engine="openpyxl") as writer:
sales.to_excel(writer, sheet_name="Sales", index=False)
returns.to_excel(writer, sheet_name="Returns", index=False)
print("Wrote ledger.xlsx")
The simplest read
Call read_excel with just a path. It reads the first sheet and uses row 0 as the header:
df = pd.read_excel("ledger.xlsx")
print(df)
Parameters you will use constantly
A handful of arguments cover most real reads. Here they are together on the Sales sheet:
df = pd.read_excel(
"ledger.xlsx",
sheet_name="Sales", # name or 0-based index
usecols=["Date", "Amount", "Category"], # read only these columns
parse_dates=["Date"], # convert to datetime64
dtype={"Category": "category"}, # pin types you depend on
)
print(df.dtypes)
print(df)
sheet_name— a name ("Sales"), a 0-based index (0), a list to read several sheets at once, orNonefor every sheet.usecols— limit to the columns you need; accepts a label list or an Excel range like"A:C". Smaller reads use less memory.parse_dates— turn date columns into realdatetime64values so you can resample and compare periods.dtype— pin types explicitly. Use"string"for IDs to preserve leading zeros and avoid scientific notation;"category"for low-cardinality text.skiprows/header— drop title or metadata rows above the real header. The step-by-step guide covers these in depth.
Reading multi-sheet workbooks
Reporting files rarely live on a single tab. You have three ways to navigate them.
Inspect the tabs first
Open the workbook once with pd.ExcelFile to list sheets without loading data, then read only what you need. Reusing the same handle avoids re-parsing the file:
xls = pd.ExcelFile("ledger.xlsx", engine="openpyxl")
print("Sheets:", xls.sheet_names)
df_returns = pd.read_excel(xls, sheet_name="Returns")
print(df_returns)
Read several sheets at once
Pass a list of names (or None for all sheets) and pandas returns a dict of {sheet_name: DataFrame}:
frames = pd.read_excel("ledger.xlsx", sheet_name=["Sales", "Returns"])
print(type(frames), list(frames))
Stack the sheets into one DataFrame
A common goal is one combined table. Concatenate the dict's values, tagging each row with its source sheet:
combined = pd.concat(
[frame.assign(Source=name) for name, frame in frames.items()],
ignore_index=True,
)
print(combined)
For combining data spread across separate files rather than tabs, see Working with Multiple Excel Sheets in Python and Combine Multiple Excel Files into One.
Verify what you loaded
Right after a read, check the shape, columns, and types. This catches template drift — a renamed column or a number that arrived as text — before it corrupts a downstream calculation:
print("Rows, cols:", combined.shape)
print(combined.dtypes)
print(combined.isna().sum())
Common errors and fixes
| Error | Cause | Fix |
|---|---|---|
ModuleNotFoundError: No module named 'openpyxl' | No Excel engine installed | pip install openpyxl |
ValueError: Excel file format cannot be determined | Wrong extension, or a renamed CSV/HTML file | Confirm the real format; pass engine="openpyxl" |
ValueError: Worksheet named 'X' not found | Sheet name typo or template change | List pd.ExcelFile(path).sheet_names first |
IDs lose leading zeros / show as 1.0e+05 | pandas inferred a numeric dtype | Read those columns as dtype={"col": "string"} |
PermissionError on read | File is open in Excel | Close it, or copy the file before reading |
Conclusion
pd.read_excel() is three lines at its simplest; the complexity comes from defensive practice — naming the sheet, naming the columns, pinning the types, and verifying the shape right after the read. Build those four habits into every read and schema drift, missing engines, and silent numeric coercions stop being surprises.
Where to go next
- How to Read Excel with Pandas Step by Step — the fully narrated version, including a defensive reader for scheduled jobs.
- Writing DataFrames to Excel with Pandas — the other half of the loop, once your data is loaded and transformed.
- Cleaning Excel Data with Pandas — fix the messy values a read inevitably surfaces.