Guide

Getting Started With Python Excel AutomationDeep dive

Read Specific Columns From Excel With Pandas

Q: Can I use usecols with no header row?

Yes — pass header=None and select by index list or letter range. Name-based selection needs a header to match against.

Use read_excel usecols to load only the columns you need by name, index, Excel-letter range, or callable — combine it with sheet_name and dtype to save memory.

pandas.read_excel() loads every column on a sheet by default, which wastes memory and time when you only need four columns out of forty. The fix is the usecols parameter: it tells pandas to materialize a subset and skip the rest. This guide is part of Reading Excel Files with Pandas, and it covers all four forms usecols accepts — a list of names, a list of indices, an Excel-letter range, and a callable — plus how to combine the selection with sheet_name and dtype. Every snippet writes its own sample workbook so you can paste and run it.

Prerequisites

Install pandas and the .xlsx engine:

Bash

pip install pandas openpyxl

Create a sample workbook

Each example below reads this file. It has more columns than any one read needs, which is exactly when usecols earns its keep:

Python

import pandas as pd

sample = pd.DataFrame({
    "Order_ID": [1001, 1002, 1003],
    "SKU": ["A-100", "B-200", "C-300"],
    "Quantity": [3, 1, 5],
    "Unit_Price": [19.99, 49.50, 8.75],
    "Region": ["North", "South", "West"],
    "Notes": ["rush", "", "backorder"],
})
sample.to_excel("orders.xlsx", sheet_name="Raw", index=False)
print("wrote orders.xlsx")

Select columns by name

Pass a list of header names. The names must match the header row exactly — same casing, no leading or trailing spaces. Order in your list does not matter; pandas returns columns in their on-sheet order:

Python

import pandas as pd

df = pd.read_excel("orders.xlsx", usecols=["SKU", "Quantity", "Unit_Price"])
print(df.head())

Select columns by integer index

When headers are unreliable or absent, select by zero-based position instead. [0, 2, 3] reads the first, third, and fourth columns:

Python

import pandas as pd

df = pd.read_excel("orders.xlsx", usecols=[0, 2, 3])
print(df.columns.tolist())

Select columns by Excel letter range

usecols accepts a string of Excel column letters — the same A, B, C you see in the spreadsheet UI. Use a colon for a contiguous range, commas for individual columns, or mix both:

Python

import pandas as pd

contiguous = pd.read_excel("orders.xlsx", usecols="A:C")        # A, B, C
sparse = pd.read_excel("orders.xlsx", usecols="A,C,E")          # A, C, E
mixed = pd.read_excel("orders.xlsx", usecols="A:C,E")           # A, B, C, E
print(contiguous.columns.tolist())
print(sparse.columns.tolist())
print(mixed.columns.tolist())

The letters map to physical spreadsheet positions, so they ignore the header names entirely — handy when an export shuffles column titles but keeps the layout stable.

Select columns with a callable

Pass a function and pandas calls it once per column name, keeping the column when it returns True. This is the right tool for pattern-based selection — every column starting with a prefix, or every name except a blocklist:

Python

import pandas as pd

# Keep every column except free-text "Notes"
df = pd.read_excel("orders.xlsx", usecols=lambda name: name != "Notes")
print(df.columns.tolist())

# Keep only price-like columns
prices = pd.read_excel(
    "orders.xlsx",
    usecols=lambda name: "Price" in name or name == "Quantity",
)
print(prices.columns.tolist())

Combine usecols with sheet_name and dtype

usecols composes with the rest of read_excel. Target a tab with sheet_name — covered in depth in Working With Multiple Excel Sheets in Python — then pin the types of the columns you kept so downstream math doesn't break when Excel stores a number as text:

Python

import pandas as pd

df = pd.read_excel(
    "orders.xlsx",
    sheet_name="Raw",
    usecols=["Order_ID", "Quantity", "Unit_Price"],
    dtype={"Order_ID": "int64", "Quantity": "int64", "Unit_Price": "float64"},
)
print(df.dtypes)

The keys in dtype only need to cover columns that survive usecols; a dtype entry for a dropped column is silently ignored.

Why usecols saves memory on wide files

read_excel builds the full DataFrame in RAM. On a sheet with hundreds of columns, restricting to the handful you actually use cuts both the parse work and the resident memory roughly in proportion to the columns dropped. There is no chunksize for Excel, so column pruning is one of the few levers you have to keep a wide-file read inside a memory budget. Pair it with nrows when you only need a sample.

Common pitfalls

Symptom	Cause	Fix
`ValueError: Usecols do not match columns`	A name in the list isn't an exact header match (case or whitespace differs)	Print `pd.read_excel(path, nrows=0).columns.tolist()` and copy the exact spelling
Empty or wrong columns returned	Mixed names and indices in one list	Use either all names or all indices — never both in the same list
Selection is off by one column	Header sits below metadata rows, shifting positions	Apply `skiprows`/`header` first — see the step-by-step reading guide — so indices and letters align to the real header
`dtype` cast has no effect	The targeted column was dropped by `usecols`	Make sure the column is in your `usecols` list before pinning its type

A subtle one: a list mixing names and indices (["SKU", 2]) raises a ValueError. The string form ("A,C") and the callable are the only ways to express "these specific columns" without committing to a single key style.

Choosing the form of usecols

Python

import pandas as pd

by_name = pd.read_excel("orders.xlsx", usecols=["Order_ID", "Region", "Revenue"])
by_letters = pd.read_excel("orders.xlsx", usecols="A,C:E")
by_rule = pd.read_excel("orders.xlsx", usecols=lambda c: not str(c).lower().startswith("notes"))

print(by_name.shape, by_letters.shape, by_rule.shape)

Names are worth defaulting to because they fail loudly: a column that has been renamed upstream produces a ValueError naming it, while a letter range silently reads whatever now occupies those positions. When a required column might be missing, catching that error and reporting the available headers turns a stack trace into a message someone can act on.

The callable form is the one to reach for on very wide exports, where the rule is easier to state than the list — every column except the free-text ones, or only those matching a prefix. It is evaluated once per column name during parsing, so it costs nothing.

Narrow the read, then check it

usecols is the cheapest optimisation available and the easiest place for a silent failure, because a mistyped name raises while a wrong letter range does not. Printing the resulting columns once after the read — or asserting them against the list you expected — turns that into an immediate, legible error rather than a report built on the wrong three columns.

Combine usecols with dtype

The two arguments belong together: usecols decides which columns are parsed, and dtype decides how. Passing dtype=object alongside a narrow usecols gives a fast read with no inference at all, which is the combination that keeps identifiers intact and makes every later conversion explicit. It is also the pairing that makes a read reproducible — the same file produces the same frame regardless of what values happened to appear in it this month.

Fail where the cause is

The most useful place for a check is as close as possible to the thing that can go wrong: the sheet name at the read, the column list before the transform, the row count before the write, the file size before delivery. Each of those turns a confusing downstream error into a message naming the actual problem. Checks placed late still catch the failure, but they describe a symptom — and a symptom three stages from its cause is what makes a simple mistake take an afternoon.

Frequently asked questions

Does usecols order control the DataFrame column order? No. Whether you pass names, indices, or letters, pandas returns the selected columns in their original on-sheet order. Reorder afterward with df = df[["Unit_Price", "SKU"]].

Can I use usecols with no header row? Yes — pass header=None and select by index list or letter range. Name-based selection needs a header to match against.

What's the difference between "A:C" and [0, 1, 2]? They select the same physical columns. The letter range mirrors the spreadsheet UI and reads naturally; the index list is easier to build programmatically from a computed set of positions.

Why does my callable receive names I didn't expect? The callable receives the parsed header values, including any auto-generated Unnamed: N labels for blank header cells. Guard against those in your predicate.

Conclusion

usecols has four interchangeable forms — names, indices, an Excel-letter string, and a callable — and the right one depends on what's stable in your file. Names read clearest when headers are reliable; letters and indices survive header churn; the callable handles pattern-based selection. All four compose with sheet_name and dtype, so you can target a tab, prune columns, and pin types in a single call while keeping wide-file reads inside a memory budget.

Up one level: Reading Excel Files with Pandas — the end-to-end reading workflow this page fits into.
How to Read Excel Files With Pandas: A Step-by-Step Guide — see usecols alongside sheet targeting, header realignment, and dtypes.
Working With Multiple Excel Sheets in Python — pair column pruning with sheet_name to read exactly one tab.
Remove Blank Rows From Excel With Pandas — the next cleanup step once you have only the columns you need.
Write a Pandas DataFrame to Excel Without the Index — write the pruned DataFrame back out cleanly.