Guide
Remove Blank Rows From Excel With Pandas
Strip blank rows from an Excel file with pandas: dropna(how='all'), subset and thresh, whitespace-only cells, index resets, and writing the cleaned file back.
To remove blank rows from an Excel file with pandas, load the workbook with pd.read_excel(), drop the empty rows with df.dropna(how="all"), reset the index, and write the result back with to_excel(). The tricky part is defining "blank": a fully empty row, a row missing only the columns you care about, and a row full of whitespace strings each need a different call. This page covers all three. It is part of the Cleaning Excel Data with Pandas cluster.
Prerequisites
pip install pandas openpyxl
Every block below runs in order against a sample workbook built in the first step.
Create a messy sample workbook
import pandas as pd
df = pd.DataFrame({
"OrderID": ["A-100", None, "B-200", " ", "C-300", None],
"Customer": ["Acme", None, "Globex", None, "Initech", None],
"Amount": [120.0, None, 80.0, None, None, None],
})
df.to_excel("orders_input.xlsx", index=False, engine="openpyxl")
print(f"Wrote {len(df)} rows (some blank)")
Rows 1 and 5 (zero-based) are fully empty. Row 3 has whitespace in OrderID but is otherwise empty. Row 4 has an OrderID but no Amount.
Drop fully empty rows with how="all"
how="all" removes only rows where every cell is NaN. This is almost always what you want for blank rows — the default how="any" would delete any row with a single missing cell, which is far too aggressive.
df = pd.read_excel("orders_input.xlsx", engine="openpyxl")
cleaned = df.dropna(how="all")
print(cleaned)
That drops the two fully empty rows but keeps the whitespace row and the row missing Amount, because neither is entirely NaN.
Convert whitespace-only cells to NaN first
pd.read_excel reads " " as a literal string, not NaN, so a visually blank row survives dropna. Replace whitespace-only strings with pd.NA before dropping:
df = pd.read_excel("orders_input.xlsx", engine="openpyxl")
df = df.replace(r"^\s*$", pd.NA, regex=True)
cleaned = df.dropna(how="all")
print(cleaned)
Now the whitespace row collapses to all-NaN and gets removed. Run this normalization step first whenever data comes from manual entry or a CSV-to-Excel round trip.
Drop rows missing a key field with subset
To delete rows that lack a specific required column — say every row without an OrderID — pass subset:
df = pd.read_excel("orders_input.xlsx", engine="openpyxl")
df = df.replace(r"^\s*$", pd.NA, regex=True)
cleaned = df.dropna(subset=["OrderID"])
print(cleaned)
This keeps the row missing only Amount (it still has an OrderID) while removing every row with no identifier. Combine subset with how="all" by chaining calls when you need both rules.
Keep rows with at least N real values using thresh
thresh=N keeps rows that have at least N non-null values. Use it when a row is only useful if most of its fields are populated:
df = pd.read_excel("orders_input.xlsx", engine="openpyxl")
df = df.replace(r"^\s*$", pd.NA, regex=True)
# Keep rows with 2 or more populated cells
cleaned = df.dropna(thresh=2)
print(cleaned)
thresh counts non-null cells, so it overrides how if both are passed — pick one.
Reset the index after dropping
dropna preserves the original index, leaving gaps like 0, 2, 4. Those gaps break positional logic and export an odd-looking index. Reset before writing:
df = pd.read_excel("orders_input.xlsx", engine="openpyxl")
df = df.replace(r"^\s*$", pd.NA, regex=True)
cleaned = df.dropna(how="all").reset_index(drop=True)
print(cleaned.index.tolist())
drop=True discards the old index instead of pushing it into a new column.
Write the cleaned file back
df = pd.read_excel("orders_input.xlsx", engine="openpyxl")
df = df.replace(r"^\s*$", pd.NA, regex=True)
cleaned = (df.dropna(how="all")
.dropna(subset=["OrderID"])
.reset_index(drop=True))
cleaned.to_excel("orders_cleaned.xlsx", index=False, engine="openpyxl")
print(f"Wrote {len(cleaned)} rows to orders_cleaned.xlsx")
Pass index=False so the reset index does not become a stray first column in the output.
Common pitfalls
| Symptom | Cause | Fix |
|---|---|---|
| Real data rows disappear | Default how="any" drops any row with one missing cell | Use how="all" or subset=[...] |
| Visually blank rows survive | " " is a string, not NaN | df.replace(r"^\s*$", pd.NA, regex=True) first |
Index reads 0, 3, 7 after drop | dropna keeps the original index | .reset_index(drop=True) |
| Extra unnamed column in output | Reset index written to file | to_excel(..., index=False) |
| First data rows are blank/garbled | A multi-row header was read as data | pd.read_excel(..., header=[0, 1]) or skiprows=N |
The multi-row-header case is common with exported reports: a banner or merged title row above the real header makes pandas read junk rows. Use skiprows to skip the banner, or header=[0, 1] for a genuine two-level header, rather than dropping the rows afterward.
Performance and scale note
dropna and replace run vectorized in C, so even files with hundreds of thousands of rows clean in well under a second. The regex replace is the slower of the two; if you only need to strip whitespace from one or two known string columns, target them directly with df[col] = df[col].str.strip().replace("", pd.NA) instead of scanning the whole frame. For very large workbooks, read only the columns you need with usecols= to cut memory before cleaning.
Frequently asked questions
What is the difference between how="all" and how="any"? how="all" drops a row only when every cell is missing; how="any" (the default) drops a row when any cell is missing. For removing blank rows you almost always want how="all".
Why do my whitespace-only rows survive dropna? Because " " is a non-null string. pandas only treats true NaN/None as missing. Run df.replace(r"^\s*$", pd.NA, regex=True) first to convert blank strings to NaN.
How do I drop rows that are missing only certain columns? Use df.dropna(subset=["OrderID", "Customer"]). The row is dropped only if a value in one of the listed columns is missing.
Does dropna modify the DataFrame in place? No, it returns a new DataFrame by default. Reassign the result (df = df.dropna(...)) or pass inplace=True.
Why does my cleaned index have gaps? dropna keeps original index labels. Call .reset_index(drop=True) to renumber from zero.
Conclusion
Removing blank rows reliably is three steps: normalize whitespace to NaN, drop with the right how/subset/thresh rule, then reset the index before writing. Skipping the normalization step is the single most common reason "empty" rows survive.
Where to go next
- Cleaning Excel Data with Pandas — the parent cluster.
- Pandas: Drop Duplicates From an Excel Column — remove repeated rows after dropping blanks.
- Handling Missing Data in Excel Reports — fill, flag, or impute the gaps you keep.