Remove Blank Rows From Excel With Pandas

Strip blank rows from an Excel file with pandas: dropna(how='all'), subset and thresh, whitespace-only cells, index resets, and writing the cleaned file back.

To remove blank rows from an Excel file with pandas, load the workbook with pd.read_excel(), drop the empty rows with df.dropna(how="all"), reset the index, and write the result back with to_excel(). The tricky part is defining "blank": a fully empty row, a row missing only the columns you care about, and a row full of whitespace strings each need a different call. This page covers all three, then writes the cleaned workbook back to disk. It builds on the wider cleaning Excel data with pandas workflow.

Prerequisites

Bash

pip install pandas openpyxl

Every block below runs in order against a sample workbook built in the first step.

Create a messy sample workbook

Python

import pandas as pd

df = pd.DataFrame({
    "OrderID": ["A-100", None, "B-200", "  ", "C-300", None],
    "Customer": ["Acme", None, "Globex", None, "Initech", None],
    "Amount": [120.0, None, 80.0, None, None, None],
})
df.to_excel("orders_input.xlsx", index=False, engine="openpyxl")
print(f"Wrote {len(df)} rows (some blank)")

Rows 1 and 5 (zero-based) are fully empty. Row 3 has whitespace in OrderID but is otherwise empty. Row 4 has an OrderID but no Amount.

Drop fully empty rows with how="all"

how="all" removes only rows where every cell is NaN. This is almost always what you want for blank rows — the default how="any" would delete any row with a single missing cell, which is far too aggressive.

Python

df = pd.read_excel("orders_input.xlsx", engine="openpyxl")

cleaned = df.dropna(how="all")
print(cleaned)

That drops the two fully empty rows but keeps the whitespace row and the row missing Amount, because neither is entirely NaN.
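To make the contrast between the two settings concrete, here is a minimal sketch that applies both to an in-memory copy of the sample data. The frame is built directly rather than read from orders_input.xlsx so the block runs on its own, and the variable names are illustrative only.

```python
import pandas as pd

# In-memory copy of the sample data from the first step
df = pd.DataFrame({
    "OrderID": ["A-100", None, "B-200", "  ", "C-300", None],
    "Customer": ["Acme", None, "Globex", None, "Initech", None],
    "Amount": [120.0, None, 80.0, None, None, None],
})

only_blank_removed = df.dropna(how="all")  # drops rows where every cell is missing
any_gap_removed = df.dropna(how="any")     # drops rows with at least one missing cell

print(len(df), len(only_blank_removed), len(any_gap_removed))  # 6 4 2
```

With how="any" only the two complete orders remain, which is why it is rarely the right rule for blank-row cleanup.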

Convert whitespace-only cells to NaN first

pd.read_excel reads " " as a literal string, not NaN, so a visually blank row survives dropna. Replace whitespace-only strings with pd.NA before dropping:

Python

df = pd.read_excel("orders_input.xlsx", engine="openpyxl")

df = df.replace(r"^\s*$", pd.NA, regex=True)
cleaned = df.dropna(how="all")
print(cleaned)

Now the whitespace row collapses to all-NaN and gets removed. Run this normalization step first whenever data comes from manual entry or a CSV-to-Excel round trip.
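It can help to see exactly which cells the pattern ^\s*$ catches. The sketch below runs the same replace over a small hand-made Series; the sample values are assumptions chosen to cover empty strings, spaces, tabs, and padded real values.

```python
import pandas as pd

samples = pd.Series(["", "   ", "\t", " A-100 ", "B-200", None], dtype=object)

# Cells that are empty or whitespace-only become missing;
# real values are kept unchanged, including any padding around them
normalized = samples.replace(r"^\s*$", pd.NA, regex=True)

print(normalized.isna().tolist())  # [True, True, True, False, False, True]
```

Note that " A-100 " keeps its surrounding spaces: this step only blanks out empty cells, it does not trim real values.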

Drop rows missing a key field with subset

To delete rows that lack a specific required column — say every row without an OrderID — pass subset:

Python

df = pd.read_excel("orders_input.xlsx", engine="openpyxl")
df = df.replace(r"^\s*$", pd.NA, regex=True)

cleaned = df.dropna(subset=["OrderID"])
print(cleaned)

This keeps the row missing only Amount (it still has an OrderID) while removing every row with no identifier. Combine subset with how="all" by chaining calls when you need both rules. Rows you deliberately keep may still have gaps worth addressing — see handling missing data in Excel reports for filling, flagging, or imputing those values instead of dropping them.
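As a rough illustration of combining the rules, the sketch below chains the two calls and, for comparison, shows what passing subset and how="all" in a single call does instead. The four-row frame is made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "OrderID": ["A-100", None, None, "C-300"],
    "Customer": ["Acme", None, "Globex", "Initech"],
    "Amount": [120.0, None, 80.0, None],
})

# Rule 1 then rule 2: remove fully blank rows, then rows without an OrderID
chained = df.dropna(how="all").dropna(subset=["OrderID"])

# A different rule: drop a row only when *both* listed columns are missing
both_missing = df.dropna(subset=["OrderID", "Customer"], how="all")

print(chained.index.tolist())       # [0, 3]
print(both_missing.index.tolist())  # [0, 2, 3]
```

The single-call form keeps row 2 because it still has a Customer, so the two approaches are not interchangeable.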

Keep rows with at least N real values using thresh

thresh=N keeps rows that have at least N non-null values. Use it when a row is only useful if most of its fields are populated:

Python

df = pd.read_excel("orders_input.xlsx", engine="openpyxl")
df = df.replace(r"^\s*$", pd.NA, regex=True)

# Keep rows with 2 or more populated cells
cleaned = df.dropna(thresh=2)
print(cleaned)

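thresh counts non-null cells and cannot be combined with how. Recent pandas versions raise a TypeError when both are passed, while older versions let thresh silently take precedence, so pick one.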
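One way to reason about thresh is to count the populated cells per row yourself. The following sketch, on an invented four-row frame, shows that thresh=2 keeps the same rows as filtering on that count.

```python
import pandas as pd

df = pd.DataFrame({
    "OrderID": ["A-100", None, "C-300", None],
    "Customer": ["Acme", None, "Initech", "Globex"],
    "Amount": [120.0, None, None, None],
})

# Number of populated (non-null) cells in each row
populated = df.notna().sum(axis=1)
print(populated.tolist())  # [3, 0, 2, 1]

# thresh=2 keeps the rows whose populated count is 2 or more
kept = df.dropna(thresh=2)
same_rows = df[populated >= 2]

print(kept.index.tolist(), same_rows.index.tolist())  # [0, 2] [0, 2]
```

Printing the per-row count first is also a quick way to choose a sensible N for a new file.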

Reset the index after dropping

dropna preserves the original index, leaving gaps like 0, 2, 4. Those gaps trip up code that treats index labels as row positions, and they export as an odd-looking index column. Reset before writing:

Python

df = pd.read_excel("orders_input.xlsx", engine="openpyxl")
df = df.replace(r"^\s*$", pd.NA, regex=True)

cleaned = df.dropna(how="all").reset_index(drop=True)
print(cleaned.index.tolist())

drop=True discards the old index instead of pushing it into a new column.
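The difference between the two forms of reset_index is easiest to see side by side. This is a small sketch on a made-up three-row frame; with_old and renumbered are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"OrderID": ["A-100", None, "B-200"], "Amount": [120.0, None, 80.0]})
dropped = df.dropna(how="all")

print(dropped.index.tolist())      # [0, 2]  original labels, with a gap

renumbered = dropped.reset_index(drop=True)
print(renumbered.index.tolist())   # [0, 1]
print(list(renumbered.columns))    # ['OrderID', 'Amount']

with_old = dropped.reset_index()   # default drop=False keeps the old labels
print(list(with_old.columns))      # ['index', 'OrderID', 'Amount']
```

Keeping the old labels in an "index" column can be useful while debugging, since it records where each surviving row sat in the frame as it was read.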

Write the cleaned file back

Python

df = pd.read_excel("orders_input.xlsx", engine="openpyxl")
df = df.replace(r"^\s*$", pd.NA, regex=True)

cleaned = (df.dropna(how="all")
             .dropna(subset=["OrderID"])
             .reset_index(drop=True))

cleaned.to_excel("orders_cleaned.xlsx", index=False, engine="openpyxl")
print(f"Wrote {len(cleaned)} rows to orders_cleaned.xlsx")

Pass index=False so the reset index does not become a stray first column in the output. Removing blank rows often surfaces duplicate records that empty rows had previously separated; to collapse those next, follow up with dropping duplicates from an Excel column.

Common pitfalls

Symptom	Cause	Fix
Real data rows disappear	Default `how="any"` drops any row with one missing cell	Use `how="all"` or `subset=[...]`
Visually blank rows survive	`" "` is a string, not `NaN`	`df.replace(r"^\s*$", pd.NA, regex=True)` first
Index reads `0, 3, 7` after drop	`dropna` keeps the original index	`.reset_index(drop=True)`
Extra unnamed column in output	Reset index written to file	`to_excel(..., index=False)`
First data rows are blank/garbled	A multi-row header was read as data	`pd.read_excel(..., header=[0, 1])` or `skiprows=N`

The multi-row-header case is common with exported reports: a banner or merged title row above the real header makes pandas read junk rows. Use skiprows to skip the banner, or header=[0, 1] for a genuine two-level header, rather than dropping the rows afterward.

Performance and scale note

dropna is vectorized, so even files with hundreds of thousands of rows usually clean quickly. The regex replace is the slower of the two because it has to test every string cell; if you only need to strip whitespace from one or two known string columns, target them directly with df[col] = df[col].str.strip().replace("", pd.NA) instead of scanning the whole frame. For very large workbooks, read only the columns you need with usecols= to cut memory before cleaning.
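A minimal sketch of the targeted approach follows, assuming OrderID is the only text column that needs cleaning; text_cols is a hypothetical list you would adapt to your own sheet. Unlike the whole-frame regex, .str.strip() also trims padding around real values, and on an object column it returns a missing value for any entry that is not a string, so it is best applied only to columns known to hold text.

```python
import pandas as pd

df = pd.DataFrame({
    "OrderID": ["A-100", "  ", " B-200 ", None],
    "Amount": [120.0, None, 80.0, None],
})

# Only the listed text columns are touched; numeric columns are left alone
text_cols = ["OrderID"]  # hypothetical list of known string columns
for col in text_cols:
    df[col] = df[col].str.strip().replace("", pd.NA)

cleaned = df.dropna(how="all")
print(cleaned["OrderID"].tolist())  # ['A-100', 'B-200']
```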

Blank rows in the middle versus at the end

Python

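import pandas as pd

# "messy.xlsx" stands in for your own workbook; dtype=object keeps every column as generic objects
df = pd.read_excel("messy.xlsx", sheet_name="Sheet1", dtype=object)

# True for every row in which all cells are missing
blank = df.isna().all(axis=1)
first_blank = blank.idxmax() if blank.any() else None
print(f"{blank.sum()} blank row(s); first at index {first_blank}")

# Cut the trailing run of blanks: keep everything up to the last non-blank row
last_real = blank[::-1].idxmin() if not blank.all() else None
body = df.loc[:last_real] if last_real is not None else df.iloc[0:0]

# Blanks still inside body are gaps in the middle, which may mean two tables on one sheet
interior = body.isna().all(axis=1)
print(f"{interior.sum()} blank row(s) in the middle")

cleaned = body[~interior].reset_index(drop=True)
print(len(df), "->", len(cleaned))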

Distinguishing the two cases before dropping is the point. A file whose blanks are all at the bottom is simply carrying leftover formatting, and dropna(how="all") is entirely safe. A file with a gap in the middle usually has two tables on one sheet, and flattening them into one produces a frame where the second table's header sits in the data — a failure that looks like corrupt data rather than like a layout the reader intended.

Resetting the index after dropping keeps later row-number arithmetic honest: without it, converting a pandas index into a spreadsheet row number stops matching what the reader sees.
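The row-number arithmetic can be sketched as follows, assuming a single header row in row 1, data starting in row 2, and a sheet read from the top with no skiprows. HEADER_ROWS and sheet_row are hypothetical names introduced for the example.

```python
import pandas as pd

df = pd.DataFrame({"OrderID": ["A-100", None, "B-200", None, "C-300"]})

HEADER_ROWS = 1  # assumption: a single header row at the top of the sheet

def sheet_row(index_label: int) -> int:
    """Map a zero-based index label to a 1-based worksheet row number."""
    return index_label + HEADER_ROWS + 1

kept = df.dropna(how="all")
# Old labels still point at rows in the *input* sheet
print([sheet_row(i) for i in kept.index])        # [2, 4, 6]

renumbered = kept.reset_index(drop=True)
# New labels point at rows in the *cleaned* sheet written with index=False
print([sheet_row(i) for i in renumbered.index])  # [2, 3, 4]
```

Which of the two mappings you want depends on whether the message to the reader refers to the original file or to the cleaned one.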

Blank rows are a symptom

A file that regularly arrives with blank rows is telling you something about how it is produced — an export that pads its output, a template with formatting below the data, or two tables pasted onto one sheet. Cleaning them is right; asking why they appear is often better, because the fix at the source removes the rule permanently and helps everyone else consuming the same export.

Count before and after

Logging the row count on either side of a blank-row drop takes one line and answers the question that a changed total always raises. It also makes an unexpected result visible immediately: a file where the drop removes four hundred rows rather than four is telling you something about the export that is worth investigating before the report is published.
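A minimal sketch of that logging, on an in-memory frame, might look like this. The guard at the end is an optional extra rather than part of the one-line log, and MAX_EXPECTED_FRACTION is a hypothetical threshold you would tune to what is normal for the export.

```python
import pandas as pd

df = pd.DataFrame({
    "OrderID": ["A-100", None, "B-200", None],
    "Amount": [120.0, None, 80.0, None],
})

before = len(df)
cleaned = df.dropna(how="all").reset_index(drop=True)
removed = before - len(cleaned)
print(f"Dropped {removed} blank row(s): {before} -> {len(cleaned)}")

# Hypothetical guard: stop if the drop is much larger than expected
MAX_EXPECTED_FRACTION = 0.6  # assumption: tune to what is normal for this file
if before and removed / before > MAX_EXPECTED_FRACTION:
    raise ValueError(f"{removed} of {before} rows were blank; check the export")
```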

Presentation comes after the data

Any write replaces what it covers, so formatting, filters, images and charts belong in a single finishing pass that runs after the last value has been written. Splitting the job that way — build the frame, write it, then decorate the finished sheet — is what stops a style disappearing the month someone adds a to_excel call in the middle. It also gives a report one obvious place to change when the house style moves, instead of a dozen scattered blocks that have to be found first.

Blank rows and the used range

Excel's idea of a used range is generous: a row that once held a border or a fill counts as used long after its values were deleted, which is why ws.max_row on a hand-edited sheet often points hundreds of rows below the last real record. pandas can inherit that generosity: depending on the version and engine, those rows may come through as a block of NaN at the bottom of the frame.

That explains both symptoms people meet here — a dropna that removes far more rows than the sheet appeared to contain, and a row count that never matches what a reader sees. Deriving the last row from a key column rather than from the sheet's declared dimensions resolves both, and it is the same habit that keeps filters, tables and total rows pointing at the right range.
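On the pandas side, one way to derive the last row from a key column is Series.last_valid_index(). The sketch below assumes OrderID is filled in for every real record; KEY_COLUMN and the sample frame are illustrative.

```python
import pandas as pd

# Frame as it might look when blank, formatted-only rows are read in at the bottom
df = pd.DataFrame({
    "OrderID": ["A-100", "B-200", None, "C-300", None, None],
    "Amount": [120.0, 80.0, None, None, None, None],
})

KEY_COLUMN = "OrderID"  # assumption: a column that every real record fills in

last_label = df[KEY_COLUMN].last_valid_index()  # None if the column is entirely empty
body = df.loc[:last_label] if last_label is not None else df.iloc[0:0]

print(len(df), "->", len(body))  # 6 -> 4
```

This trims only the trailing block; the blank row in the middle is left in place so it can be inspected before deciding whether to drop it.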

Frequently asked questions

What is the difference between how="all" and how="any"? how="all" drops a row only when every cell is missing; how="any" (the default) drops a row when any cell is missing. For removing blank rows you almost always want how="all".

Why do my whitespace-only rows survive dropna? Because " " is a non-null string. pandas only treats genuinely missing values (None, NaN, NaT, pd.NA) as missing. Run df.replace(r"^\s*$", pd.NA, regex=True) first to convert blank strings to missing values.

How do I drop rows that are missing only certain columns? Use df.dropna(subset=["OrderID", "Customer"]). A row is dropped when any of the listed columns is missing; add how="all" to drop only rows where every listed column is missing.

Does dropna modify the DataFrame in place? No, it returns a new DataFrame by default. Reassign the result (df = df.dropna(...)) or pass inplace=True.

Why does my cleaned index have gaps? dropna keeps original index labels. Call .reset_index(drop=True) to renumber from zero.

Conclusion

Removing blank rows reliably is three steps: normalize whitespace to NaN, drop with the right how/subset/thresh rule, then reset the index before writing. Skipping the normalization step is the single most common reason "empty" rows survive.

Cleaning Excel Data with Pandas — the full data-cleaning workflow this page belongs to.
Pandas: Drop Duplicates From an Excel Column — remove repeated rows after dropping blanks.
Handling Missing Data in Excel Reports — fill, flag, or impute the gaps you keep.
Advanced Data Transformation and Cleaning — the wider reshaping and cleaning overview.