Fill Missing Values in Excel with Pandas fillna

Fill missing values in an Excel file with pandas fillna: normalize blanks to NaN, then apply a scalar, a per-column dictionary, forward/back fill, or interpolation, and export.

To fill missing values in an Excel workbook with pandas, load the file with pd.read_excel(), turn Excel blanks into real NaN, apply DataFrame.fillna() (or ffill/bfill/interpolate), and write the result with df.to_excel(). This page walks that path end to end for a monthly report that mixes revenue numbers, a categorical status, and dates — three column types that each want a different fill. The examples below run in order and share a namespace; the first block writes a sample workbook so you can paste the whole page into one script.

This is the fill step of the wider handling missing data in Excel reports workflow: profiling and classifying the gaps comes first, choosing a fill comes here.

Prerequisites

Python 3.9+ with pandas and openpyxl installed: pip install pandas openpyxl. The openpyxl engine is what reads and writes the .xlsx format.
An .xlsx file with some blank cells. The first code block below writes one, so you can follow along without a file of your own.
Comfort loading a sheet into a DataFrame. If pd.read_excel() is new to you, start with how to read Excel with pandas step by step and come back.

Create a sample workbook

Python

import pandas as pd

seed = pd.DataFrame({
    "Region": ["North", "South", "West", "East"],
    "Revenue": [12000, None, 9800, None],
    "Status": ["Closed", None, "Open", None],
    "Date": ["2024-01-01", None, "2024-01-03", "2024-01-04"],
})
seed.to_excel("monthly_report.xlsx", index=False)

Load and normalize blanks

A cell that looks empty in Excel may still hold an empty string or a stray space, which pandas keeps as valid text, so fillna() skips it. Convert whitespace-only cells to NaN first so every gap is fillable:

Python

df = pd.read_excel("monthly_report.xlsx", engine="openpyxl")
df = df.replace(r"^\s*$", pd.NA, regex=True)
print(df.isna().sum())
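
The sample workbook above has no whitespace-only cells, so the effect of the normalization step is easier to see on a small in-memory frame. The sketch below is illustrative only; the frame `demo` and its values are assumptions, not part of the report.

```python
import pandas as pd

# Hypothetical column: one real missing value, one empty string, one whitespace-only cell.
demo = pd.DataFrame({"Status": ["Closed", None, "", "   "]})

# fillna() alone only touches the real missing value.
before = demo["Status"].fillna("Pending")
print(before.tolist())   # ['Closed', 'Pending', '', '   ']

# Normalize blank and whitespace-only strings to NA first, then fill.
normalized = demo.replace(r"^\s*$", pd.NA, regex=True)
after = normalized["Status"].fillna("Pending")
print(after.tolist())    # ['Closed', 'Pending', 'Pending', 'Pending']
```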

If your sheet uses text placeholders such as N/A, -, or null for missing data, catch them at read time with pd.read_excel(path, na_values=["N/A", "-", "null"]) so they arrive as NaN rather than needing a second pass.
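
As a rough illustration of the read-time approach, the sketch below writes a small sheet that uses those placeholders and reads it back with na_values. The data is made up, and an in-memory buffer stands in for a file on disk; with a real workbook you would pass the path instead.

```python
import io

import pandas as pd

# Hypothetical sheet that marks missing data with text placeholders.
raw = pd.DataFrame({
    "Region": ["North", "South", "West"],
    "Revenue": [12000, "N/A", "-"],
    "Status": ["Closed", "null", "Open"],
})

buffer = io.BytesIO()  # in-memory stand-in for an .xlsx file
raw.to_excel(buffer, index=False, engine="openpyxl")
buffer.seek(0)

# Placeholders listed in na_values arrive as NaN, so no second pass is needed.
parsed = pd.read_excel(buffer, engine="openpyxl", na_values=["N/A", "-", "null"])
print(parsed.isna().sum())
```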

Fill with a per-column dictionary

A dictionary fills each column with a type-appropriate value in one call. This is the safest default because it never forces one column's fill onto another:

Python

fill_map = {
    "Revenue": df["Revenue"].median(),
    "Status": "Pending",
}
df = df.fillna(fill_map)
print(df)

Fill strategies at a glance

Strategy	Syntax	Best for
Global scalar	`df.fillna(0)`	A single uniform default
Per-column dict	`df.fillna({"A": 0, "B": "n/a"})`	Mixed dtypes, categorical defaults
Forward / back fill	`df.ffill()` / `df.bfill()`	Ordered logs, time series
Interpolation	`df.interpolate()`	Continuous numeric trends
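
To make the table concrete, here is a small sketch that applies each strategy to the same Series so the results can be compared side by side. The Series values are illustrative assumptions.

```python
import pandas as pd

s = pd.Series([10.0, None, None, 40.0], name="value")

# One column per strategy, all computed from the same gappy Series.
comparison = pd.DataFrame({
    "original": s,
    "scalar_0": s.fillna(0),         # 10, 0, 0, 40
    "ffill": s.ffill(),              # 10, 10, 10, 40
    "bfill": s.bfill(),              # 10, 40, 40, 40
    "interpolate": s.interpolate(),  # 10, 20, 30, 40
})
print(comparison)
```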

Forward and back fill for ordered data

For time series, carry the last known value forward, then backfill any leading gap. In pandas 3.0 the method= argument to fillna() was removed — use the dedicated ffill() and bfill() methods:

Python

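# Parse text dates; anything unparseable becomes NaT.
df["Date"] = pd.to_datetime(df["Date"], errors="coerce")
# sort_values places NaT rows last, so the forward fill below gives them the latest known date.
df = df.sort_values("Date")
# bfill() only matters when a gap has no earlier value to carry forward.
df["Date"] = df["Date"].ffill().bfill()
print(df[["Region", "Date"]])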
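
The leading-gap case is easier to see on a standalone Series. The sketch below uses made-up readings; it also shows the limit argument of ffill(), which caps how many consecutive gaps a single known value may be carried into.

```python
import pandas as pd

readings = pd.Series([None, 5.0, None, None, 8.0])

forward_only = readings.ffill()    # leading gap has nothing before it, so it stays NaN
both = readings.ffill().bfill()    # bfill() then closes the leading gap
capped = readings.ffill(limit=1)   # carry each value forward by at most one row

print(forward_only.tolist())  # [nan, 5.0, 5.0, 5.0, 8.0]
print(both.tolist())          # [5.0, 5.0, 5.0, 5.0, 8.0]
print(capped.tolist())        # [nan, 5.0, 5.0, nan, 8.0]
```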

Interpolate numeric trends

When a numeric column represents a smooth trend, interpolate() estimates gaps from neighboring values instead of repeating one number:

Python

series = pd.Series([10.0, None, None, 40.0])
print(series.interpolate(method="linear"))
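
Linear interpolation treats rows as evenly spaced. When the rows are unevenly spaced in time, method="time" weights the estimate by the actual gap between timestamps; it needs a DatetimeIndex. A minimal sketch, with dates and values chosen only for illustration:

```python
import pandas as pd

idx = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-05"])
s = pd.Series([10.0, None, 50.0], index=idx)

# "linear" ignores the index: the gap is treated as the midpoint between 10 and 50.
by_position = s.interpolate(method="linear")
# "time" uses the index: Jan 2 is 1 of 4 days along the way, so 10 + 40 * 1/4.
by_time = s.interpolate(method="time")

print(by_position.iloc[1])  # 30.0
print(by_time.iloc[1])      # 20.0
```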

Export the filled workbook

Write without the index so the output matches the input layout:

Python

df.to_excel("monthly_report_filled.xlsx", index=False, engine="openpyxl")
print("Wrote monthly_report_filled.xlsx")

The exported dates are real datetime values. pandas writes them with its default datetime format, so they typically display with a 00:00:00 time component until you style them; see formatting dates in Excel cells with Python to apply a display format on the way out.

Common pitfalls and gotchas

A numeric fill leaves the column as text. A column mixing numbers and text is object dtype, and filling it with a number leaves it object: the text entries stay text and the column never becomes numeric. Isolate numerics first:

Python

num_cols = df.select_dtypes(include="number").columns
df[num_cols] = df[num_cols].fillna(0)
print(df.dtypes)
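
In the running example every numeric column is already clean, so the guard does not visibly skip anything. The sketch below uses a hypothetical frame with a mixed "Code" column to show which columns select_dtypes picks up and which it leaves alone.

```python
import pandas as pd

mixed = pd.DataFrame({
    "Revenue": [12000.0, None, 9800.0],
    "Code": [101, "A-7", None],  # numbers and text together, so not a numeric dtype
})

# Only genuinely numeric columns are selected for the numeric fill.
num_cols = mixed.select_dtypes(include="number").columns
mixed[num_cols] = mixed[num_cols].fillna(0)

print(list(num_cols))              # ['Revenue']
print(mixed["Code"].isna().sum())  # 1: the mixed column is untouched
```

The remaining gap in the mixed column is deliberate: it needs a decision about what the column should contain before any fill is chosen.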

fillna() leaves blanks untouched. They are empty strings, not NaN. Run df.replace(r"^\s*$", pd.NA, regex=True) after loading. Also avoid keep_default_na=False on import unless you map blanks yourself.

Formula cells lose their formulas on export. to_excel() writes computed values, not formulas. If downstream consumers need live formulas, write into a pre-built template with openpyxl instead of overwriting the sheet wholesale.
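
One way to sketch the template approach: open the workbook with openpyxl, write the filled values into the data cells, and leave the formula cells alone. Everything named here is an assumption for illustration: the file names, the layout (data in columns A and B, a total in D2), and the small template that the sketch builds for itself so it can run on its own.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook

workdir = tempfile.mkdtemp()
template_path = os.path.join(workdir, "template.xlsx")     # hypothetical template
output_path = os.path.join(workdir, "report_filled.xlsx")  # hypothetical output

# Stand-in for a pre-built template: headers plus one live formula.
wb = Workbook()
ws = wb.active
ws.append(["Region", "Revenue"])
ws["D1"] = "Total"
ws["D2"] = "=SUM(B2:B100)"
wb.save(template_path)

# Values that have already been filled in pandas.
filled = pd.DataFrame({"Region": ["North", "South"], "Revenue": [12000.0, 10900.0]})

# Write values cell by cell instead of replacing the sheet.
wb = load_workbook(template_path)
ws = wb.active
for row_idx, row in enumerate(filled.itertuples(index=False), start=2):
    ws.cell(row=row_idx, column=1, value=row.Region)
    ws.cell(row=row_idx, column=2, value=float(row.Revenue))
wb.save(output_path)

# The formula is still a formula in the saved file.
check = load_workbook(output_path)
print(check.active["D2"].value)  # =SUM(B2:B100)
```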

Filling instead of dropping. A row that is mostly blank is often better removed than fabricated — filling it invents data. When a whole record is empty, prefer removing blank rows from Excel with pandas over fillna().
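
A minimal sketch of dropping before filling, using dropna() on a made-up frame. The threshold of three non-missing values is an arbitrary assumption; pick one that matches how many fields a record needs to be worth keeping.

```python
import pandas as pd

records = pd.DataFrame({
    "Region": ["North", None, "West"],
    "Revenue": [12000.0, None, None],
    "Status": ["Closed", None, "Open"],
})

# Remove rows where every column is missing.
kept = records.dropna(how="all")
print(kept.index.tolist())    # [0, 2]

# Stricter: keep only rows with at least 3 non-missing values.
strict = records.dropna(thresh=3)
print(strict.index.tolist())  # [0]
```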

New NaN after a join. An outer or left merge of two Excel files on a common column introduces NaN wherever a key had no match. Run the fill step after the merge, not before, or the fresh gaps slip through unfilled.
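
The sketch below shows the effect on two small made-up frames. The indicator=True argument of merge() adds a _merge column that marks which rows had no match; the fill value of 0 for a missing target is only a placeholder assumption.

```python
import pandas as pd

sales = pd.DataFrame({
    "Region": ["North", "South", "West"],
    "Revenue": [12000.0, 10900.0, 9800.0],
})
targets = pd.DataFrame({"Region": ["North", "West"], "Target": [11000.0, 10000.0]})

# Both inputs are complete, yet the left merge creates a gap for South.
merged = sales.merge(targets, on="Region", how="left", indicator=True)
new_gaps = int(merged["Target"].isna().sum())
print(new_gaps)  # 1

# Fill after the merge so the fresh gap is covered.
merged["Target"] = merged["Target"].fillna(0)
print(merged[["Region", "Target", "_merge"]])
```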

Validation checklist — confirm df.isna().sum() is zero where it must be, check df.dtypes to ensure numeric columns stayed numeric, and open the exported file to confirm no #VALUE! cells.
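
The first two checks can be scripted. The helper below is a hypothetical sketch, not a pandas API: validate_filled and its parameters are names made up for this example. It does not cover the third check, which still means opening the exported file.

```python
import pandas as pd

def validate_filled(frame, required, numeric):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    # Check 1: no gaps left in the columns that must be complete.
    gaps = frame[required].isna().sum()
    for col, count in gaps[gaps > 0].items():
        problems.append(f"{col}: {count} missing value(s) remain")
    # Check 2: columns expected to be numeric still are.
    for col in numeric:
        if not pd.api.types.is_numeric_dtype(frame[col]):
            problems.append(f"{col}: expected a numeric dtype, got {frame[col].dtype}")
    return problems

good = pd.DataFrame({"Revenue": [12000.0, 10900.0], "Status": ["Closed", "Pending"]})
bad = pd.DataFrame({"Revenue": ["12000", None], "Status": ["Closed", "Pending"]})

good_problems = validate_filled(good, ["Revenue", "Status"], ["Revenue"])
bad_problems = validate_filled(bad, ["Revenue", "Status"], ["Revenue"])
print(good_problems)  # []
print(bad_problems)
```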

Performance and scale notes

Prefer one call over a per-column loop. A single df.fillna(fill_map) with a dictionary handles every listed column in one call; a Python for loop that calls fillna column by column adds per-call overhead and is easier to get wrong on wide frames.
inplace=True no longer saves memory. With Copy-on-Write (the default from pandas 3.0), reassigning — df = df.fillna(...) — is the idiomatic form and does not duplicate data you have not modified. Do not reach for inplace= for speed.
Compute fill values once. df["Revenue"].median() scans the column; store it in a variable if you reference it repeatedly rather than recomputing inside a loop.
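Downcast after filling if memory is tight. Filling an integer column with a mean produces floats, and a numeric column that had gaps on load is already float64. If every value is a whole number after the fill, df["Revenue"] = df["Revenue"].astype("Int64") restores a nullable-integer dtype; a narrower width such as "Int32" is what actually reduces memory.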
to_excel() is the slow step, not fillna(). Writing a large .xlsx through openpyxl dominates the runtime. For hundreds of thousands of rows, fill in pandas but consider writing to CSV, or chunk the export, rather than blaming the fill.
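
A compact sketch of the first four points together, on a made-up frame: compute the fill value once, fill every column in one dictionary call, reassign rather than using inplace=, and restore an integer dtype where the filled values are whole numbers. The "Units" column is an assumption added for the example.

```python
import pandas as pd

frame = pd.DataFrame({
    "Revenue": [12000.0, None, 9800.0, None],
    "Units": [3.0, None, 5.0, 2.0],
    "Status": ["Closed", None, "Open", None],
})

# Compute the fill value once and reuse it.
revenue_median = frame["Revenue"].median()  # 10900.0

# One call covers all three columns; reassigning leaves the original frame untouched.
fill_map = {"Revenue": revenue_median, "Units": 0, "Status": "Pending"}
filled = frame.fillna(fill_map)

# Units holds only whole numbers after the fill, so a nullable integer dtype is safe.
filled["Units"] = filled["Units"].astype("Int64")
print(filled.dtypes)
```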

Fill within the group, not across the frame

A missing unit price filled from the whole column's median imports the average of an entire catalogue into one product; the same gap filled from that product's own rows is usually close to right. groupby(...).transform(lambda s: s.fillna(s.median())) is the pattern, and it is nearly always more defensible than a global fill. Where a group has no values at all to fill from, the honest outcome is that the gap stays and gets reported.
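
The difference shows up clearly on a small made-up catalogue. In the sketch below the column names and prices are assumptions; product C has no observed price at all, so its gap is expected to survive the group fill.

```python
import pandas as pd

prices = pd.DataFrame({
    "Product": ["A", "A", "A", "B", "B", "C"],
    "UnitPrice": [10.0, None, 12.0, 100.0, None, None],
})

# Global fill: every gap gets the whole-column median (12.0), even product B's.
global_fill = prices["UnitPrice"].fillna(prices["UnitPrice"].median())

# Group fill: each gap gets the median of its own product's rows.
group_fill = prices.groupby("Product")["UnitPrice"].transform(
    lambda s: s.fillna(s.median())
)

print(global_fill.tolist())  # [10.0, 12.0, 12.0, 100.0, 12.0, 12.0]
print(group_fill.tolist())   # [10.0, 11.0, 12.0, 100.0, 100.0, nan]
```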

Fill, then verify

After any fill, re-run the missingness count. A column that still shows gaps means the fill value itself was missing — a group with no observations to take a median from — and that is worth reporting rather than filling twice with a broader default.
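
A sketch of that re-check, assuming a frame that has already been through a group-wise fill. Besides the remaining gap count, it lists the groups that had no observations to fill from, which is the part worth reporting. The data is illustrative.

```python
import pandas as pd

# Hypothetical result of a group-wise fill: product C had nothing to fill from.
filled = pd.DataFrame({
    "Product": ["A", "A", "B", "C", "C"],
    "UnitPrice": [10.0, 11.0, 100.0, None, None],
})

remaining = int(filled["UnitPrice"].isna().sum())

# count() ignores NaN, so a count of zero means the group has no observed values.
observed = filled.groupby("Product")["UnitPrice"].count()
empty_groups = observed[observed == 0].index.tolist()

print(remaining)     # 2
print(empty_groups)  # ['C']
```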

Log what the run actually did

Row counts at each boundary, what was filled, what was quarantined, how long it took: five or six lines per run turn a question about a number into a lookup. The value is not in reading them on a good day but in having them on a bad one, when a total has moved and nobody can say whether the source changed, the cleaning changed, or a filter was added. A job that records its own behaviour is one that can be debugged after the fact rather than re-run and watched.
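
What such a log might look like for the fill step alone is sketched below with the standard logging module. The function name fill_with_log and the logger name are hypothetical, and quarantining is not shown because this page does not define it.

```python
import logging
import time

import pandas as pd

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("fill_job")  # hypothetical logger name

def fill_with_log(frame, fill_map):
    """Fill with a per-column dictionary and log what changed."""
    start = time.perf_counter()
    gaps_before = frame.isna().sum()
    result = frame.fillna(fill_map)
    gaps_after = result.isna().sum()
    filled_counts = gaps_before - gaps_after

    log.info("rows in=%d rows out=%d", len(frame), len(result))
    for col, n in filled_counts[filled_counts > 0].items():
        log.info("filled %d value(s) in %s with %r", n, col, fill_map.get(col))
    log.info("remaining gaps=%d elapsed=%.3fs",
             int(gaps_after.sum()), time.perf_counter() - start)
    return result, filled_counts.to_dict()

report = pd.DataFrame({
    "Revenue": [12000.0, None, 9800.0, None],
    "Status": ["Closed", None, "Open", "Open"],
})
result, counts = fill_with_log(report, {"Revenue": 0.0, "Status": "Pending"})
```

Returning the counts alongside the frame lets a caller store them with the run's other metadata instead of parsing them back out of the log.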

Conclusion

Filling missing values well is less about fillna() and more about the two steps around it: normalizing Excel blanks to real NaN so nothing is silently skipped, and matching the fill to the column type — a per-column dictionary for mixed data, ffill/bfill for ordered logs, and interpolate() for continuous trends. Do those, validate with isna().sum() and dtypes, and the exported workbook is complete without inventing misleading numbers.

Frequently asked questions

Why does fillna() leave some blank cells untouched? Those cells are empty strings, not NaN, so fillna() skips them. Run df.replace(r"^\s*$", pd.NA, regex=True) after loading, and avoid keep_default_na=False on import unless you map blanks yourself.

How do I fill each column with a different value in one call? Pass a dictionary, like df.fillna({"Revenue": df["Revenue"].median(), "Status": "Pending"}). This is the safest default because it never forces one column's fill onto another dtype.

When should I use interpolate() instead of fillna()? Use interpolate() for a continuous numeric trend, where it estimates a gap from its neighbors rather than repeating a single number. A scalar fillna or forward fill is better for categorical or non-trending data.

Why did a numeric fill turn my column into text? A column mixing numbers and text is object dtype, and a numeric fill leaves it object, so the text entries stay text and the column never becomes numeric. Select numerics first with df.select_dtypes(include="number").columns and fill only those.

Do my formulas survive to_excel()? No — to_excel() writes computed values, not formulas. If downstream consumers need live formulas, write into a pre-built template with openpyxl instead of overwriting the sheet.

Up to the parent workflow: Handling Missing Data in Excel Reports — profile, classify, and validate gaps end to end.
Remove Blank Rows from Excel with Pandas — when a record is better dropped than filled.
Merge Two Excel Files on a Common Column — a common source of new NaN to fill afterward.
Format Dates in Excel Cells with Python — display the dates you forward-filled.
Back to the toolkit: Advanced Data Transformation and Cleaning.