Guide

Advanced Data Transformation And CleaningDeep dive

Fill Missing Values in Excel with Pandas fillna

Fill missing values in an Excel file with pandas fillna: normalize blanks to NaN, then apply a scalar, a per-column dictionary, forward/back fill, or interpolation, and export.

To fill missing values in an Excel workbook with pandas, load the file with pd.read_excel(), turn Excel blanks into real NaN, apply DataFrame.fillna() (or ffill/bfill/interpolate), and write the result with df.to_excel(). The examples below run in order and share a namespace; the first block writes a sample workbook.

Create a sample workbook

Python
import pandas as pd

seed = pd.DataFrame({
    "Region": ["North", "South", "West", "East"],
    "Revenue": [12000, None, 9800, None],
    "Status": ["Closed", None, "Open", None],
    "Date": ["2024-01-01", None, "2024-01-03", "2024-01-04"],
})
seed.to_excel("monthly_report.xlsx", index=False)

Load and normalize blanks

Excel frequently exports an empty cell as an empty string, which pandas keeps as valid text — fillna() skips it. Convert whitespace-only cells to NaN first so every gap is fillable:

Python
df = pd.read_excel("monthly_report.xlsx", engine="openpyxl")
df = df.replace(r"^\s*$", pd.NA, regex=True)
print(df.isna().sum())

Fill with a per-column dictionary

A dictionary fills each column with a type-appropriate value in one call. This is the safest default because it never forces one column's fill onto another:

Python
fill_map = {
    "Revenue": df["Revenue"].median(),
    "Status": "Pending",
}
df = df.fillna(fill_map)
print(df)

Fill strategies at a glance

StrategySyntaxBest for
Global scalardf.fillna(0)A single uniform default
Per-column dictdf.fillna({"A": 0, "B": "n/a"})Mixed dtypes, categorical defaults
Forward / back filldf.ffill() / df.bfill()Ordered logs, time series
Interpolationdf.interpolate()Continuous numeric trends

Forward and back fill for ordered data

For time series, carry the last known value forward, then backfill any leading gap. In pandas 3.0 the method= argument to fillna() was removed — use the dedicated ffill() and bfill() methods:

Python
df["Date"] = pd.to_datetime(df["Date"], errors="coerce")
df = df.sort_values("Date")
df["Date"] = df["Date"].ffill().bfill()
print(df[["Region", "Date"]])

When a numeric column represents a smooth trend, interpolate() estimates gaps from neighboring values instead of repeating one number:

Python
series = pd.Series([10.0, None, None, 40.0])
print(series.interpolate(method="linear"))

Export the filled workbook

Write without the index so the output matches the input layout:

Python
df.to_excel("monthly_report_filled.xlsx", index=False, engine="openpyxl")
print("Wrote monthly_report_filled.xlsx")

Troubleshooting

A numeric fill turns the column into text. A column mixing numbers and text is object dtype; filling it with a number can coerce values to strings. Isolate numerics first:

Python
num_cols = df.select_dtypes(include="number").columns
df[num_cols] = df[num_cols].fillna(0)
print(df.dtypes)

fillna() leaves blanks untouched. They are empty strings, not NaN. Run df.replace(r"^\s*$", pd.NA, regex=True) after loading. Also avoid keep_default_na=False on import unless you map blanks yourself.

Formula cells lose their formulas on export. to_excel() writes computed values, not formulas. If downstream consumers need live formulas, write into a pre-built template with openpyxl instead of overwriting the sheet wholesale.

Validation checklist — confirm df.isna().sum() is zero where it must be, check df.dtypes to ensure numeric columns stayed numeric, and open the exported file to confirm no #VALUE! cells.

Where to go next