Guide

Getting Started With Python Excel AutomationDeep dive

Using openpyxl for Excel File Manipulation

A runnable, step-by-step guide to creating, reading, editing, styling, and saving .xlsx files with openpyxl — workbooks, sheets, cells, formatting, and images.

openpyxl reads and writes .xlsx and .xlsm files directly, without Excel installed. It works on the Office Open XML format, so the same script runs on Windows, macOS, and Linux. Where pandas gives you DataFrames, openpyxl gives you cell-level control: number formats, fonts, borders, column widths, formulas, and images. This guide builds that control up one capability at a time. Every snippet is runnable — the first step creates a sample workbook so you can paste each example in order and watch it work.

Step 1: Install openpyxl

Bash
pip install openpyxl

openpyxl supports .xlsx, .xlsm, and .xltx/.xltm templates. It does not read the legacy binary .xls format — convert those to .xlsx first, or read them with pandas plus xlrd==1.2.0. Embedding images (Step 8) additionally needs Pillow: pip install Pillow.

Step 2: Create a workbook from scratch

A new Workbook() starts with one empty sheet, reachable through wb.active. Rename it, add headers, and append rows. ws.append() takes a list and writes it to the first empty row, mapping items to columns A, B, C, and so on:

Python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws.title = "Sales"

# Header row, then data rows
ws.append(["Date", "Region", "Units", "Revenue"])
ws.append(["2024-01-05", "North", 12, 2399.88])
ws.append(["2024-01-06", "South", 5, 1247.50])
ws.append(["2024-01-07", "West", 8, 1599.20])

wb.save("report.xlsx")
print("Saved", ws.max_row, "rows across", ws.max_column, "columns")

Step 3: Load an existing workbook and navigate sheets

load_workbook() opens a file in read/write mode by default. Reach a sheet by exact name with wb["Sales"], list every tab with wb.sheetnames, or grab the active one with wb.active:

Python
from openpyxl import load_workbook

wb = load_workbook("report.xlsx")
print("Sheets:", wb.sheetnames)

ws = wb["Sales"]            # access by exact name
print("Active sheet:", wb.active.title)
print("Dimensions:", ws.dimensions)   # e.g. A1:D4

Sheet names are case-sensitive and keep trailing whitespace, so wb["sales"] or wb["Sales "] raises KeyError. Look the name up against wb.sheetnames if it comes from user input.

Step 4: Read cell values

Read a single cell by its coordinate (ws["A1"]) or by row/column number (ws.cell(row=1, column=1)) — both return a Cell, and .value holds its contents. To scan rows, iter_rows(values_only=True) yields plain tuples, which is the fastest way to pull data out:

Python
wb = load_workbook("report.xlsx")
ws = wb["Sales"]

print("A1 by coordinate:", ws["A1"].value)
print("A1 by index:     ", ws.cell(row=1, column=1).value)

# Iterate data rows (skip the header with min_row=2)
for date, region, units, revenue in ws.iter_rows(min_row=2, values_only=True):
    print(f"{date} | {region} | {units} units | ${revenue:,.2f}")

A robust pattern for templates with shifting columns is to map header names to column indices from row 1, then look up each value by name instead of hardcoding positions:

Python
header = next(ws.iter_rows(min_row=1, max_row=1, values_only=True))
col = {name: idx for idx, name in enumerate(header)}

total = 0.0
for row in ws.iter_rows(min_row=2, values_only=True):
    total += row[col["Revenue"]]
print(f"Total revenue: ${total:,.2f}")

Step 5: Write and edit cells

Assign to .value to set a cell, whether the cell exists or not. Writing past the current bounds extends the sheet automatically:

Python
wb = load_workbook("report.xlsx")
ws = wb["Sales"]

# Add a totals row below the data
last = ws.max_row + 1
ws.cell(row=last, column=1, value="Total")
ws.cell(row=last, column=3, value=f"=SUM(C2:C{last - 1})")
ws.cell(row=last, column=4, value=f"=SUM(D2:D{last - 1})")

wb.save("report.xlsx")
print("Wrote totals to row", last)

The formulas use last - 1 (not ws.max_row) because writing any cell in the totals row updates ws.max_row to last, so referencing ws.max_row inside the same block would create a formula that includes the totals cell itself — a circular reference.

openpyxl writes formulas as strings; it does not evaluate them. Excel calculates the result when the file is opened. To read a previously cached result instead of the formula text, load with data_only=True — but note that a file written by openpyxl and never opened in Excel has no cached value yet, so data_only=True returns None for those cells.

Step 6: Apply number formats and styles

Styling lives in the openpyxl.styles module. Set cell.number_format for display formatting (dates, currency, percentages) and assign Font, Alignment, Border, and PatternFill objects for appearance:

Python
from openpyxl import load_workbook
from openpyxl.styles import Font, Alignment, PatternFill, Border, Side

wb = load_workbook("report.xlsx")
ws = wb["Sales"]

# Bold, white-on-blue header row
header_fill = PatternFill("solid", fgColor="305496")
for cell in ws[1]:
    cell.font = Font(bold=True, color="FFFFFF")
    cell.fill = header_fill
    cell.alignment = Alignment(horizontal="center")

# Currency format on the Revenue column (column D)
for cell in ws["D"][1:]:          # skip the header cell
    cell.number_format = "#,##0.00"

# Thin border around every used cell
thin = Side(style="thin")
box = Border(left=thin, right=thin, top=thin, bottom=thin)
for row in ws.iter_rows():
    for cell in row:
        cell.border = box

wb.save("report.xlsx")
print("Applied header, currency, and border styles")

Style objects are immutable and can be reused across many cells, which keeps memory low on large sheets. Set them once and assign the same object repeatedly.

Step 7: Set column widths and freeze the header

Column widths are stored on ws.column_dimensions[<letter>].width in character units. Auto-sizing means measuring the longest value in each column. Freezing panes keeps the header visible while scrolling:

Python
wb = load_workbook("report.xlsx")
ws = wb["Sales"]

# Auto-fit each column to its longest value
for column_cells in ws.columns:
    longest = max(len(str(cell.value)) for cell in column_cells if cell.value is not None)
    letter = column_cells[0].column_letter
    ws.column_dimensions[letter].width = longest + 2

ws.freeze_panes = "A2"   # rows above row 2 stay pinned
wb.save("report.xlsx")
print("Adjusted widths and froze the header row")

Step 8: Embed an image

Reports often need a logo or chart image. openpyxl.drawing.image.Image anchors a picture to a cell. This example generates a tiny PNG with the standard library so it runs without any external file:

Python
import struct, zlib
from openpyxl import load_workbook
from openpyxl.drawing.image import Image

# Write a minimal 1x1 PNG so the example is self-contained
def _png_chunk(tag, data):
    return (struct.pack(">I", len(data)) + tag + data
            + struct.pack(">I", zlib.crc32(tag + data) & 0xffffffff))

raw = b"\x00" + bytes((48, 84, 150))          # one blue pixel
png = (b"\x89PNG\r\n\x1a\n"
       + _png_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 2, 0, 0, 0))
       + _png_chunk(b"IDAT", zlib.compress(raw))
       + _png_chunk(b"IEND", b""))
with open("logo.png", "wb") as f:
    f.write(png)

wb = load_workbook("report.xlsx")
ws = wb["Sales"]
img = Image("logo.png")
img.width, img.height = 120, 40        # scale in pixels
ws.add_image(img, "F1")                # anchor to cell F1
wb.save("report.xlsx")
print("Embedded image anchored at F1")

Working with large workbooks

Standard mode loads the whole workbook into memory. For very large files, switch modes:

  • load_workbook("big.xlsx", read_only=True) streams rows instead of building the full object tree, dramatically cutting memory use for reads. Read-only worksheets do not support append() or cell writes.
  • Workbook(write_only=True) streams rows out as you append() them, ideal for generating large files. Write-only workbooks cannot be read back or styled cell-by-cell after the fact.

A common split is to generate bulk data in write-only mode, then reopen the file in standard mode to apply styling to the comparatively small header and summary regions.

Common errors and fixes

ErrorCauseFix
InvalidFileExceptionOpening a legacy .xls or a corrupted/renamed fileConvert .xls to .xlsx first; confirm the file is real OOXML, not a renamed CSV
KeyError: 'Sheet1'Sheet name case or trailing whitespace mismatchCompare against wb.sheetnames; strip the name before lookup
AttributeError: 'ReadOnlyWorksheet' object has no attribute 'append'Writing while opened with read_only=TrueReopen without read_only, or use write_only=True to generate
ValueError: Cannot convert ... to ExcelAssigning an unsupported type (custom object, set)Convert to a primitive first — str(value) or value.isoformat()
IllegalCharacterErrorControl characters in a cell valueStrip non-printable characters before assignment

Conclusion

openpyxl gives you direct access to the Office Open XML layer — you are editing the same XML structure Excel itself reads and writes. That means no Excel installation required, full cross-platform support, and the ability to script formatting tasks that would otherwise require manual spreadsheet work. The key operational rule: save with wb.save() after every logical group of changes; openpyxl does not commit changes until you save.

Where to go next

Once you can manipulate a workbook cell by cell, the natural next steps are appending to existing reports and reshaping data with pandas: