Exporting Excel Reports to PDF

Convert .xlsx reports to PDF from Python the honest way: LibreOffice headless, Excel COM, or reportlab — plus openpyxl page setup so the output paginates cleanly.

PDF is the format people actually open. It renders identically on every machine, it cannot be accidentally edited, and it prints predictably — which is why the final step of most reporting jobs is converting the workbook to PDF before it is emailed or filed. The catch is that no pure-pip Python library renders an .xlsx to PDF faithfully. openpyxl writes spreadsheets but cannot export them; pandas reads and writes data, not page layout. Anyone who tells you df.to_pdf() exists is wrong. This page lays out the three approaches that genuinely work, when to reach for each, and the openpyxl page-setup step that makes the difference between a clean one-page report and a mess that spills columns across six sheets. It is the export stage of the broader Automating Reporting Workflows pipeline.

The three real options

There is no single best method — the right one depends on your platform and how much fidelity you need.

Method	Platform	Fidelity	Dependencies	Best for
LibreOffice headless (`soffice --convert-to pdf`)	Linux / macOS / Windows	High — renders the real workbook, styles, charts	LibreOffice installed (not pip)	Servers, cron jobs, CI; the portable default
Excel COM via xlwings / pywin32	Windows + Excel only	Pixel-perfect — Excel renders it	Microsoft Excel + `pip install xlwings`	Desktops where Excel is already licensed
reportlab (build the PDF directly)	Any	Total control, but you rebuild the layout	`pip install reportlab`	When you control the design and skip Excel rendering

The first two render an existing workbook. The third skips the workbook entirely and draws the PDF from your data — more work, but no Excel or LibreOffice required. For the step-by-step portable recipe, see Convert an Excel File to PDF with Python.
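The table can also be read as a decision procedure. The sketch below is one possible encoding, not a rule: `pick_route` and `detect_soffice` are hypothetical helper names, and the preference order (Excel only when its exact output is required, LibreOffice otherwise, reportlab as the fallback) is an assumption drawn from the table.

```python
import shutil


def pick_route(platform, has_soffice, has_excel, need_exact_excel_output=False):
    """Return 'excel', 'libreoffice', or 'reportlab' (hypothetical helper)."""
    on_windows = platform.startswith("win")
    # Excel COM is only available on Windows with Excel installed.
    if need_exact_excel_output and on_windows and has_excel:
        return "excel"
    # LibreOffice is the portable default wherever it is installed.
    if has_soffice:
        return "libreoffice"
    if on_windows and has_excel:
        return "excel"
    # Nothing external available: build the PDF from the data instead.
    return "reportlab"


def detect_soffice():
    """True if a LibreOffice binary is reachable on PATH."""
    return bool(shutil.which("soffice") or shutil.which("libreoffice"))
```

A caller would typically pass `sys.platform` and `detect_soffice()`; detecting Excel is left to the caller because it depends on how the machine is managed.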

Set up the page before you convert

The single biggest source of ugly PDFs is converting a workbook that was never told how to print. A wide sheet paginates into a left half and a right half on separate pages; a tall sheet runs onto later pages with no header row to label the columns. Fix this in the workbook with openpyxl's page-setup attributes before you hand the file to a converter: LibreOffice and Excel both honor them, while the reportlab route ignores the workbook's print settings and lays out its own page. Apply the same settings to each tab when you export a multi-sheet dashboard, and set them as part of the styling pass if you are already formatting and charting the report upstream.

Python

from openpyxl import Workbook
from openpyxl.worksheet.properties import PageSetupProperties

wb = Workbook()
ws = wb.active
ws.title = "Summary"

# Some sample content wide enough to need fitting.
ws.append(["Region", "Q1", "Q2", "Q3", "Q4", "FY Total", "YoY %", "Notes"])
for r in range(1, 31):
    ws.append([f"Branch {r}", 100 + r, 120 + r, 95 + r, 130 + r,
               445 + 4 * r, 3.2, "on track"])

# Landscape so wide tables get more horizontal room.
ws.page_setup.orientation = "landscape"

# Fit ALL columns onto one page width; let height flow over pages.
ws.page_setup.fitToWidth = 1
ws.page_setup.fitToHeight = 0           # 0 = unlimited pages tall
ws.sheet_properties.pageSetUpPr = PageSetupProperties(fitToPage=True)

# Restrict the export to the data region (skip stray cells far to the right).
ws.print_area = "A1:H31"

# Repeat the header row on every printed page.
ws.print_title_rows = "1:1"

wb.save("report.xlsx")
print("Saved report.xlsx with print layout configured")

The crucial pairing is fitToWidth = 1 and pageSetUpPr = PageSetupProperties(fitToPage=True). Setting fitToWidth alone does nothing — Excel and LibreOffice ignore it unless the fitToPage flag on the sheet properties is also on. With both set, the converter scales the columns down to one page wide while letting rows flow over as many pages as needed.

Method 1 — LibreOffice headless (the portable default)

LibreOffice ships a headless mode that opens the workbook, renders it exactly as the desktop app would, and writes a PDF — with no GUI. It runs on Linux, macOS, and Windows, costs nothing, and handles styles and charts well, which makes it the right default for any unattended job. Drive it from Python with subprocess. The binary is soffice (or libreoffice on some Linux distros).

Python

import shutil
import subprocess
from pathlib import Path

def xlsx_to_pdf_libreoffice(xlsx_path, out_dir=None, timeout=120):
    """Convert an .xlsx to PDF via headless LibreOffice.

    Requires LibreOffice installed and `soffice` reachable on PATH.
    Returns the path to the generated PDF.
    """
    src = Path(xlsx_path).resolve()
    if not src.is_file():
        raise FileNotFoundError(src)

    soffice = shutil.which("soffice") or shutil.which("libreoffice")
    if soffice is None:
        raise RuntimeError("LibreOffice not found; install it and ensure "
                           "'soffice' is on PATH")

    out_dir = Path(out_dir or src.parent).resolve()
    out_dir.mkdir(parents=True, exist_ok=True)

    result = subprocess.run(
        [soffice, "--headless", "--convert-to", "pdf",
         "--outdir", str(out_dir), str(src)],
        capture_output=True, text=True, timeout=timeout,
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"LibreOffice failed (exit {result.returncode}): {result.stderr}")

    pdf_path = out_dir / (src.stem + ".pdf")
    if not pdf_path.is_file():
        raise RuntimeError(f"Conversion reported success but {pdf_path} "
                           f"is missing. stdout: {result.stdout}")
    return pdf_path

# pdf = xlsx_to_pdf_libreoffice("report.xlsx")
# print("Wrote", pdf)

Always check returncode and that the expected output file exists — LibreOffice occasionally exits zero while writing nothing if the input is corrupt. The timeout guards against a hung process in a scheduled job; if it fires, subprocess.run raises TimeoutExpired, which your job wrapper should catch and log.
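A job wrapper that catches and logs those failures might look like the sketch below. `run_conversion` and the logger name are hypothetical; the wrapper accepts any callable shaped like `xlsx_to_pdf_libreoffice`, so it can be exercised without LibreOffice installed.

```python
import logging
import subprocess

log = logging.getLogger("report.convert")  # hypothetical logger name


def run_conversion(convert, xlsx_path):
    """Call convert(xlsx_path); return the PDF path, or None on failure."""
    try:
        return convert(xlsx_path)
    except subprocess.TimeoutExpired as exc:
        # Raised by subprocess.run when the timeout fires.
        log.error("conversion of %s timed out after %s s", xlsx_path, exc.timeout)
    except (FileNotFoundError, RuntimeError) as exc:
        # Missing input, missing soffice, non-zero exit, or missing output.
        log.error("conversion of %s failed: %s", xlsx_path, exc)
    return None
```

Returning `None` leaves the decision to the scheduler script, which would usually exit non-zero so the failure raises an alert.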

Method 2 — Excel COM on Windows

On a Windows box with Excel installed, you can ask Excel itself to export — the result is pixel-perfect because it is Excel's own renderer. xlwings wraps the COM API cleanly:

Python

# Windows + Microsoft Excel only.  pip install xlwings
import xlwings as xw

def xlsx_to_pdf_excel(xlsx_path, pdf_path):
    """Pixel-perfect export using Excel's own engine. Windows + Excel only."""
    app = xw.App(visible=False)
    try:
        wb = app.books.open(xlsx_path)
        wb.to_pdf(pdf_path)        # xlwings >= 0.21 convenience wrapper
        wb.close()
    finally:
        app.quit()

# xlsx_to_pdf_excel("report.xlsx", "report.pdf")

Under the hood to_pdf() calls ExportAsFixedFormat on the COM workbook. This is the highest-fidelity option, but it only runs where Excel is licensed and installed — it is a non-starter on a Linux server, so do not build a server pipeline around it.

Method 3 — Build the PDF directly with reportlab

When you control the layout and do not need the workbook's exact styling, skip Excel rendering entirely and draw the PDF from your data with reportlab. This needs no Excel and no LibreOffice — just a pip install — which is appealing for locked-down environments.

Python

# pip install reportlab openpyxl
from openpyxl import load_workbook
from reportlab.lib.pagesizes import landscape, A4
from reportlab.lib import colors
from reportlab.platypus import SimpleDocTemplate, Table, TableStyle

def sheet_to_pdf_reportlab(xlsx_path, pdf_path, sheet=None):
    """Read a sheet's cells and lay them out as a PDF table. No Excel needed."""
    wb = load_workbook(xlsx_path, data_only=True)
    ws = wb[sheet] if sheet else wb.active
    rows = [[("" if c is None else c) for c in row]
            for row in ws.iter_rows(values_only=True)]

    doc = SimpleDocTemplate(pdf_path, pagesize=landscape(A4))
    table = Table(rows, repeatRows=1)
    table.setStyle(TableStyle([
        ("BACKGROUND", (0, 0), (-1, 0), colors.HexColor("#1f4e78")),
        ("TEXTCOLOR", (0, 0), (-1, 0), colors.white),
        ("FONTSIZE", (0, 0), (-1, -1), 8),
        ("GRID", (0, 0), (-1, -1), 0.25, colors.grey),
    ]))
    doc.build([table])

# sheet_to_pdf_reportlab("report.xlsx", "report.pdf")

You trade fidelity for control and zero external dependencies. Charts, merged-cell layouts, and conditional formatting do not come across — you rebuild whatever you want on the page yourself. For data-only summaries this is fast and fully self-contained.

Automating the conversion in a scheduled job

In production the conversion is one step in an unattended pipeline: generate the workbook — often from a saved template — convert it, then email it. For LibreOffice headless this means two precautions. First, give each run its own user-profile directory so a desktop LibreOffice instance (or a previous run) cannot lock the conversion. Second, set a timeout and treat both a non-zero exit and a missing output file as failures your scheduler will alert on.

Python

import shutil
import subprocess
import tempfile
from pathlib import Path

def convert_isolated(xlsx_path, out_dir, timeout=120):
    """LibreOffice conversion with a throwaway profile dir — safe under cron."""
    soffice = shutil.which("soffice") or shutil.which("libreoffice")
    if soffice is None:
        raise RuntimeError("install LibreOffice; 'soffice' not on PATH")
    src = Path(xlsx_path).resolve()
    out_dir = Path(out_dir).resolve()
    out_dir.mkdir(parents=True, exist_ok=True)

    with tempfile.TemporaryDirectory() as profile:
        # as_uri() builds a valid file:/// URL on Windows as well as POSIX.
        profile_url = Path(profile).as_uri()
        result = subprocess.run(
            [soffice, f"-env:UserInstallation={profile_url}",
             "--headless", "--convert-to", "pdf",
             "--outdir", str(out_dir), str(src)],
            capture_output=True, text=True, timeout=timeout,
        )
    if result.returncode != 0:
        raise RuntimeError(f"convert failed: {result.stderr}")
    pdf = out_dir / (src.stem + ".pdf")
    if not pdf.is_file():
        raise RuntimeError("no PDF produced")
    return pdf

The -env:UserInstallation flag is the key line — it points LibreOffice at a fresh profile per invocation, which is what makes headless conversion reliable when another instance might be running. On a headless server, also install the font packages your reports use, or text falls back to substitutes and the PDF looks wrong.
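One way to catch a missing font before it shows up in a PDF is to compare the families your reports use against what the server has installed. The sketch below assumes a Linux host with fontconfig, whose `fc-list : family` command prints installed family names; `parse_families`, `missing_fonts` and `installed_font_listing` are hypothetical helpers, and the list of required fonts is something you would supply.

```python
import shutil
import subprocess


def parse_families(fc_list_output):
    """Turn `fc-list : family` output into a set of lower-cased family names."""
    families = set()
    for line in fc_list_output.splitlines():
        # One line can carry several comma-separated names for the same font.
        for name in line.split(","):
            name = name.strip().lower()
            if name:
                families.add(name)
    return families


def missing_fonts(required, fc_list_output):
    """Return the required families that are not installed, in input order."""
    installed = parse_families(fc_list_output)
    return [name for name in required if name.lower() not in installed]


def installed_font_listing():
    """Run fc-list; assumes fontconfig is installed on the host."""
    fc_list = shutil.which("fc-list")
    if fc_list is None:
        raise RuntimeError("fc-list not found; cannot check fonts")
    done = subprocess.run([fc_list, ":", "family"],
                          capture_output=True, text=True, check=True)
    return done.stdout
```

Running `missing_fonts(["Calibri"], installed_font_listing())` at deploy time, rather than inside every conversion, keeps the check cheap.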

Choosing a conversion route

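Three routes exist, and they differ in fidelity, portability and what has to be installed. For an unattended job the LibreOffice route is usually the one to wire in, and a compact version of it adds a size check on the output: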

Python

import subprocess
from pathlib import Path

def to_pdf(xlsx_path, outdir="pdf", timeout=180):
    Path(outdir).mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["soffice", "--headless", "--convert-to", "pdf", "--outdir", outdir, str(xlsx_path)],
        check=True, timeout=timeout,
    )
    result = Path(outdir) / (Path(xlsx_path).stem + ".pdf")
    if not result.exists() or result.stat().st_size == 0:
        raise RuntimeError(f"conversion produced nothing for {xlsx_path}")
    return result

The size check is not paranoia. LibreOffice exits zero in several situations where it produced no file at all — most commonly when a desktop session is already using the same user profile — so trusting the return code alone gives you a job that reports success and delivers nothing.

Page setup decides what the PDF looks like

A conversion reproduces the workbook's print settings, so the fix for a PDF that spills across nine pages is in the spreadsheet rather than in the converter:

Python

from openpyxl import load_workbook

wb = load_workbook("report.xlsx")
for ws in wb.worksheets:
    ws.page_setup.orientation = "landscape"
    ws.page_setup.fitToWidth = 1
    ws.page_setup.fitToHeight = 0                  # any number of pages down
    ws.sheet_properties.pageSetUpPr.fitToPage = True
    ws.print_title_rows = "1:1"                    # repeat the header on each page
    ws.print_area = f"A1:F{ws.max_row}"            # assumes the report is columns A-F; adjust to drop your working columns
wb.save("report.xlsx")

fitToHeight = 0 is the setting people miss: fitting to one page in both directions shrinks a long report to unreadable text, while fitting the width alone keeps the type legible and lets the document run to as many pages as it needs.

Setting a print area is the other high-value line. Working columns, notes and scratch calculations are invisible to a spreadsheet reader who never scrolls right, and unmistakable in a PDF.
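Rather than hard-coding the last column, the print area can be derived from the list of columns that belong in the report. The following is a dependency-free sketch under one assumption: the report columns come first, left to right, and working columns sit to their right. `column_letter` and `print_area_for` are hypothetical names; `openpyxl.utils.get_column_letter` performs the same index-to-letter conversion.

```python
def column_letter(index):
    """Spreadsheet column name for a 1-based index: 1 -> 'A', 27 -> 'AA'."""
    if index < 1:
        raise ValueError("column index starts at 1")
    letters = ""
    while index:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def print_area_for(headers, report_columns, last_row):
    """Build an A1-style range that covers only the leading report columns."""
    wanted = list(report_columns)
    # Refuse to guess if the sheet layout is not what the report expects.
    if list(headers[:len(wanted)]) != wanted:
        raise ValueError(f"unexpected header layout: {list(headers)!r}")
    return f"A1:{column_letter(len(wanted))}{last_row}"
```

With openpyxl, the header row could be read as `[c.value for c in ws[1]]` and the returned string assigned to `ws.print_area`.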

Verify the PDF before it is delivered

A conversion can exit cleanly and produce a document nobody should send: an empty page, a report truncated at the print area, or a file that is a fraction of its usual size. Three checks catch all three:

Python

import re
from pathlib import Path

def check_pdf(path, min_bytes=20_000, expect_pages=None):
    path = Path(path)
    problems = []
    if not path.exists():
        return [f"{path} was not created"]
    data = path.read_bytes()
    if len(data) < min_bytes:
        problems.append(f"only {len(data):,} bytes")

    # Every PDF starts with this magic number.
    if not data.startswith(b"%PDF-"):
        problems.append("file is not a PDF")

    if expect_pages is not None:
        # Count page objects. LibreOffice writes "/Type/Page" with no space, and
        # "/Type /Pages" (the page tree) must not be counted as a page.
        pages = len(re.findall(rb"/Type\s*/Page(?![A-Za-z])", data)) or None
        if pages and abs(pages - expect_pages) > 1:
            problems.append(f"{pages} page(s), expected about {expect_pages}")
    return problems

print(check_pdf("pdf/march.pdf", expect_pages=4) or "PDF looks fine")

The magic-number check is worth the two lines: a failed conversion sometimes leaves a zero-length file and sometimes leaves an error message with a .pdf extension, and only the header distinguishes them. Page counting by scanning for the object marker is approximate, which is why the comparison allows a page either way — it is there to catch a report that collapsed to one page or exploded to forty, not to assert an exact length.
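A fixed byte threshold cannot tell a small report from a broken one. If the job keeps the sizes of earlier runs, the "fraction of its usual size" case can be tested against that history instead. This is a sketch with assumed names and an assumed 50% tolerance; where the history is stored is up to the pipeline.

```python
from statistics import median


def size_is_plausible(size, previous_sizes, tolerance=0.5):
    """True if size is within +/- tolerance of the median of earlier runs."""
    if not previous_sizes:
        # No history yet, so there is nothing to compare against.
        return True
    typical = median(previous_sizes)
    return typical * (1 - tolerance) <= size <= typical * (1 + tolerance)
```

The median is used rather than the mean so that one unusually large month does not widen the accepted range.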

PDF is a decision about editing

Choosing PDF over a workbook is choosing that the numbers should not move. That makes it the right format for anything final — a board pack, a signed-off month end, a document sent outside the organisation — and the wrong one for figures a reader is expected to explore. Sending both, when the audience is mixed, is usually better than compromising on either.

Test the conversion on the real report

A converter verified on a two-row test sheet proves very little. Page breaks, column overflow, repeated headers and print areas only misbehave at realistic size, so the conversion step deserves a run against a full month's workbook before it is scheduled. That single test catches the layout problems that would otherwise be discovered by whoever opens the first real PDF.

Fail where the cause is

The most useful place for a check is as close as possible to the thing that can go wrong: the sheet name at the read, the column list before the transform, the row count before the write, the file size before delivery. Each of those turns a confusing downstream error into a message naming the actual problem. Checks placed late still catch the failure, but they describe a symptom — and a symptom three stages from its cause is what makes a simple mistake take an afternoon.
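As an illustration, the early checks named above can be small functions that raise with a message naming the problem. The function names and message wording here are hypothetical; the file-size check before delivery is the one `check_pdf` already performs.

```python
def require_sheet(sheet_names, wanted):
    """Fail at the read if the expected sheet is absent."""
    if wanted not in sheet_names:
        raise ValueError(
            f"sheet {wanted!r} not found; workbook has {list(sheet_names)}")


def require_columns(actual, expected):
    """Fail before the transform if any expected column is missing."""
    missing = [col for col in expected if col not in actual]
    if missing:
        raise ValueError(f"missing column(s): {missing}")


def require_rows(count, minimum=1):
    """Fail before the write if the data is suspiciously short."""
    if count < minimum:
        raise ValueError(f"only {count} row(s), expected at least {minimum}")
```

Each call sits directly before the stage it protects, so the traceback points at the stage that received bad input, not at a later one.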

Key takeaways

No pure-pip library renders .xlsx to PDF. openpyxl and pandas write spreadsheets and data; anything advertising df.to_pdf() is a red flag. You need LibreOffice, Excel, or reportlab.
LibreOffice headless is the portable default. It is free, cross-platform, renders the real workbook with styles and charts, and runs unattended — the only cost is installing LibreOffice on the box.
Excel COM is pixel-perfect but Windows-only. Reach for it via xlwings only where Excel is licensed and installed; never anchor a Linux server pipeline to it.
reportlab needs no external program but rebuilds the layout from your data — best for self-contained, data-only summaries in locked-down environments.
Set the page layout first. Configure orientation, fitToWidth with fitToPage, print area, and title rows in openpyxl so the output paginates cleanly instead of splitting columns across sheets.
Harden the scheduled run. Give each headless conversion a throwaway -env:UserInstallation profile, resolve soffice with shutil.which, set a timeout, and treat both a non-zero exit and a missing output file as failures.

Frequently asked questions

Can openpyxl or pandas export a PDF directly? No. openpyxl writes .xlsx files and pandas handles tabular data; neither renders a workbook to PDF. You need LibreOffice, Excel, or a PDF library like reportlab. Treat any claim otherwise as a red flag.

Which method should I default to? LibreOffice headless. It is free, cross-platform, renders the real workbook faithfully, and runs unattended on a server — the only cost is installing LibreOffice. Reach for Excel COM only when you specifically need Excel's pixel-perfect output on Windows, and reportlab when you want zero external programs and control the layout yourself.

My PDF splits columns across two pages — how do I stop it? Set the page layout in the workbook before converting: ws.page_setup.orientation = "landscape", ws.page_setup.fitToWidth = 1, and ws.sheet_properties.pageSetUpPr = PageSetupProperties(fitToPage=True). fitToWidth is ignored unless fitToPage is also enabled.

LibreOffice runs fine locally but hangs or fails under cron — why? Almost always a locked profile or PATH issue. Use a throwaway -env:UserInstallation profile per run so a running instance does not block it, resolve soffice with shutil.which rather than assuming it is on the cron PATH, and set a timeout so a hung process fails loudly.

Can I convert a single sheet instead of the whole workbook? LibreOffice and Excel export every visible sheet. To export one sheet, either hide the others before converting or load just that sheet and use the reportlab approach, which reads whichever sheet you name.
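The "hide the others" option from the last answer can be sketched with openpyxl's `sheet_state` attribute. `keep_only_sheet` is a hypothetical helper. It writes to a separate file because openpyxl does not preserve every workbook element on a load-and-save round trip, so the copy is worth checking before it replaces the original.

```python
from openpyxl import load_workbook


def keep_only_sheet(src_path, dst_path, sheet_name):
    """Save a copy of the workbook in which only sheet_name is visible."""
    wb = load_workbook(src_path)
    if sheet_name not in wb.sheetnames:
        raise KeyError(f"{sheet_name!r} not in {wb.sheetnames}")
    for ws in wb.worksheets:
        ws.sheet_state = "visible" if ws.title == sheet_name else "hidden"
    # The active sheet must be visible, so point it at the sheet we kept.
    wb.active = wb[sheet_name]
    wb.save(dst_path)
    return dst_path
```

The copy can then be passed to either rendering method in place of the original workbook.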

Parent: Automating Reporting Workflows — the end-to-end pipeline this export step belongs to.
Step-by-step recipe: Convert an Excel File to PDF with Python — the portable LibreOffice conversion walked through end to end.
Next in the pipeline: Emailing Excel Reports with smtplib attaches and sends the finished PDF, and Scheduling Python Excel Scripts with Cron runs the whole generate-convert-send job unattended.
Upstream stages: assemble the workbook as a multi-sheet dashboard or generate it from a template, and polish its look with Formatting and Charting Excel Reports with Python.
Related tooling: Automating Excel With xlwings: The Basics — the COM wrapper behind the pixel-perfect export method.