Exporting Excel Reports to PDF

Convert .xlsx reports to PDF from Python the honest way: LibreOffice headless, Excel COM, or reportlab — plus openpyxl page setup so the output paginates cleanly.

PDF is the format people actually open. It renders identically on every machine, it cannot be accidentally edited, and it prints predictably — which is why the final step of most reporting jobs is converting the workbook to PDF before it is emailed or filed. The catch is that no pure-pip Python library renders an .xlsx to PDF faithfully. openpyxl writes spreadsheets but cannot export them; pandas reads and writes data, not page layout. Anyone who tells you df.to_pdf() exists is wrong. This page lays out the three approaches that genuinely work, when to reach for each, and the openpyxl page-setup step that makes the difference between a clean one-page report and a mess that spills columns across six sheets. It is the export stage of the broader Automating Reporting Workflows pipeline.

The three real options

There is no single best method — the right one depends on your platform and how much fidelity you need.

| Method | Platform | Fidelity | Dependencies | Best for |
| --- | --- | --- | --- | --- |
| LibreOffice headless (soffice --convert-to pdf) | Linux / macOS / Windows | High: renders the real workbook, styles, charts | LibreOffice installed (not pip) | Servers, cron jobs, CI; the portable default |
| Excel COM via xlwings / pywin32 | Windows + Excel only | Pixel-perfect: Excel renders it | Microsoft Excel + pip install xlwings | Desktops where Excel is already licensed |
| reportlab (build the PDF directly) | Any | Total control, but you rebuild the layout | pip install reportlab | When you control the design and skip Excel rendering |

The first two render an existing workbook. The third skips the workbook entirely and draws the PDF from your data — more work, but no Excel or LibreOffice required. For the step-by-step portable recipe, see Convert an Excel File to PDF with Python.
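One way to turn the table above into a startup check is sketched below. The function name `pick_pdf_method` and its `need_excel_fidelity` flag are hypothetical, and the detection is only a heuristic: finding `xlwings` on Windows suggests, but does not prove, that Excel itself is installed.

```python
import importlib.util
import shutil
import sys


def pick_pdf_method(need_excel_fidelity=False):
    """Return "excel_com", "libreoffice", or "reportlab" for this machine."""
    has_soffice = bool(shutil.which("soffice") or shutil.which("libreoffice"))
    # Assumption: an importable xlwings on Windows is treated as a hint that
    # Excel is available. It does not verify an Excel installation or license.
    has_xlwings = (sys.platform == "win32"
                   and importlib.util.find_spec("xlwings") is not None)

    if need_excel_fidelity and has_xlwings:
        return "excel_com"
    if has_soffice:
        return "libreoffice"      # the portable default
    if has_xlwings:
        return "excel_com"
    return "reportlab"            # fallback; still needs `pip install reportlab`


print(pick_pdf_method())
```

Running this at the start of a job makes a missing dependency visible before any workbook is generated, rather than at the conversion step.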

Set up the page before you convert

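The single biggest source of ugly PDFs is converting a workbook that was never told how to print. A wide sheet paginates into a left half and a right half on separate pages; a tall sheet breaks between rows at arbitrary points and loses its header on every page after the first. Fix this in the workbook with openpyxl's page-setup attributes before you hand the file to a converter. LibreOffice and Excel both honor these settings when they render the workbook; the reportlab route reads only cell values, so it ignores them and you set the page size and repeated header in reportlab itself.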

Python
from openpyxl import Workbook
from openpyxl.worksheet.properties import PageSetupProperties

wb = Workbook()
ws = wb.active
ws.title = "Summary"

# Some sample content wide enough to need fitting.
ws.append(["Region", "Q1", "Q2", "Q3", "Q4", "FY Total", "YoY %", "Notes"])
for r in range(1, 31):
    ws.append([f"Branch {r}", 100 + r, 120 + r, 95 + r, 130 + r,
               445 + 4 * r, 3.2, "on track"])

# Landscape so wide tables get more horizontal room.
ws.page_setup.orientation = "landscape"

# Fit ALL columns onto one page width; let height flow over pages.
ws.page_setup.fitToWidth = 1
ws.page_setup.fitToHeight = 0           # 0 = unlimited pages tall
ws.sheet_properties.pageSetUpPr = PageSetupProperties(fitToPage=True)

# Restrict the export to the data region (skip stray cells far to the right).
ws.print_area = "A1:H31"

# Repeat the header row on every printed page.
ws.print_title_rows = "1:1"

wb.save("report.xlsx")
print("Saved report.xlsx with print layout configured")

The crucial pairing is fitToWidth = 1 and pageSetUpPr = PageSetupProperties(fitToPage=True). Setting fitToWidth alone does nothing — Excel and LibreOffice ignore it unless the fitToPage flag on the sheet properties is also on. With both set, the converter scales the columns down to one page wide while letting rows flow over as many pages as needed.
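Because a missing flag fails silently, a pre-flight check before conversion can be worthwhile. The sketch below uses only the standard library and reads the flag straight out of the saved file. The helper name `fit_to_page_flags` is hypothetical, and it assumes the usual .xlsx layout: worksheet parts under `xl/worksheets/` in the transitional SpreadsheetML namespace, with the flag stored as `<pageSetUpPr fitToPage="1"/>` inside `<sheetPr>`.

```python
import zipfile
import xml.etree.ElementTree as ET

# Assumption: the workbook uses the transitional SpreadsheetML namespace.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def fit_to_page_flags(xlsx_path):
    """Map each worksheet part in the .xlsx to True/False for its fitToPage flag."""
    flags = {}
    with zipfile.ZipFile(xlsx_path) as zf:
        for name in zf.namelist():
            # Skip relationship files and anything that is not a sheet part.
            if not (name.startswith("xl/worksheets/") and name.endswith(".xml")):
                continue
            root = ET.fromstring(zf.read(name))
            node = root.find(f"{{{MAIN_NS}}}sheetPr/{{{MAIN_NS}}}pageSetUpPr")
            value = node.get("fitToPage", "0") if node is not None else "0"
            flags[name] = value in ("1", "true")
    return flags

# flags = fit_to_page_flags("report.xlsx")
# missing = [part for part, on in flags.items() if not on]
```

A job could refuse to convert while `missing` is non-empty, which catches a forgotten `PageSetupProperties(fitToPage=True)` before it turns into a two-page-wide PDF.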

Method 1 — LibreOffice headless (the portable default)

LibreOffice ships a headless mode that opens the workbook, renders it exactly as the desktop app would, and writes a PDF — with no GUI. It runs on Linux, macOS, and Windows, costs nothing, and handles styles and charts well, which makes it the right default for any unattended job. Drive it from Python with subprocess. The binary is soffice (or libreoffice on some Linux distros).

Python
import shutil
import subprocess
from pathlib import Path

def xlsx_to_pdf_libreoffice(xlsx_path, out_dir=None, timeout=120):
    """Convert an .xlsx to PDF via headless LibreOffice.

    Requires LibreOffice installed and `soffice` reachable on PATH.
    Returns the path to the generated PDF.
    """
    src = Path(xlsx_path).resolve()
    if not src.is_file():
        raise FileNotFoundError(src)

    soffice = shutil.which("soffice") or shutil.which("libreoffice")
    if soffice is None:
        raise RuntimeError("LibreOffice not found; install it and ensure "
                           "'soffice' is on PATH")

    out_dir = Path(out_dir or src.parent).resolve()
    out_dir.mkdir(parents=True, exist_ok=True)

    result = subprocess.run(
        [soffice, "--headless", "--convert-to", "pdf",
         "--outdir", str(out_dir), str(src)],
        capture_output=True, text=True, timeout=timeout,
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"LibreOffice failed (exit {result.returncode}): {result.stderr}")

    pdf_path = out_dir / (src.stem + ".pdf")
    if not pdf_path.is_file():
        raise RuntimeError(f"Conversion reported success but {pdf_path} "
                           f"is missing. stdout: {result.stdout}")
    return pdf_path

# pdf = xlsx_to_pdf_libreoffice("report.xlsx")
# print("Wrote", pdf)

Always check returncode and that the expected output file exists — LibreOffice occasionally exits zero while writing nothing if the input is corrupt. The timeout guards against a hung process in a scheduled job; if it fires, subprocess.run raises TimeoutExpired, which your job wrapper should catch and log.
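A job wrapper along those lines might look like the following sketch. The name `run_conversion` and the logger name are hypothetical; `convert` stands for any converter callable, such as `xlsx_to_pdf_libreoffice` above. It handles only the failures the converter is known to raise and lets anything unexpected propagate.

```python
import logging
import subprocess

log = logging.getLogger("report_job")


def run_conversion(convert, xlsx_path):
    """Call convert(xlsx_path); log and return None on a known failure."""
    try:
        return convert(xlsx_path)
    except subprocess.TimeoutExpired as exc:
        log.error("PDF conversion of %s timed out after %ss",
                  xlsx_path, exc.timeout)
    except (RuntimeError, FileNotFoundError) as exc:
        log.error("PDF conversion of %s failed: %s", xlsx_path, exc)
    return None

# pdf = run_conversion(xlsx_to_pdf_libreoffice, "report.xlsx")
# if pdf is None:
#     raise SystemExit(1)   # non-zero exit so the scheduler flags the run
```

Returning `None` keeps the logging in one place, while the non-zero exit in the caller is what actually surfaces the failure to cron or CI.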

Method 2 — Excel COM on Windows

On a Windows box with Excel installed, you can ask Excel itself to export — the result is pixel-perfect because it is Excel's own renderer. xlwings wraps the COM API cleanly:

Python
# Windows + Microsoft Excel only.  pip install xlwings
import xlwings as xw

def xlsx_to_pdf_excel(xlsx_path, pdf_path):
    """Pixel-perfect export using Excel's own engine. Windows + Excel only."""
    app = xw.App(visible=False)
    try:
        wb = app.books.open(xlsx_path)
        wb.to_pdf(pdf_path)        # xlwings >= 0.21 convenience wrapper
        wb.close()
    finally:
        app.quit()

# xlsx_to_pdf_excel("report.xlsx", "report.pdf")

Under the hood to_pdf() calls ExportAsFixedFormat on the COM workbook. This is the highest-fidelity option, but it only runs where Excel is licensed and installed — it is a non-starter on a Linux server, so do not build a server pipeline around it.

Method 3 — Build the PDF directly with reportlab

When you control the layout and do not need the workbook's exact styling, skip Excel rendering entirely and draw the PDF from your data with reportlab. This needs no Excel and no LibreOffice — just a pip install — which is appealing for locked-down environments.

Python
# pip install reportlab openpyxl
from openpyxl import load_workbook
from reportlab.lib.pagesizes import landscape, A4
from reportlab.lib import colors
from reportlab.platypus import SimpleDocTemplate, Table, TableStyle

def sheet_to_pdf_reportlab(xlsx_path, pdf_path, sheet=None):
    """Read a sheet's cells and lay them out as a PDF table. No Excel needed."""
    wb = load_workbook(xlsx_path, data_only=True)
    ws = wb[sheet] if sheet else wb.active
    rows = [[("" if c is None else c) for c in row]
            for row in ws.iter_rows(values_only=True)]

    doc = SimpleDocTemplate(pdf_path, pagesize=landscape(A4))
    table = Table(rows, repeatRows=1)
    table.setStyle(TableStyle([
        ("BACKGROUND", (0, 0), (-1, 0), colors.HexColor("#1f4e78")),
        ("TEXTCOLOR", (0, 0), (-1, 0), colors.white),
        ("FONTSIZE", (0, 0), (-1, -1), 8),
        ("GRID", (0, 0), (-1, -1), 0.25, colors.grey),
    ]))
    doc.build([table])

# sheet_to_pdf_reportlab("report.xlsx", "report.pdf")

You trade fidelity for control and zero external dependencies. Charts, merged-cell layouts, and conditional formatting do not come across — you rebuild whatever you want on the page yourself. For data-only summaries this is fast and fully self-contained.
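Number formats are part of what gets left behind: `iter_rows(values_only=True)` returns raw Python values, so a cell Excel displays as 1,234 arrives as the integer 1234 and dates arrive as `datetime` objects. A small formatter applied to each cell before building the table restores readable output. The sketch below is standard-library only; the name `format_cell` and the specific formats are illustrative assumptions, not something reportlab or openpyxl prescribes.

```python
from datetime import date, datetime


def format_cell(value):
    """Turn a raw cell value into display text for a PDF table cell."""
    if value is None:
        return ""
    if isinstance(value, bool):        # check before int: bool is an int subclass
        return "Yes" if value else "No"
    if isinstance(value, datetime):    # check before date: datetime is a date subclass
        return value.strftime("%Y-%m-%d %H:%M")
    if isinstance(value, date):
        return value.isoformat()
    if isinstance(value, int):
        return f"{value:,}"
    if isinstance(value, float):
        return f"{value:,.2f}"
    return str(value)


print([format_cell(v) for v in ["Branch 1", 1234, 3.2, None, date(2024, 1, 31)]])
# ['Branch 1', '1,234', '3.20', '', '2024-01-31']
```

In `sheet_to_pdf_reportlab`, this would replace the inline `"" if c is None else c` expression with `format_cell(c)`.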

Automating the conversion in a scheduled job

In production the conversion is one step in an unattended pipeline: generate the workbook, convert it, then email it. For LibreOffice headless this means two precautions. First, give each run its own user-profile directory so a desktop LibreOffice instance (or a previous run) cannot lock the conversion. Second, set a timeout and treat both a non-zero exit and a missing output file as failures your scheduler will alert on.

Python
import shutil
import subprocess
import tempfile
from pathlib import Path

def convert_isolated(xlsx_path, out_dir, timeout=120):
    """LibreOffice conversion with a throwaway profile dir — safe under cron."""
    soffice = shutil.which("soffice") or shutil.which("libreoffice")
    if soffice is None:
        raise RuntimeError("install LibreOffice; 'soffice' not on PATH")
    src = Path(xlsx_path).resolve()
    out_dir = Path(out_dir).resolve()
    out_dir.mkdir(parents=True, exist_ok=True)

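    with tempfile.TemporaryDirectory() as profile:
        # as_uri() yields a valid file:// URL on Windows as well as POSIX,
        # and percent-encodes spaces in the path.
        profile_url = Path(profile).as_uri()
        result = subprocess.run(
            [soffice, f"-env:UserInstallation={profile_url}",
             "--headless", "--convert-to", "pdf",
             "--outdir", str(out_dir), str(src)],
            capture_output=True, text=True, timeout=timeout,
        )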
    if result.returncode != 0:
        raise RuntimeError(f"convert failed: {result.stderr}")
    pdf = out_dir / (src.stem + ".pdf")
    if not pdf.is_file():
        raise RuntimeError("no PDF produced")
    return pdf

The -env:UserInstallation flag is the key line — it points LibreOffice at a fresh profile per invocation, which is what makes headless conversion reliable when another instance might be running. On a headless server, also install the font packages your reports use, or text falls back to substitutes and the PDF looks wrong.

Frequently asked questions

Can openpyxl or pandas export a PDF directly? No. openpyxl writes .xlsx files and pandas handles tabular data; neither renders a workbook to PDF. You need LibreOffice, Excel, or a PDF library like reportlab. Treat any claim otherwise as a red flag.

Which method should I default to? LibreOffice headless. It is free, cross-platform, renders the real workbook faithfully, and runs unattended on a server — the only cost is installing LibreOffice. Reach for Excel COM only when you specifically need Excel's pixel-perfect output on Windows, and reportlab when you want zero external programs and control the layout yourself.

My PDF splits columns across two pages — how do I stop it? Set the page layout in the workbook before converting: ws.page_setup.orientation = "landscape", ws.page_setup.fitToWidth = 1, and ws.sheet_properties.pageSetUpPr = PageSetupProperties(fitToPage=True). fitToWidth is ignored unless fitToPage is also enabled.

LibreOffice runs fine locally but hangs or fails under cron — why? Almost always a locked profile or PATH issue. Use a throwaway -env:UserInstallation profile per run so a running instance does not block it, resolve soffice with shutil.which rather than assuming it is on the cron PATH, and set a timeout so a hung process fails loudly.

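Can I convert a single sheet instead of the whole workbook? By default, LibreOffice and Excel export every visible sheet. To export one sheet, either hide the others before converting or use the reportlab approach, which reads whichever sheet you name. If you are on the Excel route, xlwings' to_pdf() also accepts an include argument for selecting sheets by name or 1-based index; check the xlwings documentation for the version you have installed.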
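The hide-the-others option can be sketched with openpyxl as follows. The helper name `keep_only_sheet` is hypothetical. Note the assumption it rests on: loading and re-saving a workbook with openpyxl can drop content openpyxl does not model, so when you generate the workbook yourself it is safer to set `sheet_state` at creation time instead of round-tripping a finished file.

```python
from openpyxl import load_workbook


def keep_only_sheet(xlsx_path, sheet_name, out_path):
    """Save a copy of the workbook in which only `sheet_name` is visible."""
    wb = load_workbook(xlsx_path)
    keep = wb[sheet_name]              # raises KeyError if the sheet is missing
    keep.sheet_state = "visible"
    wb.active = wb.index(keep)         # the active sheet should be a visible one
    for ws in wb.worksheets:
        if ws is not keep:
            ws.sheet_state = "hidden"
    wb.save(out_path)
    return out_path

# keep_only_sheet("report.xlsx", "Summary", "report_summary_only.xlsx")
```

Writing to a separate `out_path` leaves the original workbook untouched, so the full report can still be emailed alongside the single-sheet PDF.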

Conclusion

There is no magic pip package that turns a spreadsheet into a faithful PDF — but there are three solid paths. LibreOffice headless is the portable default for unattended jobs, Excel COM gives pixel-perfect output where Excel is installed, and reportlab builds the document directly when you control the design. Whichever you pick, configure the page layout with openpyxl first — orientation, fit-to-width, and print area — so the result paginates cleanly instead of fragmenting. Then wrap the conversion with a timeout, an isolated profile, and explicit output-file checks so it survives scheduling.

Where to go next

Work through the portable recipe end to end in Convert an Excel File to PDF with Python. Then connect this step to the rest of the pipeline: Emailing Excel Reports with smtplib sends the finished PDF, and Scheduling Python Excel Scripts with Cron runs the whole generate-convert-send job unattended. For the upstream stages, return to Automating Reporting Workflows or polish the workbook first with Formatting and Charting Excel Reports with Python.