Convert an Excel File to PDF with Python

Convert .xlsx to PDF in Python with LibreOffice headless and subprocess: set page layout via openpyxl, run soffice safely with a timeout, plus a Windows COM path.

This is the portable, do-it-once recipe for turning an Excel workbook into a PDF from Python — the practical companion to Exporting Excel Reports to PDF, which compares the methods. The default here is headless LibreOffice driven through subprocess, because it runs the same on Linux, macOS, and Windows, costs nothing, and renders the real workbook including styles and charts. Remember the hard truth: no pure-pip library converts .xlsx to PDF faithfully, so the conversion step always shells out to a renderer. You will build a sample workbook, set its page layout so the PDF paginates cleanly, run the conversion safely, and find the output — with a Windows-only Excel alternative at the end.

Prerequisites

  • Python 3 with openpyxl: pip install openpyxl.
  • LibreOffice installed, with the soffice binary reachable. On Debian/Ubuntu, sudo apt install libreoffice-calc; on macOS, brew install --cask libreoffice; on Windows, the standard installer. LibreOffice is not a pip package — it is a separate program your script calls.

Verify the binary is found before going further:

Bash
soffice --version    # or: libreoffice --version

If that prints a version, you are ready. If not, note where LibreOffice was installed and add that directory to your PATH.
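When editing PATH is not an option, a script can probe typical install locations itself. The sketch below is one way to do that. The helper name find_soffice is hypothetical, and the candidate paths are assumptions based on default installer locations, so adjust them for your machine.

```python
import shutil
from pathlib import Path

# Assumed default install locations; these may differ on your system.
DEFAULT_CANDIDATES = [
    "/Applications/LibreOffice.app/Contents/MacOS/soffice",  # macOS app bundle
    r"C:\Program Files\LibreOffice\program\soffice.exe",     # Windows installer
    "/usr/bin/soffice",                                       # Linux packages
]

def find_soffice(candidates=DEFAULT_CANDIDATES):
    """Return the path to the LibreOffice binary as a string, or None."""
    # Prefer whatever is on PATH, exactly like the shell check above.
    found = shutil.which("soffice") or shutil.which("libreoffice")
    if found:
        return found
    # Otherwise fall back to the first candidate that exists on disk.
    for candidate in candidates:
        if Path(candidate).is_file():
            return candidate
    return None

if __name__ == "__main__":
    print(find_soffice() or "LibreOffice not found")
```

A None result means neither PATH nor the candidate list matched, which is the cue to locate the install directory by hand.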

Step 1 — Create a sample workbook with clean page layout

So the script stands alone, generate a small .xlsx. The important part is the page setup: configure orientation, fit-to-width, and print area before converting, or wide tables fragment across pages in the PDF.

Python
from openpyxl import Workbook
from openpyxl.worksheet.properties import PageSetupProperties

wb = Workbook()
ws = wb.active
ws.title = "Sales"

ws.append(["Region", "Q1", "Q2", "Q3", "Q4", "FY Total", "YoY %", "Owner"])
for i in range(1, 26):
    ws.append([f"Branch {i}", 100 + i, 118 + i, 96 + i, 131 + i,
               445 + 4 * i, 2.7, "team-a"])

# Landscape gives wide tables room.
ws.page_setup.orientation = "landscape"

# Scale all columns to ONE page wide; height flows over pages.
ws.page_setup.fitToWidth = 1
ws.page_setup.fitToHeight = 0
ws.sheet_properties.pageSetUpPr = PageSetupProperties(fitToPage=True)

# Only export the data region, and repeat the header on each page.
ws.print_area = "A1:H26"
ws.print_title_rows = "1:1"

wb.save("sales_report.xlsx")
print("Wrote sales_report.xlsx")

The fitToWidth = 1 line only takes effect because pageSetUpPr=PageSetupProperties(fitToPage=True) is set alongside it. Without that flag the renderer ignores the fit and the table spills across two page-widths.
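The print area in the sample is hard-coded to A1:H26 because the data has 26 rows and 8 columns. For reports whose size changes from run to run, the range can be derived instead. The following is a minimal sketch in plain Python; column_letter and print_area_for are hypothetical helper names, and openpyxl's own openpyxl.utils.get_column_letter does the same index-to-letter conversion if you prefer it.

```python
def column_letter(index):
    """Convert a 1-based column index to its Excel letter (1 -> A, 27 -> AA)."""
    if index < 1:
        raise ValueError("column index must be >= 1")
    letters = ""
    while index:
        # Excel columns are a bijective base-26 system, hence the index - 1.
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters

def print_area_for(n_rows, n_cols):
    """Build an A1-style range covering n_rows x n_cols, anchored at A1."""
    if n_rows < 1:
        raise ValueError("row count must be >= 1")
    return f"A1:{column_letter(n_cols)}{n_rows}"

print(print_area_for(26, 8))  # A1:H26
```

With a worksheet in hand, this could be wired in as ws.print_area = print_area_for(ws.max_row, ws.max_column), assuming the data starts at A1 and nothing outside the table should be printed.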

Step 2 — Convert with LibreOffice via subprocess

Now call soffice --headless --convert-to pdf. Resolve the binary with shutil.which, give the run a timeout, check returncode, and confirm the output file actually appeared — a clean exit alone is not proof of success.

Python
import shutil
import subprocess
from pathlib import Path

def convert_to_pdf(xlsx_path, out_dir=None, timeout=120):
    """Convert an .xlsx to PDF using headless LibreOffice.

    Requires LibreOffice installed with `soffice` on PATH.
    Returns the Path of the generated PDF.
    """
    src = Path(xlsx_path).resolve()
    if not src.is_file():
        raise FileNotFoundError(f"Input not found: {src}")

    soffice = shutil.which("soffice") or shutil.which("libreoffice")
    if soffice is None:
        raise RuntimeError(
            "LibreOffice not found. Install it and ensure 'soffice' is on PATH.")

    out_dir = Path(out_dir or src.parent).resolve()
    out_dir.mkdir(parents=True, exist_ok=True)

    try:
        result = subprocess.run(
            [soffice, "--headless", "--convert-to", "pdf",
             "--outdir", str(out_dir), str(src)],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired as exc:
        # Chain the original exception so the traceback keeps the timeout details.
        raise RuntimeError(
            f"LibreOffice timed out after {timeout}s converting {src.name}") from exc

    if result.returncode != 0:
        raise RuntimeError(
            f"LibreOffice exited {result.returncode}: {result.stderr.strip()}")

    pdf_path = out_dir / (src.stem + ".pdf")
    if not pdf_path.is_file():
        raise RuntimeError(
            f"Exit 0 but no PDF at {pdf_path}. stdout: {result.stdout.strip()}")
    return pdf_path

if __name__ == "__main__":
    pdf = convert_to_pdf("sales_report.xlsx")
    print("Created", pdf, f"({pdf.stat().st_size} bytes)")

Step 3 — Locate and verify the output

LibreOffice names the PDF after the input stem and drops it in --outdir. So sales_report.xlsx becomes sales_report.pdf in the same folder unless you pass a different out_dir. The function returns that Path; checking pdf.stat().st_size is a cheap sanity test that the file has real content before you email or archive it.

Python
pdf = convert_to_pdf("sales_report.xlsx", out_dir="output")
print("PDF ready:", pdf)          # absolute path ending in output/sales_report.pdf
assert pdf.stat().st_size > 0
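A size check catches empty files but not a file that has bytes and is not a PDF. A slightly stronger test reads the first bytes and compares them with the %PDF- signature that PDF files start with. This is a sketch, not a full validator: looks_like_pdf is a hypothetical helper name, and it does not prove the PDF is complete or renders correctly.

```python
from pathlib import Path

def looks_like_pdf(path):
    """Return True if the file is non-empty and starts with the PDF signature."""
    p = Path(path)
    if not p.is_file() or p.stat().st_size == 0:
        return False
    with p.open("rb") as fh:
        # PDF files begin with "%PDF-" followed by a version number.
        return fh.read(5) == b"%PDF-"
```

Calling looks_like_pdf(pdf) on the Path returned by the conversion helper before emailing or archiving keeps a truncated or mislabelled file out of the pipeline.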

Pitfalls and fixes

Symptom | Cause | Fix
RuntimeError: LibreOffice not found | soffice not on the script's PATH (common under cron) | Resolve with shutil.which; add the install dir to PATH, or pass the absolute binary path.
Conversion hangs or produces no file | Another LibreOffice instance holds the user profile lock | Pass -env:UserInstallation=file:///tmp/lo_profile_xyz to use a separate, throwaway profile dir per run.
Columns split across two page-widths | Sheet not fit to page before converting | Set ws.page_setup.fitToWidth = 1 and pageSetUpPr=PageSetupProperties(fitToPage=True).
TimeoutExpired raised | Large workbook, or a hung headless process | Raise timeout, or kill and retry once; never let it block a scheduled job forever.
Garbled / missing text on a server | Fonts used by the report are not installed | Install the needed font packages on the headless box (e.g. the relevant fonts-* packages on Linux).
Exit 0 but empty/blank PDF | Corrupt or zero-byte input workbook | Validate the input exists and is non-empty before converting.
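For the last row of the table, input validation can go a little further than an existence check, because an .xlsx file is a ZIP container. The sketch below rejects empty and non-ZIP files before LibreOffice is ever started. validate_xlsx is a hypothetical helper name, and the check for xl/workbook.xml assumes the conventional part name that Excel and openpyxl write.

```python
import zipfile
from pathlib import Path

def validate_xlsx(path):
    """Raise if the file is missing, empty, or not a plausible .xlsx; return its Path."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"Input not found: {p}")
    if p.stat().st_size == 0:
        raise ValueError(f"Input is zero bytes: {p}")
    # .xlsx is a ZIP archive, so anything that is not a ZIP cannot be a workbook.
    if not zipfile.is_zipfile(p):
        raise ValueError(f"Input is not a ZIP container: {p}")
    with zipfile.ZipFile(p) as zf:
        # Assumption: the workbook part uses the conventional name.
        if "xl/workbook.xml" not in zf.namelist():
            raise ValueError(f"No xl/workbook.xml inside: {p}")
    return p
```

This only confirms the container looks right. A workbook can pass and still fail to render, so the output checks after conversion remain necessary.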

The locked-profile case is the most common production surprise. The fix is one extra argument that points LibreOffice at a fresh profile, so headless runs never collide with each other or with a desktop instance:

Python
import tempfile, shutil, subprocess
from pathlib import Path

def convert_isolated(xlsx_path, out_dir="output", timeout=120):
    """LibreOffice conversion with a throwaway profile dir — safe under cron."""
    soffice = shutil.which("soffice") or shutil.which("libreoffice")
    if soffice is None:
        raise RuntimeError("install LibreOffice; 'soffice' not on PATH")
    src = Path(xlsx_path).resolve()
    out = Path(out_dir).resolve()
    out.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as profile:
        r = subprocess.run(
            # as_uri() builds a valid file:// URL on Windows as well as POSIX.
            [soffice, f"-env:UserInstallation={Path(profile).as_uri()}",
             "--headless", "--convert-to", "pdf",
             "--outdir", str(out), str(src)],
            capture_output=True, text=True, timeout=timeout,
        )
    if r.returncode != 0:
        raise RuntimeError(r.stderr.strip())
    pdf = out / (src.stem + ".pdf")
    if not pdf.is_file():
        raise RuntimeError("no PDF produced")
    return pdf
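The TimeoutExpired row in the pitfalls table suggests retrying once rather than failing immediately or waiting forever. One way to sketch that is a small wrapper around subprocess.run; run_with_retry is a hypothetical helper name. subprocess.run kills the direct child process when the timeout expires, but any processes that child started are not covered here, so treat this as a starting point.

```python
import subprocess

def run_with_retry(cmd, timeout=120, retries=1):
    """Run cmd; on timeout, try again up to `retries` more times, then give up."""
    attempts = retries + 1
    for attempt in range(1, attempts + 1):
        try:
            return subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            # subprocess.run has already killed the child before raising.
            if attempt == attempts:
                raise RuntimeError(
                    f"Command timed out {attempts} time(s), "
                    f"{timeout}s each") from None
```

The returned CompletedProcess still needs the same returncode and output-file checks as before; the wrapper only bounds how long a scheduled job can be held up.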

A note on running headless on a server

On a server with no display the --headless flag is enough — you do not need xvfb for --convert-to. What you do need is the fonts your reports use installed system-wide, and an isolated profile per run as shown above. With those two things in place the same code that works on your laptop works under cron.
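Missing fonts can be detected before conversion rather than discovered in the PDF. On Linux, one option is to ask fontconfig which families are installed. The sketch below assumes the fc-list tool is available and that `fc-list : family` prints one entry per line with alternate names separated by commas; the helper names and the example font names are hypothetical.

```python
import shutil
import subprocess

def installed_families(fc_list_output):
    """Parse `fc-list : family` output into a set of lowercase family names."""
    families = set()
    for line in fc_list_output.splitlines():
        # Assumption: a line may list several comma-separated names for one font.
        for name in line.split(","):
            name = name.strip()
            if name:
                families.add(name.lower())
    return families

def missing_fonts(required, fc_list_output):
    """Return the required family names that do not appear in the output."""
    have = installed_families(fc_list_output)
    return [name for name in required if name.lower() not in have]

def check_fonts(required):
    """Run fc-list and report which required families are not installed."""
    fc_list = shutil.which("fc-list")
    if fc_list is None:
        raise RuntimeError("fc-list not found; is fontconfig installed?")
    result = subprocess.run(
        [fc_list, ":", "family"],
        capture_output=True, text=True, check=True, timeout=30)
    return missing_fonts(required, result.stdout)
```

For example, check_fonts(["Calibri", "Liberation Sans"]) would return the subset of those names that the server lacks, which a job can log or fail on before converting.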

Short Windows-only alternative

If you are on Windows with Excel installed and want pixel-perfect output, let Excel render it via xlwings. This only works where Excel is licensed and installed — not on a Linux server.

Python
# Windows + Microsoft Excel only.  pip install xlwings
import xlwings as xw

def convert_with_excel(xlsx_path, pdf_path):
    app = xw.App(visible=False)
    try:
        wb = app.books.open(xlsx_path)
        wb.to_pdf(pdf_path)      # wraps Excel's ExportAsFixedFormat
        wb.close()
    finally:
        app.quit()

# convert_with_excel("sales_report.xlsx", "sales_report.pdf")

Frequently asked questions

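Do I have to install LibreOffice? Can't pip do it? You have to install a renderer: LibreOffice, or Excel on Windows. No pip package renders an .xlsx to PDF faithfully. A library such as reportlab can build a PDF from your data, but it draws a new document rather than rendering the workbook. The LibreOffice approach shells out to the installed program through subprocess.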

Where does the PDF end up? In the directory passed as --outdir, named after the input file's stem: report.xlsx becomes report.pdf. The helper returns that exact Path so you do not have to guess.

Why check the output file when there is already a return code? LibreOffice can exit 0 while writing nothing on certain malformed inputs. Verifying the file exists and is non-empty turns a silent failure into a clear error your job can act on.

The conversion works locally but not under cron — what changed? The cron environment has a minimal PATH and may collide with a running LibreOffice. Resolve soffice with shutil.which, and run each conversion with its own -env:UserInstallation profile directory.

Can I convert several workbooks at once? Yes — call the function in a loop, or pass multiple file arguments to one soffice command. For volume, a fresh profile per batch keeps runs from interfering with each other.
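The single-command variant can be sketched as follows: one soffice invocation, several input files, one throwaway profile for the whole batch. build_batch_command and convert_batch are hypothetical helper names. The sketch assumes every input has a distinct stem, since two files with the same stem would write the same PDF name into --outdir.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

def build_batch_command(soffice, files, out_dir, profile_dir):
    """Build one soffice command line that converts every file in `files`."""
    return [
        soffice,
        f"-env:UserInstallation={Path(profile_dir).resolve().as_uri()}",
        "--headless", "--convert-to", "pdf",
        "--outdir", str(Path(out_dir).resolve()),
        *[str(Path(f).resolve()) for f in files],
    ]

def convert_batch(files, out_dir="output", timeout=300):
    """Convert several workbooks in one run; return the list of PDF paths."""
    soffice = shutil.which("soffice") or shutil.which("libreoffice")
    if soffice is None:
        raise RuntimeError("install LibreOffice; 'soffice' not on PATH")
    out = Path(out_dir).resolve()
    out.mkdir(parents=True, exist_ok=True)
    # One fresh profile per batch, removed when the block exits.
    with tempfile.TemporaryDirectory() as profile:
        r = subprocess.run(
            build_batch_command(soffice, files, out, profile),
            capture_output=True, text=True, timeout=timeout)
    if r.returncode != 0:
        raise RuntimeError(r.stderr.strip())
    pdfs = [out / (Path(f).stem + ".pdf") for f in files]
    missing = [p.name for p in pdfs if not p.is_file()]
    if missing:
        raise RuntimeError(f"no PDF produced for: {', '.join(missing)}")
    return pdfs
```

The timeout here covers the whole batch, so it should scale with the number of files. Looping over the single-file helper instead gives per-file timeouts and errors at the cost of starting LibreOffice once per workbook.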

Conclusion

Converting Excel to PDF in Python is two parts: prepare the page layout with openpyxl so the output paginates well, then shell out to a real renderer. Headless LibreOffice through subprocess.run — with a resolved binary, a timeout, a returncode check, an output-file check, and an isolated profile — is the portable default that works from your laptop to a cron job. On Windows with Excel, the COM path via xlwings gives pixel-perfect results when you need them.

Where to go next

Step back to the overview and method comparison in Exporting Excel Reports to PDF. Then wire this into a pipeline: send the finished PDF with Emailing Excel Reports with smtplib, and run the whole generate-convert-send job unattended with Scheduling Python Excel Scripts with Cron.