Guide

Automating Reporting WorkflowsDeep dive

Scheduling Python Excel Scripts with Cron

Run Python Excel report scripts unattended with cron: write a robust generator, decode crontab syntax, invoke the venv Python directly, log, and prevent overlap.

A report that runs only when you remember to run it isn't automated. On Linux and macOS, cron is the simplest way to trigger a Python script at a fixed time with no extra infrastructure. The catch is that cron runs your script in a stripped-down environment — minimal PATH, no virtualenv activation, home directory as the working directory — so a script that works in your shell can fail silently under cron. This page covers a report script that survives that environment, the crontab syntax to schedule it, and the safeguards that keep it reliable. It is the scheduling stage of the broader Automating Reporting Workflows pipeline.

What cron does and doesn't give you

Before writing the schedule, internalize what changes between your shell and cron:

  • Cron does not source .bashrc or .profile, so it does not activate your virtualenv or pick up custom PATH entries.
  • The working directory is the user's home, not the script's location, so relative paths resolve somewhere you didn't intend.
  • stdout and stderr are discarded unless you redirect them, so an uncaught error vanishes without a trace.

Every recommendation below is a direct response to one of these facts: invoke the interpreter by absolute path, use absolute file paths, and log to a known file.

A report script that runs unattended

The script below uses absolute paths, creates its own directories, logs to a dated file, and exits non-zero on failure so a monitor can detect it. It is self-contained: if the source CSV is missing it writes a small sample so the run still demonstrates the full path. Save it as /opt/reporting/generate_daily_report.py:

Python
#!/usr/bin/env python3
"""Scheduled Excel report generator: load data, summarize, write .xlsx."""
import sys
import logging
from datetime import datetime
from pathlib import Path

import pandas as pd

# Absolute paths — cron's working directory is not the script's directory.
BASE_DIR = Path("/tmp/reporting_demo")   # use /opt/reporting in production
DATA_DIR = BASE_DIR / "data"
OUTPUT_DIR = BASE_DIR / "output"
LOG_DIR = BASE_DIR / "logs"
for d in (DATA_DIR, OUTPUT_DIR, LOG_DIR):
    d.mkdir(parents=True, exist_ok=True)

log_file = LOG_DIR / f"report_{datetime.now():%Y%m%d}.log"
logging.basicConfig(
    filename=log_file,
    level=logging.INFO,
    format="%(asctime)s | %(levelname)s | %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)

def main():
    logging.info("Starting daily Excel report generation.")
    try:
        source_file = DATA_DIR / "transactions.csv"
        if not source_file.exists():
            # Seed a sample so the run is self-contained for this demo.
            pd.DataFrame({
                "category": ["A", "B", "A", "C", "B"],
                "amount": [100.0, 250.5, 75.25, 40.0, 250.5],
            }).to_csv(source_file, index=False)
            logging.info("Source missing; wrote sample data.")

        df = pd.read_csv(source_file)
        logging.info("Loaded %d records from %s", len(df), source_file)

        summary = df.groupby("category", as_index=False)["amount"].sum()

        output_file = OUTPUT_DIR / f"daily_summary_{datetime.now():%Y%m%d}.xlsx"
        summary.to_excel(output_file, index=False, engine="openpyxl")
        logging.info("Report saved to %s", output_file)
    except Exception:
        logging.exception("Report generation failed.")  # captures traceback
        sys.exit(1)
    logging.info("Execution completed successfully.")

if __name__ == "__main__":
    main()

Make it executable so it can be invoked directly:

Bash
chmod +x /opt/reporting/generate_daily_report.py

Reading and writing crontab syntax

A crontab line is five time fields followed by the command:

Text
┌───────── minute        (0–59)
│ ┌─────── hour          (0–23)
│ │ ┌───── day of month  (1–31)
│ │ │ ┌─── month         (1–12)
│ │ │ │ ┌─ day of week   (0–7, where 0 and 7 are both Sunday)
│ │ │ │ │
* * * * *  command to run

A few patterns cover most reporting needs:

Bash
0 6 * * *        # every day at 06:00
30 7 * * 1-5     # 07:30 on weekdays (Mon–Fri)
0 9 1 * *        # 09:00 on the first day of every month
0 */4 * * *      # every 4 hours, on the hour
0 22 * * 0       # 22:00 every Sunday

Edit your schedule with crontab -e and add the entry. Invoke the virtualenv's Python by absolute path — this sidesteps the fact that cron never activated the environment — and redirect output to a log so interpreter-level errors (a missing module, a syntax error) are captured before your in-script logging starts:

Bash
# Run daily at 06:00 using the venv interpreter directly
0 6 * * * /opt/reporting/venv/bin/python3 /opt/reporting/generate_daily_report.py >> /opt/reporting/logs/cron_stdout.log 2>&1

Calling /opt/reporting/venv/bin/python3 directly is more reliable than wrapping the command in bash -c 'source venv/bin/activate && ...': it guarantees the exact interpreter and installed packages, with no dependency on shell initialization.

Test it the way cron will run it

Bugs from the cron environment won't show up in your normal shell, where the venv is active and you're in the project directory. Reproduce cron's bare environment with env -i before trusting the schedule:

Bash
# Run with an empty environment, like cron does
env -i /opt/reporting/venv/bin/python3 /opt/reporting/generate_daily_report.py
echo "exit code: $?"

If it works here, it will work under cron. Then confirm the schedule is installed with crontab -l.

Prevent overlapping runs

If a run can take longer than its interval, cron will start a second copy on top of the first, and the two can fight over the same output file. The simplest guard is flock, which wraps the command in a lock the kernel releases automatically when the process exits:

Bash
# -n: fail immediately if the lock is held (skip this run rather than queue it)
0 * * * * /usr/bin/flock -n /tmp/report.lock /opt/reporting/venv/bin/python3 /opt/reporting/generate_daily_report.py >> /opt/reporting/logs/cron_stdout.log 2>&1

flock needs no code changes and is the recommended approach. If you prefer to handle it inside Python, the fcntl module offers flock on a lock file, but the shell wrapper is simpler and just as safe.

Timezones

Cron uses the system timezone, which on a server is often UTC and may not match your reporting window. Check it with timedatectl, then either translate your target time into the system zone or pin a timezone at the top of the crontab so the schedule is unambiguous:

Bash
TZ=America/New_York
0 6 * * * /opt/reporting/venv/bin/python3 /opt/reporting/generate_daily_report.py >> /opt/reporting/logs/cron_stdout.log 2>&1

The Windows equivalent

Windows has no cron; use Task Scheduler, scriptable via schtasks. The trigger fields differ but the principle is identical — call the venv's Python by absolute path against the script. This registers a daily 06:00 task:

Bat
schtasks /Create /SC DAILY /ST 06:00 /TN "DailyExcelReport" ^
  /TR "C:\reporting\venv\Scripts\python.exe C:\reporting\generate_daily_report.py"

Use /SC WEEKLY /D MON,TUE,WED,THU,FRI for weekdays, or /SC HOURLY /MO 4 for every four hours. Run an elevated prompt if the task needs to run whether or not the user is logged on (add /RU with a service account and /RL HIGHEST).

Common failures and fixes

  • ModuleNotFoundError under cron, but not in your shell. Cron didn't activate the venv. Invoke /path/to/venv/bin/python3 directly instead of python3.
  • FileNotFoundError for a file that exists. A relative path resolved against the home directory. Use absolute paths for every read and write.
  • The job appears to run but produces nothing, with no error. Output went to the void. Redirect cron output with >> logfile 2>&1 and keep your in-script logging to a file.
  • Two copies clobbering each other. A run outlasted its interval. Wrap the command in flock -n.
  • Ran at the wrong time. System timezone differs from your intent. Set TZ= in the crontab or convert your time to the system zone.

Validate after deployment

  1. Logstail -f /opt/reporting/logs/report_YYYYMMDD.log should show load, transform, and save messages.
  2. File — open the generated .xlsx and confirm the expected rows and columns are present.
  3. Exit code — a non-zero exit (from sys.exit(1)) is what monitoring keys off; the cron_stdout.log redirect captures any error cron would otherwise mail or discard.

Conclusion

The four practices that make a cron job reliable all trace back to the same root cause — cron's stripped-down environment. Invoke the virtualenv Python by absolute path, use absolute file paths throughout the script, redirect both stdout and stderr to a log file, and test with env -i before trusting the schedule. Add flock -n for any job that can outlast its interval. Everything else is application logic.

Where to go next

Scheduling handles when the report runs; the next stage is getting it out the door. Add delivery to the same job with Emailing Excel Reports with smtplib, and revisit the full pipeline — ingest, transform, generate, deliver — in Automating Reporting Workflows.