Guide
Scheduling Python Excel Scripts with Cron
Run Python Excel report scripts unattended with cron: write a robust generator, decode crontab syntax, invoke the venv Python directly, log, and prevent overlap.
A report that runs only when you remember to run it isn't automated. On Linux and macOS, cron is the simplest way to trigger a Python script at a fixed time with no extra infrastructure. The catch is that cron runs your script in a stripped-down environment — minimal PATH, no virtualenv activation, home directory as the working directory — so a script that works in your shell can fail silently under cron. This page covers a report script that survives that environment, the crontab syntax to schedule it, and the safeguards that keep it reliable. It is the scheduling stage of the broader Automating Reporting Workflows pipeline.
What cron does and doesn't give you
Before writing the schedule, internalize what changes between your shell and cron:
- Cron does not source
.bashrcor.profile, so it does not activate your virtualenv or pick up customPATHentries. - The working directory is the user's home, not the script's location, so relative paths resolve somewhere you didn't intend.
stdoutandstderrare discarded unless you redirect them, so an uncaught error vanishes without a trace.
Every recommendation below is a direct response to one of these facts: invoke the interpreter by absolute path, use absolute file paths, and log to a known file.
A report script that runs unattended
The script below uses absolute paths, creates its own directories, logs to a dated file, and exits non-zero on failure so a monitor can detect it. It is self-contained: if the source CSV is missing it writes a small sample so the run still demonstrates the full path. Save it as /opt/reporting/generate_daily_report.py:
#!/usr/bin/env python3
"""Scheduled Excel report generator: load data, summarize, write .xlsx."""
import sys
import logging
from datetime import datetime
from pathlib import Path
import pandas as pd
# Absolute paths — cron's working directory is not the script's directory.
BASE_DIR = Path("/tmp/reporting_demo") # use /opt/reporting in production
DATA_DIR = BASE_DIR / "data"
OUTPUT_DIR = BASE_DIR / "output"
LOG_DIR = BASE_DIR / "logs"
for d in (DATA_DIR, OUTPUT_DIR, LOG_DIR):
d.mkdir(parents=True, exist_ok=True)
log_file = LOG_DIR / f"report_{datetime.now():%Y%m%d}.log"
logging.basicConfig(
filename=log_file,
level=logging.INFO,
format="%(asctime)s | %(levelname)s | %(message)s",
datefmt="%Y-%m-%d %H:%M:%S",
)
def main():
logging.info("Starting daily Excel report generation.")
try:
source_file = DATA_DIR / "transactions.csv"
if not source_file.exists():
# Seed a sample so the run is self-contained for this demo.
pd.DataFrame({
"category": ["A", "B", "A", "C", "B"],
"amount": [100.0, 250.5, 75.25, 40.0, 250.5],
}).to_csv(source_file, index=False)
logging.info("Source missing; wrote sample data.")
df = pd.read_csv(source_file)
logging.info("Loaded %d records from %s", len(df), source_file)
summary = df.groupby("category", as_index=False)["amount"].sum()
output_file = OUTPUT_DIR / f"daily_summary_{datetime.now():%Y%m%d}.xlsx"
summary.to_excel(output_file, index=False, engine="openpyxl")
logging.info("Report saved to %s", output_file)
except Exception:
logging.exception("Report generation failed.") # captures traceback
sys.exit(1)
logging.info("Execution completed successfully.")
if __name__ == "__main__":
main()
Make it executable so it can be invoked directly:
chmod +x /opt/reporting/generate_daily_report.py
Reading and writing crontab syntax
A crontab line is five time fields followed by the command:
┌───────── minute (0–59)
│ ┌─────── hour (0–23)
│ │ ┌───── day of month (1–31)
│ │ │ ┌─── month (1–12)
│ │ │ │ ┌─ day of week (0–7, where 0 and 7 are both Sunday)
│ │ │ │ │
* * * * * command to run
A few patterns cover most reporting needs:
0 6 * * * # every day at 06:00
30 7 * * 1-5 # 07:30 on weekdays (Mon–Fri)
0 9 1 * * # 09:00 on the first day of every month
0 */4 * * * # every 4 hours, on the hour
0 22 * * 0 # 22:00 every Sunday
Edit your schedule with crontab -e and add the entry. Invoke the virtualenv's Python by absolute path — this sidesteps the fact that cron never activated the environment — and redirect output to a log so interpreter-level errors (a missing module, a syntax error) are captured before your in-script logging starts:
# Run daily at 06:00 using the venv interpreter directly
0 6 * * * /opt/reporting/venv/bin/python3 /opt/reporting/generate_daily_report.py >> /opt/reporting/logs/cron_stdout.log 2>&1
Calling /opt/reporting/venv/bin/python3 directly is more reliable than wrapping the command in bash -c 'source venv/bin/activate && ...': it guarantees the exact interpreter and installed packages, with no dependency on shell initialization.
Test it the way cron will run it
Bugs from the cron environment won't show up in your normal shell, where the venv is active and you're in the project directory. Reproduce cron's bare environment with env -i before trusting the schedule:
# Run with an empty environment, like cron does
env -i /opt/reporting/venv/bin/python3 /opt/reporting/generate_daily_report.py
echo "exit code: $?"
If it works here, it will work under cron. Then confirm the schedule is installed with crontab -l.
Prevent overlapping runs
If a run can take longer than its interval, cron will start a second copy on top of the first, and the two can fight over the same output file. The simplest guard is flock, which wraps the command in a lock the kernel releases automatically when the process exits:
# -n: fail immediately if the lock is held (skip this run rather than queue it)
0 * * * * /usr/bin/flock -n /tmp/report.lock /opt/reporting/venv/bin/python3 /opt/reporting/generate_daily_report.py >> /opt/reporting/logs/cron_stdout.log 2>&1
flock needs no code changes and is the recommended approach. If you prefer to handle it inside Python, the fcntl module offers flock on a lock file, but the shell wrapper is simpler and just as safe.
Timezones
Cron uses the system timezone, which on a server is often UTC and may not match your reporting window. Check it with timedatectl, then either translate your target time into the system zone or pin a timezone at the top of the crontab so the schedule is unambiguous:
TZ=America/New_York
0 6 * * * /opt/reporting/venv/bin/python3 /opt/reporting/generate_daily_report.py >> /opt/reporting/logs/cron_stdout.log 2>&1
The Windows equivalent
Windows has no cron; use Task Scheduler, scriptable via schtasks. The trigger fields differ but the principle is identical — call the venv's Python by absolute path against the script. This registers a daily 06:00 task:
schtasks /Create /SC DAILY /ST 06:00 /TN "DailyExcelReport" ^
/TR "C:\reporting\venv\Scripts\python.exe C:\reporting\generate_daily_report.py"
Use /SC WEEKLY /D MON,TUE,WED,THU,FRI for weekdays, or /SC HOURLY /MO 4 for every four hours. Run an elevated prompt if the task needs to run whether or not the user is logged on (add /RU with a service account and /RL HIGHEST).
Common failures and fixes
ModuleNotFoundErrorunder cron, but not in your shell. Cron didn't activate the venv. Invoke/path/to/venv/bin/python3directly instead ofpython3.FileNotFoundErrorfor a file that exists. A relative path resolved against the home directory. Use absolute paths for every read and write.- The job appears to run but produces nothing, with no error. Output went to the void. Redirect cron output with
>> logfile 2>&1and keep your in-script logging to a file. - Two copies clobbering each other. A run outlasted its interval. Wrap the command in
flock -n. - Ran at the wrong time. System timezone differs from your intent. Set
TZ=in the crontab or convert your time to the system zone.
Validate after deployment
- Logs —
tail -f /opt/reporting/logs/report_YYYYMMDD.logshould show load, transform, and save messages. - File — open the generated
.xlsxand confirm the expected rows and columns are present. - Exit code — a non-zero exit (from
sys.exit(1)) is what monitoring keys off; thecron_stdout.logredirect captures any error cron would otherwise mail or discard.
Conclusion
The four practices that make a cron job reliable all trace back to the same root cause — cron's stripped-down environment. Invoke the virtualenv Python by absolute path, use absolute file paths throughout the script, redirect both stdout and stderr to a log file, and test with env -i before trusting the schedule. Add flock -n for any job that can outlast its interval. Everything else is application logic.
Where to go next
Scheduling handles when the report runs; the next stage is getting it out the door. Add delivery to the same job with Emailing Excel Reports with smtplib, and revisit the full pipeline — ingest, transform, generate, deliver — in Automating Reporting Workflows.