Schedule Recurring Excel Reports with APScheduler
Schedule recurring Excel reports inside Python with APScheduler: a BlockingScheduler, cron and interval triggers, misfire handling, overlap prevention, and timezones.
OS schedulers like cron and Windows Task Scheduler trigger a fresh process at a fixed time. APScheduler takes the opposite approach: it lives inside a long-running Python process and fires job functions on a schedule you define in code. That makes it cross-platform, timezone-aware, and the natural fit when scheduling is part of an application rather than an OS-level concern. This page schedules a recurring Excel report with APScheduler — a weekday-morning cron trigger and an interval trigger — with the safeguards that keep it reliable. It's the in-process alternative within Scheduling Python Excel Scripts with Cron.
When to choose APScheduler over cron or Task Scheduler
| Choose | When |
|---|---|
| APScheduler | You want one cross-platform schedule in code, timezone-aware triggers, dynamic add/remove of jobs, or scheduling embedded in an app/service you already run |
| cron / Task Scheduler | You want the OS to own the schedule, no resident process to babysit, and each run isolated in its own process |
The key trade-off: APScheduler only runs while its process is alive. Cron and Task Scheduler fire even after a reboot with no resident process. If you choose APScheduler, you'll need a supervisor (systemd, a Windows service, or a container restart policy) to keep the process up — covered below.
Prerequisites
Install the scheduler and the report libraries:
pip install apscheduler pandas openpyxl
This page targets APScheduler 3.x, the current stable line. The job function below is plain Python — anything that builds and writes an .xlsx works.
A BlockingScheduler with a cron trigger
A BlockingScheduler runs the scheduler in the foreground and blocks the calling thread, which is ideal when the script's only purpose is to schedule reports. The job builds sample data and writes a timestamped workbook such as reports/daily_summary_20240105_0600.xlsx. Save this as report_scheduler.py:
"""Recurring Excel reports via APScheduler. Runs as a long-lived process."""
import logging
from datetime import datetime
from pathlib import Path

import pandas as pd
from apscheduler.schedulers.blocking import BlockingScheduler
from apscheduler.triggers.cron import CronTrigger

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s | %(levelname)s | %(message)s",
)

TIMEZONE = "America/New_York"
OUTPUT_DIR = Path("reports")
OUTPUT_DIR.mkdir(exist_ok=True)


def generate_report():
    """Build data and write the Excel file. One scheduled run = one call."""
    logging.info("Generating report...")
    df = pd.DataFrame({
        "region": ["North", "South", "North", "East", "South"],
        "revenue": [1200.0, 980.5, 1450.0, 610.25, 980.5],
    })
    summary = df.groupby("region", as_index=False)["revenue"].sum()
    stamp = datetime.now().strftime("%Y%m%d_%H%M")  # host-local time, used only in the file name
    out = OUTPUT_DIR / f"daily_summary_{stamp}.xlsx"
    summary.to_excel(out, index=False, engine="openpyxl")
    logging.info("Wrote %s", out)


scheduler = BlockingScheduler(timezone=TIMEZONE)

# Every weekday at 06:00 New York time. A trigger object built by hand does not
# inherit the scheduler's timezone, so pass it to CronTrigger explicitly.
scheduler.add_job(
    generate_report,
    CronTrigger(day_of_week="mon-fri", hour=6, minute=0, timezone=TIMEZONE),
    id="weekday_morning_report",
    max_instances=1,         # never overlap a run with itself
    coalesce=True,           # collapse multiple missed runs into one
    misfire_grace_time=300,  # still run if up to 5 min late
)

if __name__ == "__main__":
    logging.info("Scheduler starting. Ctrl+C to stop.")
    try:
        scheduler.start()  # blocks here until the process is stopped
    except (KeyboardInterrupt, SystemExit):
        logging.info("Scheduler stopped.")
Run it and leave it running:
python report_scheduler.py
The process now stays alive and fires generate_report every weekday at 06:00. Stop it with Ctrl+C.
Adding an interval trigger
Use an IntervalTrigger for "every N minutes/hours" instead of a wall-clock time. Add a second job before scheduler.start():
from apscheduler.triggers.interval import IntervalTrigger

scheduler.add_job(
    generate_report,
    IntervalTrigger(hours=4),  # every 4 hours from process start
    id="four_hourly_report",
    max_instances=1,
    coalesce=True,
    misfire_grace_time=300,
)
CronTrigger anchors to the clock (06:00 sharp); IntervalTrigger counts from when the scheduler started. Pick cron for fixed report times, interval for steady cadence regardless of time of day.
Handling missed and overlapping runs
These three options are what make an in-process scheduler trustworthy:
- misfire_grace_time: if the scheduler was busy or stalled when a run was due, APScheduler still fires it as long as it is no more than this many seconds late; any later and the run is skipped and logged as missed. The scheduler-wide default is only 1 second, and None means "run no matter how late", so set it explicitly to keep behavior predictable.
- coalesce=True: if several runs were missed (the scheduler was stalled, or the process was down and jobs are kept in a persistent job store), run the job once when it resumes instead of firing every missed occurrence in a burst.
- max_instances=1: prevents a slow run from overlapping the next scheduled run of the same job. The second invocation is skipped (and logged) rather than running concurrently against the same output.
Together they mirror the protections you'd otherwise build with flock and careful timing under cron, but they're configured per job in code.
Keeping the process alive
The single biggest difference from cron: if the process dies, no reports run. A crash, a deploy, or a reboot stops the schedule until something restarts the process. Put it under a supervisor.
systemd unit (/etc/systemd/system/excel-reports.service):
[Unit]
Description=APScheduler Excel reports
After=network.target
[Service]
Type=simple
WorkingDirectory=/opt/reporting
ExecStart=/opt/reporting/venv/bin/python /opt/reporting/report_scheduler.py
Restart=always
RestartSec=10
User=svc_reports
[Install]
WantedBy=multi-user.target
sudo systemctl enable --now excel-reports.service
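Restart=always brings the process back after a crash, and enabling the unit (WantedBy=multi-user.target) starts it again at boot. With the default in-memory job store, runs that came due while the process was down are not replayed; the schedule simply resumes at the next trigger time. If you add a persistent job store, coalesce=True ensures the catch-up after a restart is a single run, not a flood. On Windows, run the same script as a service via NSSM or a scheduled task set to run on startup; on a container platform, use a restart policy.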
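One detail worth considering under a supervisor: systemd stops a unit by sending SIGTERM by default, whereas the script above only handles Ctrl+C (KeyboardInterrupt) and SystemExit. A minimal sketch, assuming you want SIGTERM to follow the same exit path, is to convert the signal into SystemExit. The function names here are hypothetical helpers, not part of APScheduler.

```python
import signal
import sys


def _exit_on_sigterm(signum, frame):
    """Turn SIGTERM into SystemExit so the existing except clause runs."""
    sys.exit(0)


def install_sigterm_handler():
    # Signal handlers can only be installed from the main thread,
    # so call this before scheduler.start().
    signal.signal(signal.SIGTERM, _exit_on_sigterm)
```

Calling install_sigterm_handler() just before scheduler.start() in report_scheduler.py would let a systemctl stop log "Scheduler stopped." the same way Ctrl+C does.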
Common pitfalls
| Symptom | Cause | Fix |
|---|---|---|
| Schedule simply stops | The Python process died and nothing restarted it | Run under systemd/NSSM with Restart=always |
| Jobs fire at the wrong hour | No timezone set; APScheduler used the host's | Pass timezone="America/New_York" to the scheduler, and to any trigger object you construct yourself |
| Two copies of the report run at once | Long run overlapped the next trigger | Set max_instances=1 on the job |
| A burst of runs after downtime | Every missed run fired on resume | Set coalesce=True |
| Script exits immediately, never schedules | Used BackgroundScheduler in a plain script | Use BlockingScheduler, or keep the main thread alive |
| Job added twice on reload | Module imported/reloaded twice in dev servers | Give each job a stable id and add with replace_existing=True |
A note on Blocking vs Background: BlockingScheduler is for standalone scripts — start() blocks and runs the loop. BackgroundScheduler runs in a separate thread and returns immediately, so it suits embedding in a web app — but in a standalone script the program would exit right after start() and nothing would fire.
Frequently asked questions
How is this different from just using cron?
Cron launches a new process per run and survives reboots without a resident process. APScheduler runs jobs inside one long-lived Python process, which is better for in-app scheduling, dynamic jobs, and timezone handling, but it needs a supervisor to stay alive. See Scheduling Python Excel Scripts with Cron for the cron approach.
Why are my jobs running in UTC?
APScheduler defaults to the host timezone, which is often UTC on servers. Pass timezone= to the scheduler (or per trigger) with an IANA name like "Europe/London" so triggers fire at the local time you intend.
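A quick standard-library check shows why an IANA name matters more than a fixed UTC offset: "06:00 in New York" maps to a different UTC hour in winter and in summer. The dates below are arbitrary examples, and as_utc is a helper defined only for this sketch.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

NY = ZoneInfo("America/New_York")


def as_utc(local_dt):
    """Convert a timezone-aware datetime to UTC."""
    return local_dt.astimezone(timezone.utc)


winter = datetime(2024, 1, 15, 6, 0, tzinfo=NY)  # standard time, UTC-5
summer = datetime(2024, 7, 15, 6, 0, tzinfo=NY)  # daylight time, UTC-4

print(as_utc(winter).strftime("%H:%M"))  # 11:00
print(as_utc(summer).strftime("%H:%M"))  # 10:00
```

A schedule pinned to a UTC hour would therefore drift by an hour against local time twice a year, which a named timezone avoids.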
My jobs get added twice when I reload — why?
Auto-reloaders and re-imports can run your add_job code more than once. Use a stable id per job and replace_existing=True, so a re-add updates the existing job instead of creating a duplicate.
Can I persist jobs across restarts?
Yes — configure a jobstore (e.g. SQLAlchemy or Redis) so pending jobs survive a process restart. For a fixed in-code schedule like this one, the default in-memory store plus coalesce=True is usually enough.
Should the job run heavy work directly in the scheduler thread?
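Jobs do not run in the scheduler's own loop, even with BlockingScheduler: by default they are handed to a thread pool executor with 10 workers. A long run therefore does not delay other jobs, but it can still overlap its own next trigger, which is why max_instances=1 matters. For CPU-heavy reports, route the job to a ProcessPoolExecutor so runs don't contend with each other inside one Python process.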
Conclusion
APScheduler moves scheduling into your Python process: define a CronTrigger or IntervalTrigger, build the report in the job function, and harden it with misfire_grace_time, coalesce=True, and max_instances=1. Its one liability is that nothing runs if the process dies — so wrap it in systemd or an equivalent supervisor with automatic restart. Choose it when you want cross-platform, timezone-aware, in-app scheduling; choose cron or Task Scheduler when you'd rather the OS own the schedule.
Where to go next
This is the in-process alternative within Scheduling Python Excel Scripts with Cron. For the OS-scheduler route on Windows, see the sibling guide Run a Python Excel Script on Windows Task Scheduler. To enrich the report the scheduler produces, see Building Multi-Sheet Excel Dashboards, and to ship it on each run, Emailing Excel Reports with smtplib.