Build a Site That Updates Itself Every Day (Cron + Python)

The data-moat pattern: a script that fetches fresh data daily, archives history, and rebuilds your site — while you sleep. This is how tracker sites are built.

Difficulty: Intermediate

Time: Half a day

You'll need: Claude or ChatGPT · Python 3 · A host with cron jobs (most shared hosting has it)

You'll build: An automated pipeline: daily data fetch → dated snapshot archive → change detection → site rebuild. After setup, the site grows more valuable every day with zero work.

There are two kinds of websites: ones that decay (content goes stale from the day you publish) and ones that compound (every day adds data nobody can retroactively collect). Price trackers, availability monitors, statistics archives, change logs — these compound. The architecture behind all of them is the same four-step pipeline, and AI writes every step.

The architecture

[daily, automatic]
fetch.py     → pulls today's data (API or scrape)
snapshot     → saves data/snapshots/2026-06-04.json  (immutable archive)
diff         → compares vs yesterday, logs changes to changes.json
build.py     → regenerates the static site from all data

[once] cron  → runs the above every morning

The archive folder is the asset. Today's data is on the source's website; 180 days of dated snapshots exist only on your server. That history ('X raised prices twice this year', 'Y was out of stock 40 days') is content competitors cannot recreate.
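As a rough illustration of how a claim like 'X raised prices twice this year' falls out of the archive, the sketch below counts price increases for one item across dated snapshots. The snapshot shape (a list of objects with `id` and `price` fields) and both function names are assumptions made for this example, not something the pipeline prescribes.

```python
def price_series(snapshots, item_id):
    """Return [(date, price), ...] in date order for one item.

    `snapshots` maps 'YYYY-MM-DD' strings to lists of {'id', 'price'} dicts
    (an assumed shape). Days where the item is missing are skipped.
    """
    series = []
    for day in sorted(snapshots):  # ISO dates sort chronologically as strings
        for item in snapshots[day]:
            if item["id"] == item_id:
                series.append((day, item["price"]))
    return series


def count_increases(series):
    """Count how often the price rose between consecutive observations."""
    return sum(1 for (_, a), (_, b) in zip(series, series[1:]) if b > a)
```

The same two functions work whether the archive holds 4 days or 400, which is the point: the query is trivial, the data is what takes time.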

Step 1 — Build the fetcher

Write fetch.py in Python (requests + stdlib): it calls [API endpoint / scrapes page — see the scraping tutorial], extracts [fields], and writes data/snapshots/YYYY-MM-DD.json named with today's date. If today's file exists, overwrite it (safe to re-run). If the fetch fails or returns empty/malformed data, do NOT write a file — print an error and exit with a non-zero status code instead.
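For orientation, here is one shape the result of that prompt might take. It is a minimal sketch, not a definitive implementation: it swaps `requests` for the standard library's `urllib.request` so it has no dependencies, and the endpoint URL, the `id` and `price` field names, and all function names are hypothetical placeholders.

```python
import json
import os
import sys
from datetime import date
from pathlib import Path
from urllib.request import urlopen

SNAPSHOT_DIR = Path("data/snapshots")
API_URL = "https://example.com/api/prices"  # hypothetical endpoint


def fetch_items(url=API_URL):
    """Download and parse the JSON payload (assumed: a list of objects)."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def validate(items):
    """True only for a non-empty list of dicts carrying the assumed fields."""
    return (
        isinstance(items, list)
        and len(items) > 0
        and all(isinstance(i, dict) and "id" in i and "price" in i for i in items)
    )


def write_snapshot(items, day, root=SNAPSHOT_DIR):
    """Write via a temp file so a crash never leaves a half-written snapshot."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    target = root / f"{day.isoformat()}.json"
    tmp = target.with_suffix(".tmp")
    tmp.write_text(json.dumps(items, indent=2, sort_keys=True), encoding="utf-8")
    os.replace(tmp, target)  # overwrites today's file, so re-runs are safe
    return target


def run(fetch=fetch_items, root=SNAPSHOT_DIR, day=None):
    """Return an exit code: 0 on success, 1 if nothing was written."""
    try:
        items = fetch()
    except Exception as exc:
        print(f"fetch failed: {exc}", file=sys.stderr)
        return 1
    if not validate(items):
        print("fetch returned empty or malformed data", file=sys.stderr)
        return 1
    print(f"wrote {write_snapshot(items, day or date.today(), root)}")
    return 0
```

Ending the script with `sys.exit(run())` would hand that exit code to the shell, which is what lets a scheduler stop the rest of the pipeline on a bad fetch. Passing the fetch function in as a parameter is a design choice that makes the failure paths testable without touching the network.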

Worth knowing: That last sentence is the line between a dataset and garbage. A failed fetch that silently writes an empty snapshot poisons your archive and your diffs. Fail loudly, write nothing.

Step 2 — Build the differ

Write diff.py: load today's snapshot and the most recent previous one from data/snapshots/. Compare item by item. For each change (value changed, item added, item removed) append an entry to data/changes.json with date, item, old value, new value, and direction. De-duplicate if run twice in one day. Print a summary of changes found.
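A differ along these lines could look like the sketch below. It assumes snapshots are lists of objects with string `id` and numeric `price` fields, and it de-duplicates by replacing every entry already logged for today's date, which is one reasonable reading of the prompt rather than the only one. All function names are illustrative.

```python
import json
from pathlib import Path


def latest_two(snapshot_dir):
    """Return (previous, today) snapshot paths, or None if fewer than two exist."""
    files = sorted(Path(snapshot_dir).glob("*.json"))  # ISO dates sort by name
    return (files[-2], files[-1]) if len(files) >= 2 else None


def diff_snapshots(old_items, new_items, day):
    """Compare two snapshots item by item and return a list of change entries."""
    old = {i["id"]: i["price"] for i in old_items}
    new = {i["id"]: i["price"] for i in new_items}
    changes = []
    for item_id in sorted(old.keys() | new.keys()):
        if item_id not in old:
            direction = "added"
        elif item_id not in new:
            direction = "removed"
        elif new[item_id] != old[item_id]:
            direction = "up" if new[item_id] > old[item_id] else "down"
        else:
            continue  # unchanged
        changes.append({
            "date": day, "item": item_id,
            "old": old.get(item_id), "new": new.get(item_id),
            "direction": direction,
        })
    return changes


def replace_day(log, entries, day):
    """De-duplicate by dropping anything already logged for `day` first."""
    return [e for e in log if e["date"] != day] + entries


def run(data_dir="data"):
    """Diff the two newest snapshots and update data/changes.json."""
    data_dir = Path(data_dir)
    pair = latest_two(data_dir / "snapshots")
    if pair is None:
        print("need at least two snapshots; nothing to diff")
        return []
    prev_path, today_path = pair
    day = today_path.stem  # file name without .json, e.g. 2026-06-04
    entries = diff_snapshots(
        json.loads(prev_path.read_text(encoding="utf-8")),
        json.loads(today_path.read_text(encoding="utf-8")),
        day,
    )
    log_path = data_dir / "changes.json"
    log = json.loads(log_path.read_text(encoding="utf-8")) if log_path.exists() else []
    log_path.write_text(json.dumps(replace_day(log, entries, day), indent=2), encoding="utf-8")
    print(f"{day}: {len(entries)} change(s)")
    return entries
```

Replacing the day's entries instead of appending means a second run after a re-fetch reflects the latest snapshot and never doubles the log.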

The changes file becomes your site's best page: a permanent changelog. 'Every price change, logged daily' earns links and return visits in a way static comparison content never does.

Step 3 — Rebuild the site from data

Your build script (the programmatic-site tutorial pattern) renders pages from the latest snapshot plus the changes log: current-state tables, per-item pages with history sections, and the changelog. The site is a pure function of the data folder — delete dist/, rebuild, identical site.
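To make the 'pure function of the data folder' idea concrete, here is a stripped-down sketch of a build step. It renders a single index page rather than per-item pages, and it assumes the same hypothetical `id`/`price` snapshot fields and `date`/`item`/`old`/`new` changelog fields used for illustration elsewhere; a real build script would likely use a template engine.

```python
import html
import json
from pathlib import Path


def render_index(items, changes):
    """Pure function: the same items and changes always give the same HTML."""
    rows = "\n".join(
        f"<tr><td>{html.escape(str(i['id']))}</td>"
        f"<td>{html.escape(str(i['price']))}</td></tr>"
        for i in sorted(items, key=lambda i: str(i["id"]))
    )
    newest_first = sorted(
        changes, key=lambda c: (str(c["date"]), str(c["item"])), reverse=True
    )
    log = "\n".join(
        f"<li>{html.escape(str(c['date']))}: {html.escape(str(c['item']))} "
        f"{html.escape(str(c['old']))} to {html.escape(str(c['new']))}</li>"
        for c in newest_first
    )
    return (
        "<!doctype html>\n<title>Tracker</title>\n"
        f"<h1>Current prices</h1>\n<table>\n{rows}\n</table>\n"
        f"<h1>Changelog</h1>\n<ul>\n{log}\n</ul>\n"
    )


def build(data_dir="data", dist_dir="dist"):
    """Render dist/index.html from the newest snapshot and the changes log."""
    data_dir, dist_dir = Path(data_dir), Path(dist_dir)
    snapshots = sorted((data_dir / "snapshots").glob("*.json"))
    if not snapshots:
        raise SystemExit("no snapshots found; refusing to build an empty site")
    items = json.loads(snapshots[-1].read_text(encoding="utf-8"))
    changes_file = data_dir / "changes.json"
    changes = (
        json.loads(changes_file.read_text(encoding="utf-8"))
        if changes_file.exists() else []
    )
    dist_dir.mkdir(parents=True, exist_ok=True)
    out = dist_dir / "index.html"
    out.write_text(render_index(items, changes), encoding="utf-8")
    return out
```

Because nothing in the output depends on the clock or on what was previously in dist/, deleting dist/ and calling `build()` again produces a byte-identical page.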

Step 4 — Schedule it

On shared hosting, find Cron Jobs in the control panel and add one entry:

0 6 * * * cd /home/youruser/yoursite && (python3 fetch.py && python3 diff.py && python3 build.py) >> cron.log 2>&1

Translation: every day at 6:00 server time, fetch, diff, rebuild, and append all output to cron.log. The parentheses group the three scripts so the redirect captures output from all of them, not just build.py. The && chaining means a failed fetch stops the pipeline, so yesterday's good site stays up rather than a broken rebuild going live. Check cron.log after the first two mornings; after that, check when something looks stale.
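If you would rather keep the stop-on-failure logic in Python than in shell syntax, the same behaviour can be sketched as a small runner script. This is an optional alternative under assumed names (`run_pipeline`, `STEPS`), not part of the setup described above.

```python
import subprocess
import sys


def run_pipeline(steps):
    """Run each command in order; stop at the first non-zero exit code."""
    for step in steps:
        code = subprocess.run(step).returncode
        if code != 0:
            print(f"step failed ({code}): {' '.join(step)}", file=sys.stderr)
            return code
    return 0


# The three scripts from the tutorial, run with the current interpreter.
STEPS = [
    [sys.executable, "fetch.py"],
    [sys.executable, "diff.py"],
    [sys.executable, "build.py"],
]
```

Ending the file with `sys.exit(run_pipeline(STEPS))` would let a single cron entry call this one script, with the output redirect applied to it as a whole.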

Step 5 — Let time do the work

Week 1, your changelog is empty and unimpressive. Month 3, you have the only public record of every change in your niche. Month 12, you're the citation. The compounding is the strategy: pick data where changes matter to someone, start the cron, and be patient. The hardest part of this pattern isn't technical — it's starting six months before you wish you had.

Keep going

Need somewhere to put it live? See where to host AI-built sites. Compare tool costs on the pricing tracker (or stick to the free options), then pick your next build.