A self-hosted prerender service for JavaScript-heavy single-page applications (SPAs). It intercepts requests from SEO bots and crawlers, serves pre-rendered HTML from a disk cache, and uses headless Chromium (via Puppeteer) to render pages on demand.
- Nginx inspects the incoming
User-Agent. If it matches a known bot (Googlebot, Bingbot, Twitterbot, etc.), the request is proxied to this service on port8830. - The service checks a disk cache directory for a pre-rendered
.htmlfile keyed by the MD5 hash of the canonical URL. - On a cache miss (or after the TTL expires), the service launches a headless Chromium tab via Puppeteer, waits for the page to fully hydrate, and saves the minified HTML to disk.
- The rendered HTML is returned to the bot with appropriate headers.
- A scheduled job crawls the site's XML sitemap and pre-warms the cache for all URLs.
- Bot request events are persisted to PostgreSQL and exposed via an analytics API.
flowchart TD
A([Incoming HTTP Request]) --> B[Nginx]
B --> C{Bot User-Agent\nor static asset?}
C -- "Regular user / static asset" --> D([Serve SPA normally])
C -- "Known bot\ne.g. Googlebot" --> E[Proxy to Prerender\nservice :8830\nwith X-Prerender: 1]
E --> F[Prerender Service\nFastify + Node.js]
F --> G{URL marked\nas Gone?}
G -- "Yes (.gone file)" --> H([410 Gone])
G -- No --> I{Redirect\nmapping exists?}
I -- "Yes (.json file)" --> J([301 Redirect])
I -- No --> K{Disk cache\nhit & fresh?}
K -- "Yes (within TTL)" --> L([Serve cached HTML\ncache: HIT])
K -- "No (miss or expired)" --> M[Launch Puppeteer\nHeadless Chromium]
M --> N[Navigate to SITE_URL\nwait for networkidle0\n& meta title hydration]
N --> O[Extract & minify HTML\nhtml-minifier-terser]
O --> P[Save to disk\nmd5 URL .html]
P --> Q([Serve fresh HTML\ncache: MISS/EXPIRED])
F --> R[(PostgreSQL\nAnalytics DB)]
G --> R
I --> R
K --> R
Q --> R
subgraph Scheduler [Scheduled Cache Warmer]
S[toad-scheduler\nperiodic job] --> T[Fetch XML Sitemap]
T --> U[Render all URLs\nvia Puppeteer]
U --> P
end
style D fill:#4a90d9,color:#fff
style H fill:#e05c4b,color:#fff
style J fill:#f5a623,color:#fff
style L fill:#27ae60,color:#fff
style Q fill:#27ae60,color:#fff
style R fill:#8e44ad,color:#fff
style Scheduler fill:#f9f9f9,stroke:#aaa
Clone the repository:
git clone https://github.com/haku-d/prerender.git
cd prerenderCreate a .env file:
PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium-browser
SITE_URL=https://your-site.com
STATIC_SITE_DIR=/data/prerender
DATABASE_URL=postgresql://prerender:secret@postgres:5432/prerender
POSTGRES_PASSWORD=secretThe docker-compose.yml includes three services:
| Service | Description | Port | Required |
|---|---|---|---|
prerender |
The prerender service | 8830 (internal) |
yes |
postgres |
PostgreSQL 16 | internal | no (analytics only) |
prerender-ui |
Admin dashboard (haku-d/prerender-ui) | 6996 → 4000 |
no |
Start the services:
docker compose up -dDB migrations run automatically on startup. Remove postgres and prerender-ui from docker-compose.yml if you don't need analytics or the dashboard.
Add the configuration from nginx.conf to your Nginx setup to route bot traffic to the prerender service:
- A
mapblock identifies bot User-Agents via$prerender_ua. - URLs matching static asset extensions (
.js,.css,.png, etc.) are never prerendered. - Bot requests are rewritten to
/prerender-proxyand proxied tohttp://prerender_backend(port8830). - The
X-Prerender: 1header prevents re-entrant prerender loops.
| Layer | Technology |
|---|---|
| Runtime | Node.js 20, TypeScript |
| HTTP framework | Fastify 5 |
| Headless browser | Puppeteer Core + Chromium (Alpine) |
| Database | PostgreSQL 16 |
| DB migrations | dbmate |
| Scheduler | toad-scheduler |
| HTML minification | html-minifier-terser |
| Monitoring | New Relic |
| Package manager | pnpm |
| Variable | Required | Default | Description |
|---|---|---|---|
PUPPETEER_EXECUTABLE_PATH |
yes | — | Path to the Chromium binary (e.g. /usr/bin/chromium-browser) |
SITE_URL |
yes | — | Canonical site URL used to normalise cache keys (e.g. https://example.com) |
STATIC_SITE_DIR |
yes | — | Absolute path to the directory where pre-rendered HTML files are stored |
DATABASE_URL |
no | — | PostgreSQL connection string (e.g. postgresql://user:pass@host:5432/dbname). Analytics are disabled when omitted. |
CACHE_TTL_SECONDS |
no | 43200 |
Seconds before a cached file is considered stale (default 12 hours) |
EXECUTE_JOB_ON_START |
no | false |
Set to true to trigger a full sitemap crawl immediately on startup |
PORT |
no | 8830 |
HTTP port the service listens on |
Create a .env file in the project root for local development:
PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium-browser
SITE_URL=https://example.com
STATIC_SITE_DIR=/tmp/prerender-site
DATABASE_URL=postgresql://prerender:secret@localhost:5432/prerender
CACHE_TTL_SECONDS=43200
EXECUTE_JOB_ON_START=false- Node.js 20+
- pnpm (
npm install -g pnpm) - Chromium installed locally (or set
PUPPETEER_EXECUTABLE_PATHto a Docker-managed binary) - PostgreSQL 16 (optional — only needed for analytics)
- dbmate for running migrations
pnpm installpnpm run db:migrateStarts TypeScript in watch mode alongside the Fastify server with auto-reload:
pnpm run devCompiles TypeScript then starts the server:
pnpm run startpnpm run testCoverage reports are written to coverage/.
All requests are unauthenticated. The service is intended to be deployed on a private network behind Nginx.
Main prerender endpoint. Accepts the full site URL as the path (e.g. GET /https://example.com/products/business-cards).
- Returns cached HTML if available and not expired.
- Triggers an on-demand Puppeteer render on cache miss, saves to disk, and returns the result.
- Returns
410 Goneif the URL has been marked as gone. - Returns
301 Redirectif a redirect mapping exists for the URL.
Force-renders a URL, overwrites the cache entry, and returns the new HTML. Useful for cache invalidation after a page update.
GET /hit/https://example.com/products/business-cards
Marks a URL as permanently gone. Subsequent requests to /* for this URL will return 410.
GET /gone/https://example.com/old-page
Removes the gone marker for a URL.
DELETE /gone/https://example.com/old-page
Stores a 301 redirect mapping from one URL to another. Subsequent requests to /* for <url1> will redirect to <url2>.
GET /redirect/https://example.com/old-path/https://example.com/new-path
Returns bot request statistics from the PostgreSQL database.
Query parameters (both optional):
| Param | Format | Description |
|---|---|---|
from |
YYYY-MM-DD |
Start of date range (inclusive) |
to |
YYYY-MM-DD |
End of date range (inclusive) |
Example response:
{
"total": 1240,
"uniquePages": 87,
"byBot": [{ "botName": "googlebot", "count": 640 }],
"byCacheStatus": [{ "cacheStatus": "hit", "count": 980 }]
}- Cache files are stored in
STATIC_SITE_DIRas<md5(fullUrl)>.html. - Gone markers are stored as empty files under
STATIC_SITE_DIR/gone/. - Redirect mappings are stored as JSON files under
STATIC_SITE_DIR/redirects/. - Cache is considered stale after
CACHE_TTL_SECONDSseconds (default 12 hours). - Stale cache entries are re-rendered on the next request and updated in the background.
The bot_requests table records every bot hit:
| Column | Type | Description |
|---|---|---|
id |
integer | Primary key |
requested_at |
timestamptz | Request timestamp |
url |
text | Full canonical URL |
path |
text | URL path only |
bot_name |
text | Detected bot name |
user_agent |
text | Raw User-Agent header |
cache_status |
text | hit, miss, stale, gone, redirect |
http_status |
integer | HTTP status code returned |
render_duration_ms |
integer | Puppeteer render time in ms (null for cache hits) |
Migrations are managed by dbmate and located in db/migrations/.