How to Monitor a Next.js Application
Next.js applications have more failure points than a simple static site. API routes, server-side rendering, incremental static regeneration, edge middleware, and third-party dependencies can each fail independently. Here's how to set up monitoring that catches every layer.
A Next.js app is not a single URL. It's a combination of static assets, dynamically rendered pages, API endpoints, and possibly edge functions. Each layer can fail while the others remain healthy. A status check on your homepage alone won't tell you that your API routes are down or that ISR pages are serving stale content.
What can break independently
API routes (/api/*). These run as serverless functions on Vercel or as Node processes on self-hosted deployments. A database connection failure, missing environment variable, or dependency error can take all API routes down while your static pages continue serving from CDN cache — making the app appear healthy at a glance.
Server-side rendered pages. SSR pages call your data layer on every request. A slow or failing database query causes those pages to time out or error. The CDN has nothing cached to fall back on — users get a 500.
Incremental Static Regeneration (ISR). ISR pages serve a cached version while regenerating in the background. If the regeneration fails (data source unavailable), the cache goes stale. Users don't see errors immediately — they see old data. Content checks catch this; status checks don't.
Edge middleware. Middleware runs before every request. A bug in middleware can redirect all users to a 404, block authentication, or infinite-loop — while your underlying pages are completely healthy.
Health check endpoint. If you have a /api/health or /api/status route that checks your database, cache, and external dependencies, monitoring it tells you more than checking any user-facing URL.
Building a health check API route
If your Next.js app doesn't have a health check endpoint, add one. It should verify the components your app depends on:
```typescript
// app/api/health/route.ts
import { NextResponse } from 'next/server'
import { db } from '@/lib/db'

export async function GET() {
  const checks: Record<string, boolean> = {}

  // Database check
  try {
    await db.execute('SELECT 1')
    checks.database = true
  } catch {
    checks.database = false
  }

  // Cache/Redis check (if applicable)
  // try { await redis.ping(); checks.cache = true } catch { checks.cache = false }

  const healthy = Object.values(checks).every(Boolean)

  return NextResponse.json(
    { status: healthy ? 'ok' : 'degraded', checks },
    { status: healthy ? 200 : 503 }
  )
}

// Prevent this route from being cached
export const dynamic = 'force-dynamic'
```
Monitor this endpoint with a status check expecting 200. When any dependency fails, the endpoint returns 503 and your alert fires.
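To make the pass/fail rule concrete, here is a minimal sketch of how a monitor might interpret a response from the route above. The `evaluateHealthResponse` function and the `HealthVerdict` type are hypothetical names for illustration, not part of Next.js or PingBase; the sketch only assumes the JSON shape that the route returns.

```typescript
// Hypothetical shape of the JSON body returned by /api/health.
interface HealthBody {
  status: 'ok' | 'degraded'
  checks: Record<string, boolean>
}

interface HealthVerdict {
  healthy: boolean
  failedChecks: string[]
}

// Treat the endpoint as "up" only when the HTTP status is 200
// and the body's status field is "ok".
function evaluateHealthResponse(httpStatus: number, rawBody: string): HealthVerdict {
  let parsed: unknown
  try {
    parsed = JSON.parse(rawBody)
  } catch {
    // A non-JSON body (for example an HTML error page) counts as a failure.
    return { healthy: false, failedChecks: ['unparseable-body'] }
  }
  if (typeof parsed !== 'object' || parsed === null) {
    return { healthy: false, failedChecks: ['unexpected-body'] }
  }

  const body = parsed as Partial<HealthBody>
  // Collect the names of dependencies that reported false.
  const failedChecks = Object.entries(body.checks ?? {})
    .filter(([, passed]) => !passed)
    .map(([name]) => name)

  return { healthy: httpStatus === 200 && body.status === 'ok', failedChecks }
}
```

Checking both the status code and the body guards against a proxy or error page that returns 200 with unrelated content, and the `failedChecks` list tells you which dependency to look at first.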
What to monitor: full coverage table
| Monitor | URL pattern | Check type |
|---|---|---|
| Health check | /api/health | Status 200, JSON contains "status":"ok" |
| Homepage (SSG/ISR) | / | Status 200, content check for nav/brand |
| Critical SSR page | /dashboard | Status 200, slow threshold 2000ms |
| Key API route | /api/posts | Status 200, JSON array in response |
| SSL certificate | yourdomain.com | Expiry alert 30d + 7d |
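If you keep monitor definitions in code, the table above could be written down roughly as follows. This is only a sketch: `MonitorConfig` and its field names are made up for illustration and are not PingBase's configuration format, and the brand text is a placeholder.

```typescript
type CheckKind = 'http' | 'ssl'

// Hypothetical config shape; field names are illustrative only.
interface MonitorConfig {
  name: string
  kind: CheckKind
  target: string // path for http checks, hostname for ssl checks
  expectStatus?: number
  mustContain?: string
  expectJsonArray?: boolean
  slowThresholdMs?: number
  expiryAlertDays?: number[]
}

const monitors: MonitorConfig[] = [
  { name: 'Health check', kind: 'http', target: '/api/health', expectStatus: 200, mustContain: '"status":"ok"' },
  // 'Your Brand' is a placeholder for text that always appears in your nav.
  { name: 'Homepage (SSG/ISR)', kind: 'http', target: '/', expectStatus: 200, mustContain: 'Your Brand' },
  { name: 'Critical SSR page', kind: 'http', target: '/dashboard', expectStatus: 200, slowThresholdMs: 2000 },
  { name: 'Key API route', kind: 'http', target: '/api/posts', expectStatus: 200, expectJsonArray: true },
  { name: 'SSL certificate', kind: 'ssl', target: 'yourdomain.com', expiryAlertDays: [30, 7] },
]
```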
Vercel vs self-hosted: different failure modes
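On Vercel: serverless functions cold-start, scale to zero, and have execution time limits. A function that exceeds its limit (10 seconds on Hobby and 60 seconds on Pro by default at the time of writing; check Vercel's current documentation, since these limits are configurable and have changed over time) returns a 504. A response time alert in PingBase flags the slowdown as requests approach the limit, before timeouts become a pattern your users notice.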
Self-hosted (VPS / Docker): your Node process can crash entirely. All routes — static and dynamic — return connection refused. Your uptime check catches this within one poll interval. Set your failure threshold to 1 on a self-hosted deployment; you don't want 2 confirmation polls before alerting.
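On Cloudflare Pages with Workers: Workers have a CPU time limit (10ms on free, 30s on paid) and can fail if external fetches time out. The CPU limit counts compute time, not time spent waiting on the network, so a slow upstream shows up as a slow response rather than a CPU overrun. Monitor your API routes with a slow threshold set well below the point where requests start to fail, so you catch degradation before it becomes a hard error.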
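The failure threshold mentioned for self-hosted deployments is easy to reason about in code. The sketch below is an assumption about how a confirmation threshold typically works, not PingBase's actual implementation; `shouldAlert` is a hypothetical helper.

```typescript
// One poll result: true if the check passed, false if it failed.
type PollResult = boolean

// Alert once the most recent `failureThreshold` polls have all failed.
// history is ordered oldest to newest.
function shouldAlert(history: PollResult[], failureThreshold: number): boolean {
  if (failureThreshold < 1 || history.length < failureThreshold) return false
  return history.slice(-failureThreshold).every((passed) => !passed)
}

// A crashed Node process fails the very next poll.
const afterOneFailedPoll: PollResult[] = [true, true, false]
const alertWithThresholdOne = shouldAlert(afterOneFailedPoll, 1) // true
const alertWithThresholdTwo = shouldAlert(afterOneFailedPoll, 2) // false: waits for another poll
```

With a threshold of 1 the alert fires one poll interval earlier. A higher threshold filters out single transient failures, which can be a reasonable trade on serverless platforms where an occasional cold-start timeout is expected.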
Content checks for ISR pages
ISR pages are subtle. When regeneration fails, the page still returns 200 — it's serving the last successfully cached version. If that version is hours or days old, users see stale data without any error.
Use a content check targeting something that changes frequently in your data — a "Last updated" timestamp, a record count, or a date field. If the content check detects the string is unexpectedly old or missing, your alert fires. This is the only reliable way to detect ISR staleness externally.
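As an illustration of what "unexpectedly old" can mean in practice, here is a sketch of a freshness check. It assumes the page renders a marker such as `Last updated: 2024-05-01T12:00:00Z` in UTC ISO 8601 form; that marker format, the `checkFreshness` function, and the `Freshness` type are all assumptions to adapt to your own page.

```typescript
// Assumed marker format: "Last updated: 2024-05-01T12:00:00Z" (UTC, ISO 8601).
const LAST_UPDATED_PATTERN =
  /Last updated:\s*(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z)/

type Freshness = 'fresh' | 'stale' | 'marker-missing'

// Compare the timestamp rendered into the page with the current time.
function checkFreshness(html: string, now: Date, maxAgeMs: number): Freshness {
  const match = LAST_UPDATED_PATTERN.exec(html)
  if (!match) return 'marker-missing'

  const renderedAt = Date.parse(match[1])
  if (Number.isNaN(renderedAt)) return 'marker-missing'

  return now.getTime() - renderedAt > maxAgeMs ? 'stale' : 'fresh'
}
```

Pick `maxAgeMs` comfortably larger than the page's revalidation interval, so a single slow regeneration does not trigger an alert. A missing marker is reported separately because it usually means the page layout changed rather than that regeneration failed.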
Response time thresholds for Next.js
Next.js renders are fast when cached, slow when computing. Set your thresholds by route type:
- Static pages (SSG): 500ms — anything slower means CDN issues or massive page size
- ISR pages (cache hit): 300ms — these should be near-instant from edge cache
- SSR pages: 1500ms — depends on data layer; calibrate to your P95 baseline
- API routes: 1000ms — tighter than SSR because these are pure function calls
- Health check: 500ms — if this is slow, your infrastructure has a problem
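The thresholds above can be captured in a small lookup, together with a helper for calibrating against your own P95 baseline. This is a sketch under the assumption that you have a list of recent response times in milliseconds; `SLOW_THRESHOLD_MS`, `isSlow`, and `percentile95` are hypothetical names, and the percentile uses the simple nearest-rank method.

```typescript
type RouteType = 'static' | 'isr' | 'ssr' | 'api' | 'health'

// Starting thresholds from the list above, in milliseconds.
const SLOW_THRESHOLD_MS: Record<RouteType, number> = {
  static: 500,
  isr: 300,
  ssr: 1500,
  api: 1000,
  health: 500,
}

function isSlow(routeType: RouteType, responseTimeMs: number): boolean {
  return responseTimeMs > SLOW_THRESHOLD_MS[routeType]
}

// Nearest-rank 95th percentile of observed response times.
function percentile95(samplesMs: number[]): number {
  if (samplesMs.length === 0) throw new Error('percentile95 needs at least one sample')
  const sorted = [...samplesMs].sort((a, b) => a - b)
  const rank = Math.ceil((95 * sorted.length) / 100) // 1-based rank
  return sorted[rank - 1]
}
```

If the P95 of your SSR page sits well below 1500ms, tighten the threshold toward it; if it sits above, fix the data layer or loosen the threshold so the alert stays meaningful.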
Setup checklist
- Add /api/health route that checks DB and critical dependencies, returns 503 on failure
- Add PingBase monitor for /api/health: status 200, content check for "status":"ok"
- Add monitor for homepage: content check for nav brand text
- Add monitor for your most critical SSR page: slow threshold 2000ms
- Add SSL certificate monitor for your custom domain
- Configure Slack or webhook alerts so the team sees issues immediately
The five monitors in the coverage table cover the full Next.js stack, and all five fit in PingBase's free tier.
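If you route alerts through your own webhook receiver before they reach Slack, the message formatting might look like the sketch below. The `MonitorEvent` shape is hypothetical and is not PingBase's webhook payload; the only external fact relied on is that Slack incoming webhooks accept a JSON body with a `text` field.

```typescript
// Hypothetical event shape for illustration only.
interface MonitorEvent {
  monitorName: string
  url: string
  state: 'down' | 'slow' | 'recovered'
  detail: string
}

const STATE_LABEL: Record<MonitorEvent['state'], string> = {
  down: 'DOWN',
  slow: 'SLOW',
  recovered: 'RECOVERED',
}

// Build the JSON body for a Slack incoming webhook.
function buildSlackPayload(event: MonitorEvent): { text: string } {
  return {
    text: `[${STATE_LABEL[event.state]}] ${event.monitorName} (${event.url}): ${event.detail}`,
  }
}
```

To deliver it, POST `JSON.stringify(buildSlackPayload(event))` to your webhook URL with a `Content-Type: application/json` header.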