
How to Monitor a Next.js Application

Next.js applications have more failure points than a simple static site. API routes, server-side rendering, incremental static regeneration, edge middleware, and third-party dependencies can each fail independently. Here's how to set up monitoring that catches every layer.

A Next.js app is not a single URL. It's a combination of static assets, dynamically rendered pages, API endpoints, and possibly edge functions. Each layer can fail while others remain healthy. A status check on your homepage alone won't tell you that your API routes are down or that ISR pages are serving stale content.


What can break independently

API routes (/api/*). These run as serverless functions on Vercel or as Node processes on self-hosted deployments. A database connection failure, missing environment variable, or dependency error can take all API routes down while your static pages continue serving from CDN cache — making the app appear healthy at a glance.

Server-side rendered pages. SSR pages call your data layer on every request. A slow or failing database query causes those pages to time out or error. The CDN has nothing cached to fall back on — users get a 500.

Incremental Static Regeneration (ISR). ISR pages serve a cached version while regenerating in the background. If the regeneration fails (data source unavailable), the cache goes stale. Users don't see errors immediately — they see old data. Content checks catch this; status checks don't.

Edge middleware. Middleware runs before every request it matches. A bug in middleware can redirect all users to a 404, block authentication, or cause a redirect loop, all while your underlying pages are completely healthy.

Health check endpoint. If you have a /api/health or /api/status route that checks your database, cache, and external dependencies, monitoring it tells you more than checking any user-facing URL.


Building a health check API route

If your Next.js app doesn't have a health check endpoint, add one. It should verify the components your app depends on:

// app/api/health/route.ts
import { NextResponse } from 'next/server'
import { db } from '@/lib/db'

export async function GET() {
  const checks: Record<string, boolean> = {}

  // Database check
  try {
    await db.execute('SELECT 1')
    checks.database = true
  } catch {
    checks.database = false
  }

  // Cache/Redis check (if applicable)
  // try { await redis.ping(); checks.cache = true } catch { checks.cache = false }

  const healthy = Object.values(checks).every(Boolean)
  return NextResponse.json(
    { status: healthy ? 'ok' : 'degraded', checks },
    { status: healthy ? 200 : 503 }
  )
}

// Prevent this route from being cached
export const dynamic = 'force-dynamic'

Monitor this endpoint with a status check expecting 200. When any dependency fails, the endpoint returns 503 and your alert fires.
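One gap in a health route like the one above is that a hung dependency (for example, a database connection that never answers) can make the health check itself hang instead of returning 503. The following is a minimal sketch of one way to bound each check with a timeout. The names `passesWithin` and `runChecks` and the 2000ms default are assumptions for illustration, not part of Next.js or any database client.

```typescript
// Hypothetical helpers for illustration; not part of Next.js.
type Check = () => Promise<unknown>

// Resolves to true if the check settles successfully within `ms`,
// false if it throws, rejects, or takes longer than `ms`.
async function passesWithin(check: Check, ms: number): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error('timeout')), ms)
  })
  try {
    await Promise.race([check(), timeout])
    return true
  } catch {
    return false
  } finally {
    // Always clear the timer so it cannot fire after the race is decided.
    if (timer !== undefined) clearTimeout(timer)
  }
}

// Runs all checks in parallel and summarizes them in the same shape
// the health route returns: per-check booleans plus an overall status.
async function runChecks(
  checks: Record<string, Check>,
  ms = 2000 // assumed default; tune to your dependencies
): Promise<{ status: 'ok' | 'degraded'; httpStatus: 200 | 503; checks: Record<string, boolean> }> {
  const pairs = await Promise.all(
    Object.entries(checks).map(
      async ([name, check]) => [name, await passesWithin(check, ms)] as const
    )
  )
  const results: Record<string, boolean> = {}
  for (const [name, ok] of pairs) results[name] = ok
  const healthy = pairs.every(([, ok]) => ok)
  return { status: healthy ? 'ok' : 'degraded', httpStatus: healthy ? 200 : 503, checks: results }
}
```

Inside the route handler, the result could be passed to `NextResponse.json(...)` with `httpStatus` as the response status, for example with a check such as `database: () => db.execute('SELECT 1')`.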


What to monitor: full coverage table

| Monitor | URL pattern | Check type |
|---|---|---|
| Health check | /api/health | Status 200, JSON contains "status":"ok" |
| Homepage (SSG/ISR) | / | Status 200, content check for nav/brand |
| Critical SSR page | /dashboard | Status 200, slow threshold 2000ms |
| Key API route | /api/posts | Status 200, JSON array in response |
| SSL certificate | yourdomain.com | Expiry alert 30d + 7d |
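
To make the check types in the table concrete, here is a rough sketch of how three of the rows could be expressed as data plus one evaluation function. The `Monitor`, `Observation`, and `evaluate` names are hypothetical and are not PingBase's configuration format. The SSL row is left out because certificate expiry is not an HTTP response check.

```typescript
// Hypothetical types for illustration; not a real monitoring product's schema.
interface Observation {
  status: number     // HTTP status returned by the probe
  body: string       // response body as text
  elapsedMs: number  // total response time measured by the probe
}

interface Monitor {
  name: string
  path: string
  expectStatus: number
  bodyCheck?: (body: string) => boolean
  slowThresholdMs?: number
}

type Verdict = 'up' | 'slow' | 'down'

// True only when the body parses as JSON and the top-level value is an array.
function isJsonArray(body: string): boolean {
  try {
    return Array.isArray(JSON.parse(body))
  } catch {
    return false
  }
}

// Status and content failures mean "down"; a correct but late response is "slow".
function evaluate(monitor: Monitor, seen: Observation): Verdict {
  if (seen.status !== monitor.expectStatus) return 'down'
  if (monitor.bodyCheck !== undefined && !monitor.bodyCheck(seen.body)) return 'down'
  if (monitor.slowThresholdMs !== undefined && seen.elapsedMs > monitor.slowThresholdMs) return 'slow'
  return 'up'
}

const healthMonitor: Monitor = {
  name: 'Health check',
  path: '/api/health',
  expectStatus: 200,
  bodyCheck: (body) => body.includes('"status":"ok"'),
}

const ssrMonitor: Monitor = {
  name: 'Critical SSR page',
  path: '/dashboard',
  expectStatus: 200,
  slowThresholdMs: 2000,
}

const apiMonitor: Monitor = {
  name: 'Key API route',
  path: '/api/posts',
  expectStatus: 200,
  bodyCheck: isJsonArray,
}
```

Separating "down" from "slow" keeps latency regressions visible without treating them as outages.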

Vercel vs self-hosted: different failure modes

On Vercel: serverless functions cold-start, scale to zero, and have execution time limits. A function that exceeds its execution limit returns a 504 (10 seconds on Hobby and 60 seconds on Pro at the time of writing; Vercel's limits vary by plan and configuration). A PingBase response time alert flags the slowdown as it builds, before users start noticing a pattern of timeouts.

Self-hosted (VPS / Docker): your Node process can crash entirely. All routes — static and dynamic — return connection refused. Your uptime check catches this within one poll interval. Set your failure threshold to 1 on a self-hosted deployment; you don't want 2 confirmation polls before alerting.

On Cloudflare Pages with Workers: Workers have a CPU time limit (10ms on free, 30s on paid) and can fail if external fetches time out. CPU time does not include time spent waiting on network requests, so total response time can be much longer than the CPU limit. Monitor your API routes with a slow threshold based on their normal latency so you catch degradation before it becomes a hard error.
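
The failure threshold mentioned for self-hosted deployments can be read as "alert once the last N polls have all failed". A small sketch of that rule follows. `shouldAlert` is a hypothetical name, and this is one plausible interpretation of a failure threshold, not a description of how PingBase implements it.

```typescript
// `results` is the poll history, oldest first; true means the poll succeeded.
// Returns true once the most recent `threshold` polls have all failed.
function shouldAlert(results: boolean[], threshold: number): boolean {
  if (threshold < 1 || results.length < threshold) return false
  return results.slice(-threshold).every((ok) => !ok)
}

// Threshold 1: the first failed poll alerts (suits a crashed Node process).
const alertImmediately = shouldAlert([true, true, false], 1)

// Threshold 2: a single failed poll is treated as a possible blip.
const waitForConfirmation = shouldAlert([true, true, false], 2)
```

A higher threshold trades detection speed for fewer false alarms, which can suit serverless platforms where an occasional cold start is expected.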


Content checks for ISR pages

ISR pages are subtle. When regeneration fails, the page still returns 200 — it's serving the last successfully cached version. If that version is hours or days old, users see stale data without any error.

Use a content check targeting something that changes frequently in your data — a "Last updated" timestamp, a record count, or a date field. If the content check detects the string is unexpectedly old or missing, your alert fires. This is the only reliable way to detect ISR staleness externally.
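
As an illustration of the timestamp approach, the sketch below assumes the ISR page renders its data timestamp in a `<time datetime="...">` element containing an ISO 8601 value. That markup and the function names are assumptions; adapt the extraction to whatever your page actually renders.

```typescript
// Assumption: the page includes something like
//   <time datetime="2024-01-01T00:00:00Z">Jan 1</time>
// Returns the parsed timestamp, or null if it is missing or unparseable.
function extractTimestamp(html: string): Date | null {
  const match = /<time[^>]*\bdatetime="([^"]+)"/i.exec(html)
  const value = match?.[1]
  if (value === undefined) return null
  const parsed = new Date(value)
  return Number.isNaN(parsed.getTime()) ? null : parsed
}

// A page counts as stale when its timestamp is missing or older than maxAgeMs.
function isStale(html: string, maxAgeMs: number, now: Date = new Date()): boolean {
  const stamp = extractTimestamp(html)
  if (stamp === null) return true
  return now.getTime() - stamp.getTime() > maxAgeMs
}
```

Pick `maxAgeMs` comfortably above the page's revalidation interval so that a normal regeneration cycle does not trigger an alert.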


Response time thresholds for Next.js

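Next.js renders are fast when cached, slow when computing, so set thresholds by route type rather than using one global value. The critical SSR page in this guide, for example, uses a slow threshold of 2000ms.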


Setup checklist

  1. Add /api/health route that checks DB and critical dependencies, returns 503 on failure
  2. Add PingBase monitor for /api/health: status 200, content check for "status":"ok"
  3. Add monitor for homepage: content check for nav brand text
  4. Add monitor for your most critical SSR page: slow threshold 2000ms
  5. Add monitor for a key API route such as /api/posts: status 200, JSON array in response
  6. Add SSL certificate monitor for your custom domain
  7. Configure Slack or webhook alerts so the team sees issues immediately

Five monitors cover the full Next.js stack. All five fit in PingBase's free tier.
