The Ultimate Website Monitoring Checklist for 2026
Most teams monitor one or two things and call it done. This checklist covers everything worth monitoring — from uptime to cron jobs to your status page itself.
Monitoring is one of those areas where teams either do too little (just uptime checks, nothing else) or too much (every metric from every service, drowning in noise). The goal of this checklist is to help you find the middle ground: comprehensive coverage of the things that actually matter, without instrumentation for its own sake.
This is organized into seven areas. Work through each one and check off what you have in place. The gaps will become obvious.
1. Uptime monitoring
The baseline. If you're not doing this, nothing else on this list matters.
- HTTP uptime checks on all public-facing URLs. Your homepage, your app, your API endpoints, your pricing page. If it's publicly reachable, it should be monitored.
- Check interval of 1–5 minutes. 30-minute checks are almost useless: a 20-minute outage can start and resolve entirely between two checks, so you never see it. Use 1-minute checks on production, 5-minute on staging.
- Verify the response body, not just the status code. A 200 response that says "database connection error" in the body is not an up response. Use keyword matching or content checks where available.
- Monitor from multiple regions. A CDN misconfiguration might return 200 from Virginia but time out from Frankfurt. Single-region monitoring won't catch it.
- Custom request headers where needed. Some endpoints require auth tokens, API keys, or specific headers. Make sure your monitor can send them.
- Alert on 2+ consecutive failures, not just one. Transient network issues cause false positives. Two consecutive failures is almost always a real outage.
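The body check and the consecutive-failure rule above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a description of how any particular monitoring tool works; the names `CheckResult`, `evaluate_response`, `fetch_and_check`, and `ConsecutiveFailureAlert` are hypothetical, and the keyword you match on depends on your own pages.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CheckResult:
    ok: bool
    reason: str


def evaluate_response(status: int, body: str, must_contain: str) -> CheckResult:
    """Count a response as up only if the status is 2xx AND the keyword is in the body."""
    if not 200 <= status < 300:
        return CheckResult(False, f"unexpected status {status}")
    if must_contain not in body:
        return CheckResult(False, f"keyword {must_contain!r} missing from body")
    return CheckResult(True, "ok")


def fetch_and_check(url: str, must_contain: str,
                    headers: Optional[Dict[str, str]] = None,
                    timeout: float = 10.0) -> CheckResult:
    """Fetch a URL (with optional custom headers) and evaluate the response."""
    request = urllib.request.Request(url, headers=headers or {})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return evaluate_response(response.status, body, must_contain)
    except urllib.error.HTTPError as exc:  # 4xx/5xx responses raise in urllib
        return CheckResult(False, f"unexpected status {exc.code}")
    except OSError as exc:  # DNS failures, refused connections, timeouts
        return CheckResult(False, f"request failed: {exc}")


class ConsecutiveFailureAlert:
    """Fire once when the failure streak reaches the threshold (2 by default)."""

    def __init__(self, threshold: int = 2) -> None:
        self.threshold = threshold
        self.failures = 0

    def record(self, result: CheckResult) -> bool:
        if result.ok:
            self.failures = 0  # any success resets the streak
            return False
        self.failures += 1
        return self.failures == self.threshold
```

A scheduler would call `fetch_and_check` once per interval and pass the result to `record`; an alert goes out only when `record` returns `True`, so a single transient failure stays silent and a long outage does not re-alert on every check.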
2. SSL certificate monitoring
SSL expiry is embarrassing and entirely avoidable. It causes browser security warnings that break trust and kill conversions. It happens because someone forgot to renew a cert, or because auto-renewal silently failed.
- Monitor expiry on all domains and subdomains you own. Including any custom domains for status pages, app subdomains, and API domains.
- Alert at 30 days, 14 days, and 7 days remaining. 30 days gives you time to investigate auto-renewal failures. 7 days is the "this needs to happen today" alert.
- Verify the cert matches the domain. A cert mismatch (wrong domain, wrong SANs) will break HTTPS just as badly as an expired cert.
- Don't rely solely on Let's Encrypt auto-renewal. Certbot and similar tools can fail without anyone noticing if the renewal challenge fails. Monitor the actual cert, not just the renewal process.
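As a rough sketch of what "monitor the actual cert" can look like, the following Python reads the certificate a server actually presents and maps the remaining days onto the 30/14/7-day alert levels from the list above. The function names and the `ALERT_THRESHOLDS_DAYS` constant are assumptions made for illustration.

```python
import socket
import ssl
import time
from typing import Optional

# Alert levels from the checklist, tightest first.
ALERT_THRESHOLDS_DAYS = (7, 14, 30)


def days_until_expiry(host: str, port: int = 443, timeout: float = 10.0) -> float:
    """Connect to the host and return the days left on the certificate it serves."""
    context = ssl.create_default_context()  # verifies the chain and the hostname
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    expires_at = ssl.cert_time_to_seconds(cert["notAfter"])
    return (expires_at - time.time()) / 86400


def expiry_alert(days_remaining: float) -> Optional[int]:
    """Return the tightest threshold that has been crossed, or None if none has."""
    for threshold in ALERT_THRESHOLDS_DAYS:
        if days_remaining <= threshold:
            return threshold
    return None
```

Because the default context verifies the hostname, a cert that does not match the domain (or has already expired) makes the handshake in `days_until_expiry` raise `ssl.SSLCertVerificationError`, which a monitor can treat as an immediate failure rather than an expiry warning.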
3. Performance and response time monitoring
Being up is not the same as being fast. A page that takes 8 seconds to load is functionally down for a significant portion of users. Response time monitoring catches degradation before it becomes an outage.
- Track response time baselines for critical endpoints. Establish what "normal" looks like during peak and off-peak hours. Anomalies are easier to spot against a baseline.
- Set response time thresholds with alerts. If your API normally responds in 120ms and you see 2000ms, something is wrong — even if the HTTP status is still 200.
- Monitor TTFB (Time to First Byte) separately from full page load. TTFB is server performance. Full page load is frontend performance. They call for different fixes.
- Track p95 and p99, not just average. A fast average with a terrible p99 means roughly 1% of requests are painfully slow, which at scale is a lot of users.
- Alert on sustained degradation, not spikes. A single slow request is noise. Response times consistently 3x above baseline for 5+ minutes is a real problem.
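To make the percentile and "sustained, not spikes" points concrete, here is a small Python sketch. It assumes you already collect response times in milliseconds and reduce them to one reading per minute; `percentile` uses the nearest-rank method (one of several common definitions), and both function names are illustrative.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def sustained_degradation(per_minute_ms: Sequence[float], baseline_ms: float,
                          factor: float = 3.0, minutes: int = 5) -> bool:
    """True only if each of the last `minutes` readings exceeds factor * baseline."""
    if len(per_minute_ms) < minutes:
        return False
    recent = per_minute_ms[-minutes:]
    return all(value > factor * baseline_ms for value in recent)
```

With 98 requests at 100 ms and two at 2000 ms and 3000 ms, the average is 148 ms and p95 is still 100 ms, but p99 is 2000 ms, which is why the tail needs its own metric. A single 2000 ms reading in an otherwise normal series does not trip `sustained_degradation`; five consecutive readings above 3x baseline do.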
4. Cron job and background task monitoring
This is the most commonly missing piece of a monitoring setup. Cron jobs fail silently. There's no user to report the error. If your nightly billing run stops working, you might not find out until someone notices missing invoices days later.
- Use heartbeat / cron monitoring for every scheduled job. The job sends a ping to a monitoring URL when it completes successfully. If the ping doesn't arrive within the expected window, an alert fires.
- Cover database backups, email sends, invoice generation, and data sync jobs. These are the silent failures that have the biggest business impact.
- Set a grace period that comfortably exceeds the job's typical runtime. A job that usually takes 5 minutes shouldn't alert at 6 minutes; give it a realistic buffer that absorbs normal variation.
- Alert on both missed runs and late runs. A cron that consistently runs 20 minutes late is signaling that something is wrong, even if it eventually completes.
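The heartbeat pattern has two halves: the job pings only after it succeeds, and the monitor compares the ping time against the schedule plus a grace period. The sketch below shows both in Python under simplifying assumptions: `send_ping` stands in for an HTTP request to whatever ping URL your monitoring tool gives you, and the four state names are made up for this example.

```python
from datetime import datetime, timedelta
from typing import Callable, Optional


def run_with_heartbeat(job: Callable[[], None], send_ping: Callable[[], None]) -> None:
    """Job side: ping only after the job finishes without raising."""
    job()        # if this raises, the ping below is never sent
    send_ping()


def classify_run(scheduled_start: datetime, grace: timedelta,
                 last_ping: Optional[datetime], now: datetime) -> str:
    """Monitor side: classify the current run as ok, late, missed, or pending."""
    deadline = scheduled_start + grace
    # A ping from before this run's start belongs to an earlier run.
    has_ping = last_ping is not None and last_ping >= scheduled_start
    if has_ping:
        return "ok" if last_ping <= deadline else "late"
    return "missed" if now > deadline else "pending"
```

Both "missed" and "late" are worth alerting on, matching the last item in the list; "pending" just means the grace period has not run out yet.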
5. Alerting channels and on-call setup
A monitoring system that alerts to an inbox nobody checks is not a monitoring system. Alerts need to reach the right person in the right channel at the right time.
- Have at least two alert channels configured. Email is a minimum. Add Slack, Discord, or a phone notification for anything P1.
- Use different channels for different severities. SSL expiry warnings go to email. Production downtime goes to Slack and triggers a phone call.
- Make sure someone is actually watching the alert channels. This sounds obvious. It's often not done.
- Test your alerts before an incident. Pause a monitor manually and verify the alert fires. You don't want to discover your alert channel is misconfigured during a real outage.
- Configure alert recovery notifications. Knowing when the problem is resolved is as important as knowing when it started.
- Have an on-call rotation if you have a team. Single-point-of-failure on-call leads to burnout and missed alerts.
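Severity-based routing and recovery notifications are simple enough to sketch in Python. The severity labels, channel names, and the `ROUTES` table below are placeholders for illustration; in most tools this lives in configuration rather than code.

```python
from typing import Dict, List, Optional

# Hypothetical routing table: low-urgency warnings go to email,
# production downtime goes to Slack and a phone call.
ROUTES: Dict[str, List[str]] = {
    "warning": ["email"],
    "critical": ["slack", "phone"],
}


def channels_for(severity: str) -> List[str]:
    """Return the channels for a severity, falling back to email for unknown ones."""
    return ROUTES.get(severity, ["email"])


class StateNotifier:
    """Emit 'down' on an up-to-down change and 'recovered' on down-to-up; else None."""

    def __init__(self) -> None:
        self.is_up = True

    def observe(self, is_up: bool) -> Optional[str]:
        if is_up == self.is_up:
            return None  # no state change, so no notification
        self.is_up = is_up
        return "recovered" if is_up else "down"
```

Sending the "recovered" event through the same channels as the original alert keeps whoever was paged from having to go check whether the problem is still ongoing.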
6. Status page
Your status page is not just for communicating during outages — it's for building the kind of transparency that prevents support tickets during incidents.
- Have a public status page and share the URL with your users. Customers who can check your status page are less likely to flood your support inbox during an incident.
- Make sure your status page reflects real monitor data, not manual updates. A status page that requires someone to manually post "we're investigating" is better than nothing, but an automatically updated one is significantly better.
- Show uptime history (at least 90 days). Monthly uptime bars let users see your reliability track record at a glance. This builds trust proactively, not just during outages.
- Post incident updates during outages. Even "we're aware and investigating" is better than silence. Update every 30 minutes until resolved.
- Use a custom domain for your status page. status.yourdomain.com is more professional than a third-party subdomain, and keeps your brand consistent.
- Monitor your status page itself. If your status page goes down during an outage, users have nowhere to go for information.
7. Monitoring your monitoring
This sounds circular, but it matters: your monitoring system is only useful if it's running correctly. Validate it periodically.
- Verify that alerts are actually firing. Once a quarter, pause a production monitor and confirm you receive the expected alert within the expected timeframe.
- Review your monitor list quarterly. Remove monitors for decommissioned services. Add monitors for new services. Monitors for things that no longer exist produce false alerts and alert fatigue.
- Review alert thresholds after incidents. If you had an incident that monitoring didn't catch (or caught too late), tighten the thresholds. If you're getting too many false positives, loosen them.
- Document your monitoring setup. Who gets alerted? What's the escalation path? What does each monitor check? This documentation matters most when the person who set everything up is unavailable.
The quick-start version
If you want to implement this in order of highest return for least effort:
- Set up uptime checks on your main URL and critical API endpoints (30 minutes)
- Add SSL certificate monitoring with 30-day alerts (5 minutes)
- Configure a public status page and share the URL (15 minutes)
- Add heartbeat monitoring for your most critical cron job (10 minutes)
- Configure a second alert channel beyond email (5 minutes)
- Test that your alerts actually fire (10 minutes)
That's roughly 75 minutes to go from zero to a solid monitoring baseline. The more advanced items — multi-region checks, response time baselines, on-call rotations — can be added incrementally.