Uptime Monitoring Best Practices for Indie Hackers
Most indie hackers set up uptime monitoring wrong — or not at all. The "not at all" group finds out their product is down from a frustrated user in their DMs. Here's how to actually do it right.
Start with what matters, not everything
The most common mistake is monitoring everything at once. You add your marketing site, your app, your API, your admin panel, your staging environment, and then wonder why you're getting paged about things that don't matter.
Start with a short list: what are the URLs that, if unavailable, mean a customer is actively blocked from using your product?
For most SaaS products that's two or three things:
- The main application (app.yourproduct.com)
- The API, if external developers call it (api.yourproduct.com)
- The authentication endpoint, if it's separate (auth.yourproduct.com)
Your marketing site going down is a problem, but it's not the same class of problem as your app being completely inaccessible. Treat them differently.
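One way to make "treat them differently" concrete is to tag each monitor with a tier and let the tier decide how loudly you get alerted. The sketch below is only an illustration: the `tier` field, the `alertChannelFor` helper, and the bare `yourproduct.com` marketing URL are assumptions, not the configuration format of any particular monitoring tool.

```javascript
// Hypothetical monitor list. The names, URLs, and "tier" field are
// illustrative assumptions, not a real tool's config format.
const monitors = [
  { name: 'app', url: 'https://app.yourproduct.com', tier: 'critical' },
  { name: 'api', url: 'https://api.yourproduct.com', tier: 'critical' },
  { name: 'auth', url: 'https://auth.yourproduct.com', tier: 'critical' },
  { name: 'marketing', url: 'https://yourproduct.com', tier: 'low' },
];

// Page someone for customer-blocking outages; send a plain email otherwise.
function alertChannelFor(monitor) {
  return monitor.tier === 'critical' ? 'page' : 'email';
}

for (const m of monitors) {
  console.log(`${m.name}: ${alertChannelFor(m)}`);
}
```

Staging and the admin panel are left out on purpose: if nobody is blocked when they go down, they probably don't need a monitor yet.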
Use a dedicated endpoint for monitoring
Monitoring your homepage URL is a start, but it's not ideal. Your homepage might be cached, served from a CDN, or otherwise disconnected from your actual application server.
The right approach is to create a dedicated health check endpoint — typically /health or /healthz — that:
- Returns HTTP 200 when everything is working
- Returns HTTP 503 (or another 5xx status) when something is wrong
- Actually exercises your dependencies (database, cache, critical queues)
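// Assumes `app` is an Express-style app and `db` is your database client.
app.get('/health', async (req, res) => {
  try {
    // A trivial query proves the database is reachable.
    await db.query('SELECT 1');
    res.json({ status: 'ok' });
  } catch (err) {
    // 503 tells the monitor the service is unavailable.
    res.status(503).json({ status: 'error', detail: err.message });
  }
});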
Now your monitor is actually checking that your database is reachable, not just that your CDN is serving a cached HTML file.
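The handler above checks only the database. If you also depend on a cache or a queue, a small helper can run every check with a timeout so that one hung dependency doesn't hang the health endpoint itself. This is a minimal sketch: `runHealthChecks`, `withTimeout`, and the 2-second default are hypothetical names and values, and each check is assumed to be a function that throws or rejects on failure.

```javascript
// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// `checks` maps a dependency name to a function that throws (or rejects)
// when that dependency is unhealthy.
async function runHealthChecks(checks, timeoutMs = 2000) {
  const names = Object.keys(checks);
  const results = await Promise.allSettled(
    // Promise.resolve().then(...) also turns synchronous throws into rejections.
    names.map((name) => withTimeout(Promise.resolve().then(checks[name]), timeoutMs))
  );

  const report = {};
  results.forEach((result, i) => {
    report[names[i]] = result.status === 'fulfilled' ? 'ok' : 'error';
  });

  const healthy = results.every((r) => r.status === 'fulfilled');
  return {
    httpStatus: healthy ? 200 : 503,
    body: { status: healthy ? 'ok' : 'error', checks: report },
  };
}

// Possible use inside the /health handler (db is your own client):
//   const { httpStatus, body } = await runHealthChecks({
//     database: () => db.query('SELECT 1'),
//   });
//   res.status(httpStatus).json(body);
```

The report lists only `ok` or `error` per dependency rather than raw error messages, which keeps internal details out of a publicly reachable endpoint.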
Understand what "down" actually means
When a monitoring tool says your site is "down," it means one of:
- Connection refused: The server isn't accepting connections at all
- Timeout: The server accepted the connection but never responded within the timeout window
- Non-2xx/3xx status: The server responded, but with an error (404, 500, 503, etc.)
Each of these has different causes and different severity. A timeout often means your server is overloaded but still running. A 500 means your app is running but hit an unhandled error. A refused connection usually means nothing is listening on the port, most often because the process has died.
When you're debugging, look at the last recorded status code in your check history — it tells you a lot about where to start looking.
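If you keep your own check history, the three cases above can be sketched as a small classifier. The shape of the `probe` record and the `'TIMEOUT'` code are assumptions for illustration; `ECONNREFUSED` and `ETIMEDOUT` are the error codes Node.js reports for refused and timed-out connections.

```javascript
// Map one probe result to the kinds of "down" described above.
// `probe` is a hypothetical record: { errorCode, statusCode }.
function classifyProbe(probe) {
  if (probe.errorCode === 'ECONNREFUSED') return 'connection_refused';
  // 'TIMEOUT' stands in for the checker's own deadline (an assumption).
  if (probe.errorCode === 'ETIMEDOUT' || probe.errorCode === 'TIMEOUT') return 'timeout';
  if (probe.errorCode) return 'other_network_error';
  if (probe.statusCode >= 200 && probe.statusCode < 400) return 'up';
  return 'bad_status';
}

// Where to start looking, following the rules of thumb in the text.
const HINTS = {
  connection_refused: 'Is the process running and listening on the port?',
  timeout: 'Is the server overloaded or stuck on a slow dependency?',
  bad_status: 'Check the application logs for the error behind the status code.',
};

const kind = classifyProbe({ statusCode: 503 });
console.log(kind, '->', HINTS[kind]);
```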
Set your alert email to something you'll actually see
This sounds obvious, but it breaks constantly. Developers route monitoring alerts to their personal email, then change jobs or email providers, and the alerts quietly go nowhere.
Use a shared inbox. Set your alert email to ops@yourcompany.com or alerts@yourcompany.com — a distribution list that goes to everyone on the team. This way a single person's inbox problems don't create a blind spot.
Keep it separate from your main inbox. If you route monitoring alerts to the same address where you get all your email, you'll start ignoring them during busy periods. A dedicated filter or label that makes alerts visually distinct helps.
PingBase lets you set a separate alert email from your login email — your account email can be your personal address while alerts go to the team inbox.
Don't ignore your first false alarm — investigate it
When you get an alert and your site looks fine, the temptation is to dismiss it and move on. Resist this.
False positives from uptime monitoring usually have one of a few causes:
- Transient network issues — brief blips between the checker and your server. These happen occasionally and are mostly harmless, but a sudden increase in frequency means something changed.
- Your server is slow — the check timed out not because the server was down, but because it was overloaded. The monitor calls it "down" because customers also experienced it as unusably slow.
- A firewall rule changed — something started blocking the monitoring service's IP range. Your real users aren't affected, but the checker can't reach you.
- A redirect broke — a recent deploy changed a redirect and the final destination is now an error page.
A single false alarm may turn out to be noise, but confirm that before you dismiss it. Two in a week is a pattern worth investigating. Five in a month means something is structurally wrong.
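Those thresholds are easy to check mechanically if you log the time of each false alarm. Here's a rough sketch that uses the numbers from the paragraph above (two in 7 days, five in 30 days); the function names and the three labels are made up for illustration.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Count alarms that fall inside the window ending at `now`.
function countSince(timestamps, now, windowMs) {
  return timestamps.filter((t) => t <= now && now - t < windowMs).length;
}

// `timestamps` are false-alarm times in milliseconds since the epoch.
function assessFalseAlarms(timestamps, now = Date.now()) {
  if (countSince(timestamps, now, 30 * DAY_MS) >= 5) return 'structural';
  if (countSince(timestamps, now, 7 * DAY_MS) >= 2) return 'investigate';
  return 'noise';
}

const now = Date.now();
console.log(assessFalseAlarms([now - 1 * DAY_MS, now - 3 * DAY_MS], now)); // investigate
```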
Check interval matters more than you think
Free monitoring tools often check every 5 or 10 minutes. That means if your site goes down at 2:00 PM, you might not find out until 2:10 PM — and your customers have been hitting error pages for 10 minutes already.
For a consumer app or anything with SLA commitments, 1-minute checks are the practical minimum. At that frequency, the worst case is roughly 60 seconds between the outage starting and the first failed check, plus whatever timeout or retry confirmation your tool applies before it alerts.
For a B2B app with business-hours users, 5-minute checks might be acceptable. But if you're ever going to put uptime percentages in a contract or on a pricing page, get to at least 1-minute checks first — otherwise your "99.9% uptime" claim is based on coarse-grained data that misses short incidents entirely.
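The arithmetic behind this is worth seeing once. The sketch below assumes a simple model: checks start on a fixed schedule, a failing check can take up to `timeoutSec` to give up, and the tool alerts after `confirmations` consecutive failures. Both function names and those parameters are hypothetical, and real tools may behave differently.

```javascript
// Worst-case seconds from outage start to alert, assuming the outage
// begins right after a successful check.
function worstCaseDetectionSec(intervalSec, timeoutSec = 0, confirmations = 1) {
  return intervalSec * confirmations + timeoutSec;
}

// Minutes of downtime allowed by an uptime target over `days` days.
function downtimeBudgetMinutes(uptimePercent, days = 30) {
  const totalMinutes = days * 24 * 60;
  return (totalMinutes * (100 - uptimePercent)) / 100;
}

console.log(worstCaseDetectionSec(60));          // 60
console.log(worstCaseDetectionSec(300, 30, 2));  // 630
console.log(downtimeBudgetMinutes(99.9).toFixed(1)); // "43.2"
```

A 99.9% target leaves about 43 minutes of downtime in a 30-day month. With 5-minute checks, an incident shorter than 5 minutes can fall entirely between two checks and never be recorded, so a handful of missed incidents can add up to a meaningful share of that budget.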
PingBase's Pro plan ($9/mo) does 1-minute checks across up to 10 monitors. For most indie hackers, that's the right tier.
Set up your status page before you need it
A status page does two things: it gives customers a place to check during incidents, and it signals professionalism to prospective customers evaluating your product.
The second benefit is often overlooked. Having a public status page at status.yourproduct.com — even when everything is green — tells potential customers that you take reliability seriously enough to be transparent about it. In a world where most indie SaaS products don't have one, it's a small differentiator that costs nothing.
Set it up when you add your first monitor, not during your first incident.
The checklist
None of this is complicated or expensive. The barrier is almost always "I'll do it later" — and "later" usually means after the first embarrassing outage.
Do it now. Your future self at 3am will thank you.