
The 7 SaaS Metrics You Should Be Monitoring Right Now

MRR and churn are important — but they're lagging indicators. By the time they move, the damage is done. These 7 metrics tell you what's happening in real time, before it shows up in your revenue data.


1. API error rate

Your API error rate is one of the fastest-moving signals in your stack. A sudden spike in 5xx errors is almost always correlated with a deploy, a database issue, or an upstream dependency failing — and it's visible within seconds of the incident starting.

What to watch: the rate of 5xx responses as a percentage of total requests. A baseline around 0.1% is typical for many APIs; if you see it climb to 1% or above, something is wrong.

How to monitor it: set up an alert that fires when your error rate exceeds a threshold over a rolling 5-minute window. Alert on the rate, not on absolute counts: 50 errors in five minutes is a 5% error rate at 1,000 requests but only 0.005% at 1,000,000.
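Here's a minimal sketch of that alert in Python, assuming you can feed it each response's status code and a timestamp. The class name, the 1% threshold, and the `min_requests` floor (which keeps a single error during a quiet period from paging anyone) are illustrative choices, not taken from any particular monitoring tool.

```python
import time
from collections import deque


class ErrorRateMonitor:
    """Rolling-window 5xx error rate with a rate-based alert (illustrative sketch)."""

    def __init__(self, window_seconds=300, threshold=0.01, min_requests=100):
        self.window_seconds = window_seconds  # 5-minute rolling window
        self.threshold = threshold            # alert at a 1% 5xx rate
        self.min_requests = min_requests      # assumed floor: skip alerting on tiny samples
        self._events = deque()                # (timestamp, is_5xx) per request
        self._errors = 0                      # count of 5xx events still in the window

    def record(self, status_code, now=None):
        now = time.time() if now is None else now
        is_error = 500 <= status_code <= 599
        self._events.append((now, is_error))
        self._errors += is_error
        self._evict(now)

    def _evict(self, now):
        # Drop requests that have fallen out of the rolling window.
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] < cutoff:
            _, was_error = self._events.popleft()
            self._errors -= was_error

    def error_rate(self, now=None):
        self._evict(time.time() if now is None else now)
        return self._errors / len(self._events) if self._events else 0.0

    def should_alert(self, now=None):
        now = time.time() if now is None else now
        self._evict(now)
        if len(self._events) < self.min_requests:
            return False  # too little traffic for a percentage to mean much
        return self.error_rate(now) >= self.threshold


# Simulated traffic: 1,000 requests over 100 seconds, every 50th one a 503.
monitor = ErrorRateMonitor()
for i in range(1000):
    monitor.record(503 if i % 50 == 0 else 200, now=1000.0 + i * 0.1)
print(monitor.error_rate(now=1100.0))    # 0.02
print(monitor.should_alert(now=1100.0))  # True
```

Keeping a running error count alongside the deque means each check only touches the requests that just left the window, rather than rescanning five minutes of traffic.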


2. Response time (p95 and p99)

Averages, and even the median, hide tail latency problems. Your p50 response time can look fine while 1 in 20 requests takes 10 seconds. Track p95 (the 95th percentile) and p99 as your primary latency metrics.

p95 above 2 seconds is a user experience problem. p99 above 5 seconds means a meaningful fraction of your users are having a bad time. Both should trigger investigation, even if your uptime check says everything is up.
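If you want to see what these numbers mean on raw data, this sketch computes nearest-rank percentiles over a list of latencies in seconds and flags the 2-second p95 and 5-second p99 limits above. The function names and sample data are made up for illustration; most monitoring tools compute percentiles for you, often approximately from histograms.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[max(rank, 1) - 1]


def latency_report(latencies_s, p95_limit=2.0, p99_limit=5.0):
    """Summarize latencies (in seconds) and flag the p95/p99 limits from the text."""
    p95 = percentile(latencies_s, 95)
    p99 = percentile(latencies_s, 99)
    return {
        "p50": percentile(latencies_s, 50),
        "p95": p95,
        "p99": p99,
        "investigate": p95 > p95_limit or p99 > p99_limit,
    }


# Hypothetical sample: 90 fast requests, 6 slow ones, 4 very slow ones.
samples = [0.2] * 90 + [2.5] * 6 + [8.0] * 4
print(latency_report(samples))
# {'p50': 0.2, 'p95': 2.5, 'p99': 8.0, 'investigate': True}
```

The mean of this sample is 0.65 seconds, which looks healthy on a dashboard, while 1 in 25 requests takes 8 seconds.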


3. Payment conversion rate

A broken checkout flow doesn't always generate errors. Sometimes Stripe loads fine, the form submits, but there's a backend bug that silently fails to create the subscription. Or the confirmation email doesn't send. Or the session expires mid-flow.

Monitor the ratio of completed subscriptions to initiated checkouts. Any significant drop is a revenue alert, not just a technical one. A synthetic check that runs a test payment end to end is the most reliable way to catch regressions in this flow.
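A minimal sketch of the ratio check, assuming you can count initiated checkouts and completed subscriptions per time window (for example from your own database or analytics events). The 30% relative-drop rule, the minimum-volume guard, and the 60% baseline in the example are hypothetical values to tune against your own history.

```python
def conversion_rate(completed, initiated):
    """Completed subscriptions per initiated checkout, or None with no checkouts."""
    return completed / initiated if initiated else None


def checkout_alert(completed, initiated, baseline_rate,
                   max_relative_drop=0.30, min_initiated=20):
    """True if conversion fell more than max_relative_drop below the baseline.

    max_relative_drop and min_initiated are illustrative defaults, not
    recommendations; tune them against your own checkout history.
    """
    if initiated < min_initiated:
        return False  # too few checkouts in this window to judge
    return conversion_rate(completed, initiated) < baseline_rate * (1 - max_relative_drop)


# Hypothetical numbers: historical conversion of 60% (baseline_rate=0.60).
print(checkout_alert(completed=46, initiated=80, baseline_rate=0.60))  # False (57.5%)
print(checkout_alert(completed=30, initiated=80, baseline_rate=0.60))  # True (37.5%)
```

Comparing against a relative drop from your own baseline, rather than a fixed percentage, keeps the same check usable as your normal conversion rate shifts.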


4. Signup completion rate

Users abandon signups for two reasons: they changed their mind, or something broke. You can't fix the first one. You can fix the second.

Track the ratio of activated accounts to signup page visits over a rolling window. A sudden drop that doesn't line up with a change in traffic sources usually means your signup flow has a bug: email not delivering, verification link broken, or an onboarding step erroring out.
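When the overall completion rate drops, breaking it into step-to-step conversion points you at the step that broke. This sketch assumes a hypothetical four-step funnel (`visited_signup`, `submitted_form`, `verified_email`, `activated`) and per-window counts for each step; the step names and numbers are invented for illustration.

```python
# Hypothetical funnel steps, in order; rename to match your own events.
FUNNEL = ["visited_signup", "submitted_form", "verified_email", "activated"]


def completion_rate(counts):
    """Activated accounts per signup page visit."""
    visits = counts["visited_signup"]
    return counts["activated"] / visits if visits else 0.0


def step_rates(counts):
    """Conversion from each step to the next, keyed by the later step."""
    return {
        nxt: counts[nxt] / counts[prev] if counts[prev] else 0.0
        for prev, nxt in zip(FUNNEL, FUNNEL[1:])
    }


def worst_step(current, baseline):
    """Return (step, relative_drop) for the step that fell furthest below baseline."""
    cur, base = step_rates(current), step_rates(baseline)
    drops = {s: (base[s] - cur[s]) / base[s] for s in base if base[s] > 0}
    step = max(drops, key=drops.get)
    return step, drops[step]


baseline = {"visited_signup": 1000, "submitted_form": 400,
            "verified_email": 320, "activated": 280}
current = {"visited_signup": 1000, "submitted_form": 410,
           "verified_email": 90, "activated": 80}

print(completion_rate(baseline), completion_rate(current))  # 0.28 0.08
step, drop = worst_step(current, baseline)
print(step, round(drop, 2))  # verified_email 0.73
```

Here the verification step's conversion falls from 80% to about 22% while the other steps hold steady, which points at email delivery or the verification link rather than the signup form itself.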


5. SSL certificate expiry

SSL expiry is a fully preventable catastrophe. When a certificate expires, your site becomes effectively inaccessible: browsers show a full-page security warning, most users bounce immediately, and your organic search rankings can drop. The fix takes minutes, but it can be days before you catch it.

Set up automated SSL monitoring with alerts starting 30 days before expiry, then again at 14 days and 7 days. If you're using Let's Encrypt with auto-renewal, monitoring tells you when auto-renewal silently fails — which happens more often than it should.
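A basic expiry check needs only Python's standard library. `ssl.create_default_context()`, `getpeercert()`, and `ssl.cert_time_to_seconds()` are real APIs; the function names and the `ALERT_DAYS` tuple (holding the 30/14/7-day thresholds from the text) are this sketch's own. It connects to the host, reads the certificate's `notAfter` date, and reports the tightest threshold crossed.

```python
import socket
import ssl
import time

ALERT_DAYS = (30, 14, 7)  # thresholds from the text, in days before expiry


def fetch_not_after(host, port=443, timeout=5.0):
    """Return the server certificate's notAfter field, e.g. 'Jan 10 00:00:00 2030 GMT'.

    The default context verifies the certificate, so an expired or otherwise
    invalid one raises ssl.SSLCertVerificationError during the handshake;
    a monitor should treat that exception as an alert in its own right.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return cert["notAfter"]


def days_until_expiry(not_after, now=None):
    """Days (possibly fractional or negative) until the notAfter timestamp."""
    now = time.time() if now is None else now
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def alert_threshold(days_left, thresholds=ALERT_DAYS):
    """Return the tightest threshold already crossed (e.g. 14), or None if none is."""
    for limit in sorted(thresholds):
        if days_left <= limit:
            return limit
    return None
```

From a daily scheduled job, something like `alert_threshold(days_until_expiry(fetch_not_after("example.com")))` returns 30, 14, or 7 when an alert is due and `None` otherwise; `example.com` stands in for your own domain.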


6. Background job queue depth

A backed-up job queue is a slow-motion incident. It often starts with a single slow worker, a database timeout on a batch operation, or a downstream API rate limit. If nobody's watching, the queue grows until your workers are hours behind — and users start noticing that their exports haven't arrived, their emails haven't sent, or their data isn't updating.

Monitor queue depth and alert when it exceeds a threshold that represents "more than N minutes of backlog," meaning the current depth divided by the rate at which your workers clear jobs. The right threshold depends on your expected throughput, but as a rough starting point for many SaaS products, a queue depth above 1,000 jobs warrants investigation.
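Expressing the threshold in minutes makes the throughput dependence explicit: 1,000 queued jobs is a 20-minute backlog if your workers clear 50 jobs per minute, but only 2 minutes at 500 jobs per minute. A minimal sketch, assuming you can read the current depth and a recent processing rate from your queue; the 15-minute limit is an illustrative value for N.

```python
def backlog_minutes(queue_depth, jobs_per_minute):
    """Minutes needed to clear the current queue at the observed processing rate.

    Ignores jobs that arrive while draining, so it underestimates the real
    delay when new work keeps coming in.
    """
    if jobs_per_minute <= 0:
        return float("inf")  # workers have stalled: the backlog won't drain
    return queue_depth / jobs_per_minute


def queue_alert(queue_depth, jobs_per_minute, max_backlog_minutes=15):
    """True when the backlog exceeds N minutes (N=15 here is illustrative)."""
    return backlog_minutes(queue_depth, jobs_per_minute) > max_backlog_minutes


print(backlog_minutes(1000, 50), queue_alert(1000, 50))    # 20.0 True
print(backlog_minutes(1000, 500), queue_alert(1000, 500))  # 2.0 False
```

Treating a zero processing rate as an infinite backlog means a fully stalled worker pool always alerts, even when the queue is still short.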


7. Database connection pool utilization

Connection pool exhaustion is one of the sneakiest causes of SaaS outages. When your pool is at 100%, new requests queue up — and your API starts returning errors or timing out. The database itself is often fine. Your application is the bottleneck.

Monitor the ratio of active connections to maximum pool size. Alert at 80% utilization. By the time you hit 95%, you're likely already seeing errors. Catching it at 80% gives you time to investigate and scale before users are impacted.
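The check itself is simple. In this sketch, `active` and `max_size` are assumed to come from whatever connection pool your application uses (many pool libraries expose checked-out and maximum counts in some form); the 80% and 95% levels come from the text, and the function names are hypothetical.

```python
def pool_utilization(active, max_size):
    """Fraction of the pool's connections currently checked out."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    return active / max_size


def pool_status(active, max_size, warn_at=0.80, critical_at=0.95):
    """'ok', 'warn' (time to investigate and scale), or 'critical' (errors likely)."""
    utilization = pool_utilization(active, max_size)
    if utilization >= critical_at:
        return "critical"
    if utilization >= warn_at:
        return "warn"
    return "ok"


# A hypothetical pool with 20 connections.
for active in (10, 16, 19):
    print(active, pool_status(active, 20))  # 10 ok / 16 warn / 19 critical
```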


Putting it together

You don't need to monitor all 7 metrics on day one. A sensible priority order for a typical SaaS:

  1. Uptime and SSL monitoring (prevent the embarrassing, fully-avoidable outages)
  2. API error rate and response time (catch technical regressions fast)
  3. Payment conversion rate (protect revenue directly)
  4. Signup completion rate (protect top-of-funnel)
  5. Job queue depth and database connections (as your infrastructure grows)

The goal isn't a perfect observability stack; it's knowing about problems before your users email you about them. Even two or three of these metrics, monitored consistently, can catch most incidents before they become crises.
