Build a Complete Monitoring Stack for Under $50/Month
Enterprise observability stacks cost thousands per month and take weeks to configure. An indie hacker or small team, though, can assemble a complete monitoring stack covering uptime, logs, errors, and performance for under $50/month, most of it free. Here's exactly what to use and how to connect it all.
Observability is the practice of understanding what your system is doing from the outside — through the signals it emits. There are four main pillars: uptime monitoring (is it up?), logging (what happened?), error tracking (what broke?), and performance monitoring (how fast?). Each covers different failure modes, and together they form a complete picture.
Large engineering organizations spend five figures per month on tools like Datadog, Splunk, and New Relic. But most of what those tools do is available on free tiers of focused tools — if you know which ones to combine.
This guide assembles a full stack for a typical indie SaaS running on a modern edge-first or serverless architecture (Cloudflare Workers, Vercel, Railway, Fly.io, or a small VPS). Adjust to your infrastructure as needed.
Total monthly cost
Free tiers are sufficient for most indie products under ~10k MAU. Costs increase as you scale.
Layer 1: Uptime monitoring — PingBase ($9/month)
Uptime monitoring is the foundation of any observability stack. It answers the most basic question: is my service reachable? Without it, you're blind to the most visible failure mode — complete unavailability.
What PingBase covers
- HTTP monitors — checks your URLs every minute, verifies status codes and response content
- SSL certificate monitoring — alerts you 30 days before a certificate expires
- Heartbeat / cron monitoring — detects when scheduled jobs stop running
- TCP / port monitoring — checks that ports are accepting connections
- Public status page — gives users a place to check your status and subscribe to updates
- Multi-region checks — eliminates false positives from single-location network issues
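Heartbeat monitoring generally works as a dead-man's switch: the job pings a URL when it succeeds, and the monitor alerts when no ping arrives within the expected interval. The sketch below illustrates that idea only. The function names, the grace period, and the plain GET request are assumptions for illustration, not PingBase's documented API.

```javascript
// Monitor side: a job is overdue when no ping has arrived within its
// expected interval plus a grace period. Illustrative names only.
function isHeartbeatOverdue(lastPingAt, expectedIntervalMs, graceMs, now = Date.now()) {
  return now - lastPingAt > expectedIntervalMs + graceMs;
}

// Job side: report success only after the work finishes.
// heartbeatUrl is a placeholder for the URL your monitor gives you.
async function runWithHeartbeat(job, heartbeatUrl, fetchImpl = fetch) {
  await job();                   // if this throws, no ping is sent
  await fetchImpl(heartbeatUrl); // a plain GET is assumed here
}
```

Pinging only after the job completes means a crashed or hung job looks the same as a job that never started, which is exactly the case you want to be alerted about.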
What to set up on day one
- HTTP monitor on your homepage
- HTTP monitor on your API health endpoint (/health or /api/health)
- HTTP monitor on your login page
- SSL monitor on your main domain
- Heartbeat monitor on your cron job (backup, invoice generation, etc.)
- Public status page linked from your app footer
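If you don't have a health endpoint yet, here is a minimal sketch using only Node's built-in http module. It assumes you pass in a checkDatabase function (a hypothetical name) that resolves when a trivial query succeeds. Returning 503 when a dependency is down lets an HTTP monitor catch the failure by status code alone.

```javascript
const http = require('node:http');

// Build the health response. checkDatabase is a hypothetical async function
// that resolves when a trivial query (for example SELECT 1) succeeds.
async function healthStatus(checkDatabase) {
  try {
    await checkDatabase();
    return { statusCode: 200, body: { status: 'ok' } };
  } catch (err) {
    return { statusCode: 503, body: { status: 'degraded', error: 'database unreachable' } };
  }
}

// Serve the health response on /health and 404 everywhere else.
function createHealthServer(checkDatabase) {
  return http.createServer(async (req, res) => {
    if (req.url !== '/health') {
      res.writeHead(404).end();
      return;
    }
    const { statusCode, body } = await healthStatus(checkDatabase);
    res.writeHead(statusCode, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify(body));
  });
}

// Usage (db is whatever client your app already has):
// createHealthServer(() => db.query('SELECT 1')).listen(3000);
```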
Alerts go to email (immediate) and Slack (team visibility). The free tier covers 5 monitors — enough to start. Pro ($9/month) gives you unlimited monitors, custom domain for your status page, and 1-minute check frequency.
For a full breakdown of what to monitor, see The Ultimate Website Monitoring Checklist for 2026.
Layer 2: Logs — Grafana Cloud (free tier)
Logs answer "what happened?" Uptime monitoring tells you that your API returned a 500 at 2:13am — but logs tell you which request it was, what parameters it had, what line of code it hit, and what the stack trace was. You need both.
What Grafana Cloud offers on the free tier
- 14 days of log retention
- 50 GB of log ingestion per month
- Grafana Loki for log querying (LogQL syntax)
- Grafana dashboards for visualization
- Grafana Prometheus for metrics
- 3 users included
50 GB/month is generous for a small product. Even with verbose logging, most indie SaaS apps produce well under 1 GB of log data per month at early stages.
How to ship logs to Grafana Loki
The exact integration depends on your runtime:
- Node.js / Express / Fastify. Use the winston-loki transport or pino-loki. These push log entries directly to the Loki HTTP API.
- Cloudflare Workers. Workers don't have persistent processes, so you can't run a log agent. Instead, use the Cloudflare Logpush feature to forward Worker logs to a Loki-compatible endpoint, or use the Grafana Cloud integration in the Cloudflare dashboard.
- Vercel / serverless. Vercel has a native log drain integration that can forward to an HTTP endpoint. Point it at a Grafana Alloy instance running on a small VPS, or use a log proxy service.
- VPS / Docker. Run Grafana Alloy (formerly Grafana Agent) as a sidecar or system service. It tails log files and ships to Loki automatically.
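Whichever transport you pick, it ends up calling the same Loki push endpoint. As a rough sketch of what that call looks like without any library, the code below builds the JSON body Loki's /loki/api/v1/push endpoint accepts and sends it with basic auth. The label names, the lokiUrl, and the credentials are placeholders you would take from your own Grafana Cloud account.

```javascript
// Loki expects timestamps as nanosecond strings. ms must be an integer.
function toNanos(ms) {
  return (BigInt(ms) * 1000000n).toString();
}

// One stream (a set of labels) carrying one or more [timestamp, line] pairs.
function buildLokiPayload(labels, entries) {
  return {
    streams: [{
      stream: labels,
      values: entries.map((e) => [toNanos(e.timestampMs), JSON.stringify(e.fields)]),
    }],
  };
}

// lokiUrl, username, and token are placeholders for your own account values.
async function pushToLoki(lokiUrl, username, token, payload, fetchImpl = fetch) {
  const auth = Buffer.from(`${username}:${token}`).toString('base64');
  const res = await fetchImpl(`${lokiUrl}/loki/api/v1/push`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Basic ${auth}` },
    body: JSON.stringify(payload),
  });
  if (!res.ok) throw new Error(`Loki push failed: ${res.status}`);
}

// Usage:
// const payload = buildLokiPayload({ app: 'api', env: 'production' },
//   [{ timestampMs: Date.now(), fields: { level: 'info', msg: 'server started' } }]);
// await pushToLoki(process.env.LOKI_URL, process.env.LOKI_USER, process.env.LOKI_TOKEN, payload);
```

In a real app you would batch entries rather than push one request per log line, which is what the dedicated transports do for you.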
What to log
- Every HTTP request: method, path, status code, duration, user ID (if authenticated)
- Every error: message, stack trace, request context
- Every background job: start time, end time, outcome, any errors
- Authentication events: login, logout, failed login attempts
- Payment events: charge attempt, success, failure, refund
Don't log sensitive data (passwords, full credit card numbers, PII) — but log enough to reconstruct what a user did and what the system returned.
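One way to apply both rules at once is to build every request log entry through a single function that redacts known-sensitive fields. This is a sketch under assumptions: the field names and the list of sensitive keys are examples, and the redaction only covers top-level keys.

```javascript
// Example key list; extend it to match the fields your app actually handles.
const SENSITIVE_KEYS = new Set(['password', 'token', 'authorization', 'cardNumber']);

// Replace sensitive values (top-level keys only) before they reach the logs.
function redact(obj) {
  return Object.fromEntries(
    Object.entries(obj).map(([k, v]) => [k, SENSITIVE_KEYS.has(k) ? '[REDACTED]' : v]),
  );
}

// One structured entry per HTTP request, ready to serialize as a JSON line.
function requestLogEntry({ method, path, statusCode, durationMs, userId, params = {} }) {
  return {
    level: statusCode >= 500 ? 'error' : 'info',
    method,
    path,
    statusCode,
    durationMs,
    userId: userId ?? null,
    params: redact(params),
  };
}

console.log(JSON.stringify(requestLogEntry({
  method: 'POST', path: '/api/login', statusCode: 200, durationMs: 41,
  userId: 'u_123', params: { plan: 'pro', password: 'hunter2' },
})));
```

Emitting JSON lines keeps the entries queryable by field later, instead of forcing you to parse free-form text.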
How logs and uptime monitoring complement each other
When PingBase alerts you that your API returned a 500, your first step is to open Grafana Loki and query for error-level logs in the minute surrounding the alert timestamp. Within 30 seconds, you'll have the stack trace, the affected endpoint, and the likely cause. Without logs, you're debugging blind.
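As an illustration of that lookup, the sketch below builds a Loki query_range URL for error-level logs in the minute on either side of an alert. It assumes your logs are JSON lines with a level field and carry an app label. Both are conventions you would have chosen when shipping logs, not defaults, and lokiUrl is a placeholder.

```javascript
// Build a Loki query_range URL for error logs around an alert timestamp.
// Assumes JSON log lines with a "level" field and an "app" stream label.
function errorLogsQueryUrl(lokiUrl, app, alertTimeMs, windowMs = 60000) {
  const toNanos = (ms) => (BigInt(ms) * 1000000n).toString();
  const params = new URLSearchParams({
    query: `{app="${app}"} | json | level="error"`, // LogQL: select, parse, filter
    start: toNanos(alertTimeMs - windowMs),
    end: toNanos(alertTimeMs + windowMs),
    limit: '100',
  });
  return `${lokiUrl}/loki/api/v1/query_range?${params}`;
}

// Usage: paste the LogQL part into Grafana's Explore view, or fetch the URL
// with the same basic-auth credentials you use for pushing logs.
// errorLogsQueryUrl('https://loki.example.com', 'api', Date.parse('2026-01-01T02:13:00Z'));
```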
Layer 3: Error tracking — Sentry (free tier)
Logs capture what happened in your system. Error tracking captures what broke in your code — with a user-facing view, a stack trace with source map resolution, breadcrumbs showing what the user did before the error, and a count of how many users were affected.
What Sentry offers on the free tier
- 5,000 errors per month
- 1 user (free) — fine for solo founders
- JavaScript, Python, Go, Rust, and most other languages via SDKs
- Source map support for minified JavaScript
- Performance monitoring for up to 10 transactions/second (sampling)
- 30 days of error retention
5,000 errors per month is plenty for an early-stage product. If you're generating more than that in errors, you have a code quality problem that's more urgent than monitoring costs.
Setting up Sentry
Sentry integration is typically a 15-minute setup:
- Create a Sentry project and get your DSN
- Install the Sentry SDK: npm install @sentry/node for Node.js, @sentry/react for React frontend
- Initialize Sentry at the entry point of your app with your DSN
- For React: wrap your root component in Sentry.ErrorBoundary
- Upload source maps during your build process for human-readable stack traces
- Set up Sentry alerts to post to your Slack channel for new error types
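The initialization step above can be sketched as follows for a Node.js backend. The wrapper function, the environment variable names, and the 10% trace sample rate are assumptions for illustration. The dsn, environment, and tracesSampleRate options are the ones the Sentry SDK's init call accepts.

```javascript
// Sentry is passed in so the function has no hard dependency in this sketch.
// In your app you would pass require('@sentry/node').
function initSentry(Sentry, env = process.env) {
  if (!env.SENTRY_DSN) return false; // no DSN set: skip, e.g. in local development
  Sentry.init({
    dsn: env.SENTRY_DSN,
    environment: env.NODE_ENV || 'development',
    tracesSampleRate: 0.1, // sample 10% of transactions; an arbitrary starting point
  });
  return true;
}

// Usage, at the very top of your entry file:
// initSentry(require('@sentry/node'));
```

Keeping the DSN in an environment variable means local runs without it simply skip reporting instead of polluting your error quota.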
Sentry vs logs: which to check first?
Sentry is better for JavaScript exceptions and user-facing errors — it gives you the full React component tree, the user actions that led up to the error (breadcrumbs), and whether the same error is affecting 1 user or 1,000. Logs are better for server-side request-level debugging. In practice, you'll usually start with the PingBase alert, open Sentry to check if there's a correlated error spike, then open Loki if you need more context.
Layer 4: Synthetic end-to-end checks — Checkly (free tier)
Uptime monitoring checks that your endpoints respond. Synthetic monitoring checks that your user flows work — that a user can sign up, log in, create a resource, and complete a checkout from start to finish. These are different problems.
What Checkly offers on the free tier
- 3 browser checks (Playwright-based)
- 5 API checks
- Checks run every 10 minutes
- Alert notifications via email and Slack
Three browser checks is enough to cover your most critical user flows: signup, login, and your core product action (create first resource, start trial, etc.).
Setting up Checkly
Checkly uses Playwright syntax for browser checks. A basic login check looks like:
```javascript
const { test, expect } = require('@playwright/test');

test('User can log in', async ({ page }) => {
  // Credentials come from environment variables configured for the check
  await page.goto('https://app.yourproduct.com/login');
  await page.fill('[data-testid="email"]', process.env.CHECK_EMAIL);
  await page.fill('[data-testid="password"]', process.env.CHECK_PASSWORD);
  await page.click('[data-testid="submit"]');

  // A successful login should land on the dashboard
  await expect(page).toHaveURL(/dashboard/);
  await expect(page.locator('h1')).toContainText('Dashboard');
});
```
Checkly runs this check from multiple locations every 10 minutes, alerts you when it fails, and shows you a screenshot and video recording of the failure. This catches a class of issues that HTTP monitoring can't: when your server returns 200 but the page content is wrong, the JavaScript errors out on render, or the login form submits but the redirect fails.
What flows to check
- Signup flow. Can a new user register, verify email, and land on the dashboard?
- Login flow. Can an existing user log in?
- Core action. Can a logged-in user perform the primary action your product exists for?
These three checks cover the majority of "the product is broken" scenarios.
Connecting the stack: the incident workflow
Having four tools is only useful if they work together. Here's the incident workflow this stack enables:
- PingBase detects failure. Your API health endpoint returns 500 at 2:13am. PingBase fires after 2 consecutive failures — alert lands in Slack at 2:14am.
- Check Sentry. Open Sentry and look at the error spike around 2:13am. You see 47 errors of the same type: DatabaseConnectionError: connection pool exhausted.
- Check Grafana Loki. Query for error logs around 2:13am. See the exact query that triggered the pool exhaustion: a poorly indexed query that started running at 2:10am and held connections.
- Fix and deploy. Kill the offending query, add the index, deploy.
- PingBase sends recovery alert. 2:31am — service is back. 18 minutes of downtime captured with exact timestamps.
- Post incident update to status page. Update the PingBase status page with a timeline. Users who subscribed to status notifications get an email: "The incident affecting API performance has been resolved."
From first alert to resolution, every step has supporting data. You never debug blind.
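The "fires after 2 consecutive failures" rule in the first step is worth understanding, because it is what keeps a single dropped request from waking you up. Here is a small sketch of that logic. It models the behavior described above and is not PingBase's actual implementation.

```javascript
// Given check results in time order, return the alerts an alerter would emit.
// A "down" alert fires on the Nth consecutive failure; a "recovered" alert
// fires on the first success after a "down" alert.
function alertEvents(checks, threshold = 2) {
  const events = [];
  let failures = 0;
  let down = false;
  for (const { time, ok } of checks) {
    if (ok) {
      if (down) events.push({ type: 'recovered', time });
      failures = 0;
      down = false;
    } else {
      failures += 1;
      if (failures === threshold && !down) {
        events.push({ type: 'down', time });
        down = true;
      }
    }
  }
  return events;
}

// The timeline from the workflow above: first failure at 2:13, alert at 2:14.
console.log(alertEvents([
  { time: '2:12', ok: true },
  { time: '2:13', ok: false },
  { time: '2:14', ok: false },
  { time: '2:31', ok: true },
]));
```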
Optional: metrics — Grafana Prometheus (included in Grafana Cloud)
Grafana Cloud's free tier includes Prometheus for metrics storage and Grafana dashboards for visualization. If you're running a Node.js server, you can expose a /metrics endpoint using prom-client and scrape it with Grafana Alloy.
Useful metrics to track:
- Request rate (requests/second per endpoint)
- Error rate (% of requests returning 4xx or 5xx)
- P95 and P99 response time by endpoint
- Database connection pool utilization
- Background job queue depth and processing rate
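To make the error rate and P95/P99 numbers concrete, here is a sketch that computes them from raw samples using the nearest-rank method. In practice Prometheus derives percentiles from histogram buckets rather than raw values, so treat this as an illustration of what the metrics mean, not how Grafana computes them.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(values, p) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Share of responses with a 4xx or 5xx status, as a percentage.
function errorRate(statusCodes) {
  if (statusCodes.length === 0) return 0;
  const errors = statusCodes.filter((s) => s >= 400).length;
  return (errors / statusCodes.length) * 100;
}

const durationsMs = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100];
console.log(percentile(durationsMs, 50)); // 50
console.log(percentile(durationsMs, 95)); // 100
console.log(errorRate([200, 200, 404, 500])); // 50
```

The example shows why P95 matters: the median request here takes 50 ms, but the slowest 5% take twice that, and an average would hide it.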
For serverless or edge runtimes that don't support persistent Prometheus scraping, you can push metrics directly to the Grafana Cloud Prometheus remote write endpoint on each request. The overhead is small.
When to upgrade beyond free tiers
The free tier stack handles most indie products comfortably to $5k–$10k MRR. Here's when you'll need to upgrade:
| Tool | When to upgrade | Paid tier starts at |
|---|---|---|
| PingBase | When you need more than 5 monitors or custom domain for status page | $9/month |
| Grafana Cloud | When you need >50GB logs/month, >14 days retention, or >3 users | ~$8/month (pay as you go) |
| Sentry | When you need >5k errors/month, multiple team members, or longer retention | $26/month (Team) |
| Checkly | When you need >3 browser checks or 5-minute check intervals | $20/month |
At full paid tiers, this stack costs roughly $63/month — still dramatically less than a comparable Datadog setup at similar scale, which would run $150–$400+/month for the same coverage.
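The $63 figure is simply the sum of the "paid tier starts at" column in the table. A quick check, using the table's prices as given (the Grafana Cloud figure is approximate because it is pay as you go):

```javascript
// Entry-level paid prices from the upgrade table, in USD per month.
const paidTiers = { PingBase: 9, 'Grafana Cloud': 8, Sentry: 26, Checkly: 20 };

// Everything upgraded to its first paid tier.
const fullPaidTotal = Object.values(paidTiers).reduce((sum, price) => sum + price, 0);

// Starting configuration: PingBase Pro plus the other three on free tiers.
const startingTotal = paidTiers.PingBase;

console.log(fullPaidTotal, startingTotal); // prints "63 9"
```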
The 30-minute setup plan
Here's a realistic order of operations to get the full stack live in one session:
- PingBase (5 min): Sign up, add your homepage + API health monitors, configure email alerts, create a status page. Link the status page from your app footer.
- Sentry (10 min): Create project, install SDK, add to your app entry point. Deploy. Trigger a test error to confirm it arrives.
- Grafana Cloud (10 min): Create free account, configure a log shipping integration for your runtime, verify logs are arriving in Loki. Set up one dashboard showing error rate and request count.
- Checkly (5 min): Create free account, write a login check using the Playwright editor, enable Slack notifications. Verify the check runs successfully.
That's it. In 30 minutes you go from zero observability to a production-grade stack that will catch the vast majority of real-world issues before users escalate them to you.
Start the stack with PingBase — it's free
PingBase is the uptime monitoring and status page layer of this stack. Free for up to 5 monitors. Takes 5 minutes to set up. No credit card required.