Monitor Your Microservices: A Complete Guide

Microservices distribute failure across many services instead of concentrating it in one. That's a feature — but it makes monitoring harder. Here's how to get visibility across your entire service mesh.

Monitoring a monolith is straightforward: one application, one health check, one deploy. When it's down, everything is down. When it's up, everything is up.

Microservices break this model. You have 5, 10, or 50 services, each with its own health, its own deployment cadence, and its own failure modes. A degraded recommendation service might not affect checkout. A broken notification service might silently fail to send emails while everything else runs normally. An unhealthy worker pool might be processing queue messages three times slower than expected without triggering any alert.

Effective microservices monitoring requires thinking about each layer: individual service health, inter-service dependencies, asynchronous workers, and the customer-visible aggregate.


Layer 1: Per-service health checks

Every service should expose a health endpoint. The common convention is GET /health or GET /healthz (the spelling popularized by Kubernetes). This endpoint should return 200 when the service is healthy and a non-200 status, typically 503, when it isn't.

A good health endpoint does more than return 200:

# Example health response

{
  "status": "ok",
  "version": "1.4.2",
  "uptime": 86400,
  "dependencies": {
    "database": "ok",
    "redis": "ok",
    "external_api": "degraded"
  }
}

This structure shows at a glance whether the service is healthy, which version is running, and which dependencies are contributing to any degradation. The dependencies block is particularly valuable: it lets you distinguish "this service is broken" from "this service is healthy but one of its dependencies is degraded."
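One way to produce a response like this is to collect the dependency states first and derive the top-level status from them. The sketch below is a minimal illustration, not a prescribed implementation: the CRITICAL set, the "error" status value, and the summarizeHealth name are assumptions made for the example, and the dependency checks themselves are left to the service.

```javascript
// Hypothetical list of dependencies this service cannot run without.
const CRITICAL = new Set(["database"]);

// Builds the HTTP status code and JSON body for the health endpoint.
// `dependencies` maps a dependency name to its state ("ok", "degraded", "down").
function summarizeHealth(dependencies, version, uptimeSeconds) {
  // Unhealthy only when a critical dependency is not "ok"; a degraded
  // optional dependency is reported in the body but does not fail the check.
  const healthy = Object.entries(dependencies).every(
    ([name, state]) => !CRITICAL.has(name) || state === "ok"
  );

  return {
    statusCode: healthy ? 200 : 503,
    body: {
      status: healthy ? "ok" : "error",
      version,
      uptime: uptimeSeconds,
      dependencies,
    },
  };
}

// Mirrors the example response: healthy overall, one optional dependency degraded.
const example = summarizeHealth(
  { database: "ok", redis: "ok", external_api: "degraded" },
  "1.4.2",
  86400
);
```

Which dependencies count as critical is a per-service decision. Marking too many as critical turns every downstream blip into a failed health check.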

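In PingBase, set up one HTTP monitor per service pointing at its health endpoint. Use content validation to assert that the response body reports "status": "ok"; this catches the case where a service returns 200 but reports itself as unhealthy. If the validation is a literal substring match, use the exact serialization your service emits, since "status":"ok" will not match "status": "ok".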
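If you also run your own checks (a smoke test or a deploy script, for example), the same validation can be written against the parsed JSON rather than a substring, which sidesteps whitespace differences. This is a sketch of the idea only; passesHealthCheck is a hypothetical helper, not part of PingBase.

```javascript
// Returns true only for a 200 response whose JSON body reports status "ok".
function passesHealthCheck(statusCode, bodyText) {
  if (statusCode !== 200) return false;
  try {
    const body = JSON.parse(bodyText);
    return body !== null && typeof body === "object" && body.status === "ok";
  } catch {
    // Not JSON at all, e.g. an HTML error page returned by a proxy.
    return false;
  }
}
```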


Layer 2: Response time thresholds per service

In a microservices architecture, latency cascades. If Service A calls Service B which calls Service C, and Service C is slow, the slowness propagates up the call chain: each caller's response time includes the time it spends waiting on its dependencies, and retries or fan-out can multiply the delay. A service that's responding in 800ms when it normally takes 100ms may be a symptom of a deeper dependency problem.

Set response time thresholds based on each service's baseline, not arbitrary round numbers:

Service type | Typical baseline | Suggested alert threshold
Health check endpoint | <50ms | 200ms
Read-only data service | 50–200ms | 500ms
Write / mutation service | 100–400ms | 1000ms
Orchestration / gateway | 200–500ms | 2000ms
ML inference service | 200ms–2s | 5000ms

Calibrate these against your actual p95 response times, not the table above. The goal is to alert on meaningful deviation from normal, not to hit an arbitrary target.
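As a rough illustration of that calibration, the sketch below computes a nearest-rank p95 from a set of response-time samples and derives a threshold from it. The multiplier of 2 and the rounding to 50ms steps are assumptions chosen for the example, not a recommendation from PingBase.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical rule: alert at a multiple of p95, rounded up to the next step.
function alertThreshold(samplesMs, multiplier = 2, stepMs = 50) {
  const p95 = percentile(samplesMs, 95);
  return Math.ceil((p95 * multiplier) / stepMs) * stepMs;
}
```

Feed it response times from a representative window (at least a few days, covering peak traffic) so that the baseline reflects normal load rather than a quiet hour.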


Layer 3: Async workers and background services

Microservices architectures often have more async workers than synchronous services. Message consumers, event processors, scheduled aggregation jobs, data pipeline workers — these have no HTTP endpoint to monitor. They either run and process, or they silently stop.

Heartbeat monitoring is the right pattern. Each worker pings a unique URL after each successful processing cycle. If the ping stops arriving, the monitor alerts.

Common async components to monitor with heartbeats include message consumers, event processors, scheduled aggregation jobs, and data pipeline workers.

# Example: Kafka consumer with heartbeat

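// `consumer` stands in for your Kafka client; poll() and commitOffsets()
// are placeholders for whatever its API provides.

let lastLoopAt = Date.now();

// Best-effort ping: a failed or slow ping must never stall the worker
function pingHeartbeat() {
  return fetch(process.env.PINGBASE_HEARTBEAT_URL, {
    signal: AbortSignal.timeout(5000),
  }).catch(() => {});
}

async function processMessages() {
  while (true) {
    const messages = await consumer.poll({ timeout: 1000 });

    for (const message of messages) {
      await processMessage(message);
    }

    await consumer.commitOffsets();
    lastLoopAt = Date.now();

    // Ping after each successful batch
    if (messages.length > 0) {
      await pingHeartbeat();
    }
  }
}

// Also ping on a timer when the queue is empty, but only while the loop
// is still turning; otherwise a hung worker would keep looking healthy
setInterval(() => {
  if (Date.now() - lastLoopAt < 30_000) pingHeartbeat();
}, 60_000); // Every minute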
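
On the monitoring side, a heartbeat check reduces to comparing the time since the last ping against the expected interval plus a grace period. The sketch below illustrates that logic only; it is not PingBase's implementation, and the function and parameter names are hypothetical.

```javascript
// Returns true when a heartbeat is overdue: no ping has arrived within
// the expected interval plus the grace period. All times in milliseconds.
function isOverdue(lastPingAt, expectedIntervalMs, graceMs, now = Date.now()) {
  return now - lastPingAt > expectedIntervalMs + graceMs;
}

// A worker that pings at least once a minute, with a 30 second grace period:
const quietFor80s = isOverdue(0, 60_000, 30_000, 80_000); // within the window
const quietFor95s = isOverdue(0, 60_000, 30_000, 95_000); // past the window
```

The grace period absorbs network jitter and long batches. Set it too tight and a single slow batch fires an alert; too loose and a dead worker goes unnoticed for longer.
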

Layer 4: The API gateway or BFF

Most microservices architectures have an API gateway or backend-for-frontend (BFF) layer that aggregates calls to downstream services. This is the layer your users actually interact with.

Monitor the gateway differently from internal services:


Layer 5: The public status page

Your internal monitoring gives your engineering team visibility into service health. Your status page gives your users visibility. In a microservices architecture, the mapping between internal services and user-visible components isn't always 1:1.

Design your status page around user-visible functionality, not internal service names:

Internal services | Status page component
auth-service, session-service, user-service | Authentication
api-gateway, routing-service | API
notification-service, email-worker, template-service | Email & Notifications
billing-service, payment-processor-adapter | Billing
cdn, asset-service, frontend-app | Dashboard

Users don't know what auth-service is. They know whether they can log in. Structure your status page around their experience, and map your monitors to the appropriate component.
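The mapping can be expressed as data, with each component taking the worst state among its underlying monitors. This is a sketch under assumptions: the three state names and their ordering are hypothetical, and services with no reported state are treated as operational.

```javascript
// Component mapping taken from the table above.
const COMPONENTS = {
  "Authentication": ["auth-service", "session-service", "user-service"],
  "API": ["api-gateway", "routing-service"],
  "Email & Notifications": ["notification-service", "email-worker", "template-service"],
  "Billing": ["billing-service", "payment-processor-adapter"],
  "Dashboard": ["cdn", "asset-service", "frontend-app"],
};

// Hypothetical severity ordering; higher is worse.
const SEVERITY = { operational: 0, degraded: 1, down: 2 };

// `monitorStates` maps a service name to its current state.
// Each component reports the worst state among its services.
function componentStatuses(components, monitorStates) {
  const result = {};
  for (const [component, services] of Object.entries(components)) {
    let worst = "operational";
    for (const service of services) {
      const state = monitorStates[service] ?? "operational";
      if (SEVERITY[state] > SEVERITY[worst]) worst = state;
    }
    result[component] = worst;
  }
  return result;
}
```

Worst-of is the conservative choice. If a single failing worker should not mark the whole component as down, a softer rule (for example, reporting "degraded" unless every service in the component is down) may fit better.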


Monitoring during deployments

In a microservices architecture, deployments are continuous. Services deploy independently, often multiple times per day. Each deployment is a potential incident source.

Two patterns worth implementing:

Pause monitors during rolling deployments. When a service is deploying, individual instances restart sequentially. A monitor checking during this window might catch an instance mid-restart and fire a false alert. Use PingBase's GitHub Action to pause the relevant monitor during deployment and resume it after health checks pass.

# .github/workflows/deploy.yml

- name: Pause monitor during deploy
  uses: pingbase/pause-monitor@v1
  with:
    api-key: ${{ secrets.PINGBASE_API_KEY }}
    monitor-id: ${{ vars.PAYMENT_SERVICE_MONITOR_ID }}

- name: Deploy payment-service
  run: kubectl rollout restart deployment/payment-service

- name: Wait for rollout
  run: kubectl rollout status deployment/payment-service

- name: Resume monitor
  uses: pingbase/resume-monitor@v1
  with:
    api-key: ${{ secrets.PINGBASE_API_KEY }}
    monitor-id: ${{ vars.PAYMENT_SERVICE_MONITOR_ID }}

Post-deploy verification. After a deployment completes, run a targeted check against the health endpoint before resuming monitoring. If the check fails, roll back rather than resuming monitoring and waiting for an alert.
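A post-deploy check of this kind might look like the sketch below: poll the health endpoint until it passes several times in a row, and decide between resuming and rolling back. The verifyDeployment name, its options, and the "resume"/"rollback" return values are assumptions for illustration; `check` is any async function that resolves to true or false, such as a fetch of the service's health endpoint.

```javascript
// Polls `check` until it passes `requiredPasses` times in a row or the
// attempts run out. Returns "resume" on success and "rollback" otherwise.
async function verifyDeployment(
  check,
  { attempts = 10, requiredPasses = 3, delayMs = 2000 } = {}
) {
  let streak = 0;
  for (let i = 0; i < attempts; i++) {
    let ok = false;
    try {
      ok = await check();
    } catch {
      ok = false; // a network error counts as a failed check
    }

    // A failure resets the streak, so a flapping service does not pass.
    streak = ok ? streak + 1 : 0;
    if (streak >= requiredPasses) return "resume";

    if (i < attempts - 1) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return "rollback";
}
```

In the workflow above, a step running this after the rollout could exit non-zero on "rollback" and trigger kubectl rollout undo deployment/payment-service before the monitor is resumed.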


Organizing monitors at scale

When you have 20+ services, flat monitor lists become unmanageable. Use PingBase's monitor groups to organize by team, environment, or service tier.

For teams using the API to manage monitors programmatically, PingBase's REST API supports bulk creation and tagging — useful when spinning up a new service follows a repeatable pattern.


What PingBase gives you for microservices

The free tier covers 5 monitors — enough to cover the critical path (gateway, auth, billing) while evaluating. Pro at $9/month covers up to 10 monitors. Business at $29/month is unlimited.

Start monitoring your services

HTTP, heartbeat, DNS, and SSL monitoring in one tool. API and CLI for programmatic setup. Free to start.
