How to Set Up an On-Call Rotation That Doesn't Suck

Most on-call setups fail the same way: too many noisy alerts, no clear handoff process, and an engineer who's exhausted by Thursday. Here's how to build a rotation that's actually sustainable.

Why most on-call rotations burn people out

On-call burnout rarely happens because of real incidents. It happens because of bad configuration: alerts that fire for things that don't matter, no clear escalation path, and engineers who have no idea what they're expected to do when something breaks.

Before you build the rotation schedule, you need three things in place:

Alerts that only fire for things that actually require a human response
A runbook (even a basic one) so the on-call person isn't starting from zero at 3am
An escalation path — who to call when the on-call engineer can't fix it alone

Without these, the rotation schedule doesn't matter. You're just spreading misery evenly.
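
One way to make an escalation path concrete is to write it down as data rather than leaving it as tribal knowledge. The sketch below assumes a primary, secondary, manager chain; the acknowledgement timeouts and the `who_to_page` helper are hypothetical, chosen only for illustration.

```python
# Hypothetical escalation chain: (role, minutes to wait for an acknowledgement
# before moving on to the next role). The timeouts are illustrative only.
ESCALATION = [("primary", 15), ("secondary", 15), ("manager", 30)]


def who_to_page(minutes_unacknowledged: float) -> str:
    """Return the role that should hold the page after the given wait."""
    elapsed = 0
    for role, wait in ESCALATION:
        elapsed += wait
        if minutes_unacknowledged < elapsed:
            return role
    # Chain exhausted: the page stays with the last role in the list.
    return ESCALATION[-1][0]


if __name__ == "__main__":
    for minutes in (0, 15, 30, 120):
        print(minutes, who_to_page(minutes))
```

Most paging tools let you configure something equivalent directly; the point is only that the chain and its timeouts are written down before the first incident.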

Choosing a rotation structure

The three most common rotation structures are weekly, follow-the-sun, and a weekday/weekend split. Each works for different team sizes and geographies.

Weekly rotation

One engineer carries a full week, then hands off. Simple to schedule, but it needs at least 3–4 people so that nobody is on call more often than every third or fourth week. Works best when incidents are rare.

Follow-the-sun

Engineers cover business hours in their timezone. Eliminates overnight coverage for any single person. Requires distributed teams — doesn't work if everyone is in the same timezone.
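
A follow-the-sun schedule is only as good as its coverage, so it can help to check for gaps. The sketch below is a minimal illustration: the three region names and their eight-hour windows (expressed in UTC) are assumptions, not a recommendation. Shifts defined in local time will drift against UTC when daylight saving changes.

```python
from typing import List, Optional

# Hypothetical regions and shift windows as [start, end) hours in UTC.
SHIFTS_UTC = {
    "apac": (0, 8),
    "emea": (8, 16),
    "amer": (16, 24),
}


def region_on_call(hour_utc: int) -> Optional[str]:
    """Return the region whose shift covers this UTC hour, or None."""
    for region, (start, end) in SHIFTS_UTC.items():
        if start <= hour_utc < end:
            return region
    return None


def uncovered_hours() -> List[int]:
    """Return every UTC hour of the day that no region covers."""
    return [hour for hour in range(24) if region_on_call(hour) is None]


if __name__ == "__main__":
    print(region_on_call(3))
    print(uncovered_hours())
```

If `uncovered_hours()` returns anything, someone still has to carry those hours, and the schedule is not yet truly follow-the-sun.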

Weekday/weekend split

Primary on-call covers weekdays; a separate engineer takes weekends. Reduces weekend disruptions. Works well when weekend volume is lower than weekday volume.

For most early-stage SaaS teams, a weekly rotation with a primary and a secondary (the escalation target) is the right starting point. It's simple and doesn't require complex tooling.
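
A weekly primary/secondary rotation is simple enough to generate with a few lines of code. The sketch below is one possible approach, not a prescribed one: the `weekly_rotation` function and the engineer names are hypothetical. It assumes the secondary is simply next week's primary, so each person gets a week of context before carrying the pager.

```python
from datetime import date, timedelta


def weekly_rotation(engineers, start, weeks):
    """Return a list of (week_start, primary, secondary) tuples.

    The secondary for a given week is the engineer who will be
    primary the following week.
    """
    count = len(engineers)
    if count < 2:
        raise ValueError("need at least two engineers for primary + secondary")
    schedule = []
    for week in range(weeks):
        primary = engineers[week % count]
        secondary = engineers[(week + 1) % count]
        schedule.append((start + timedelta(weeks=week), primary, secondary))
    return schedule


if __name__ == "__main__":
    team = ["ana", "ben", "chen", "dee"]  # placeholder names
    for week_start, primary, secondary in weekly_rotation(team, date(2024, 1, 1), 5):
        print(week_start, primary, secondary)
```

With four engineers, each person is primary one week in four and secondary one week in four.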

Configuring alerts for on-call

Alert fatigue is the on-call killer. If your on-call engineer gets 12 alerts a night, half of which are flapping monitors or non-critical warnings, they stop paying attention — and miss the one real incident buried in the noise.

The rule of thumb: if an alert doesn't require action, it shouldn't page.

Before adding anything to your on-call alert channel:

Ask: "If I got this alert at 2am, what would I do?" If the answer is "go back to sleep," it shouldn't page.
Use confirmation windows: require 2–3 consecutive failures before alerting. This filters out most transient blips.
Set appropriate thresholds. Not every slow response needs to wake someone up.
Route informational alerts to a separate Slack channel, not the on-call channel.
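
The confirmation-window and routing rules above can be sketched in a few lines. This is an illustration under stated assumptions, not any monitoring tool's actual API: the `Monitor` class, its field names, and the "page"/"info" labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Monitor:
    name: str
    confirm_after: int = 3  # consecutive failures required before alerting
    pages: bool = True      # False routes to the informational channel
    failures: int = 0       # current consecutive-failure streak

    def record(self, ok: bool) -> Optional[str]:
        """Record one check result; return "page", "info", or None."""
        if ok:
            self.failures = 0  # any success resets the streak
            return None
        self.failures += 1
        # Alert once, at the moment the streak reaches the threshold.
        # Further failures stay silent until the monitor recovers.
        if self.failures == self.confirm_after:
            return "page" if self.pages else "info"
        return None


if __name__ == "__main__":
    api = Monitor("api-health")
    for ok in [False, True, False, False, False]:
        print(api.record(ok))
```

A single failure followed by a recovery never alerts, which is exactly the transient blip the confirmation window is meant to absorb.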

The handoff process

A rotation without a handoff is just on-call theater. The handoff is where context gets transferred — and without it, the incoming on-call engineer starts cold.

A minimal but effective handoff includes:

Ongoing issues: Anything that's not fully resolved, even if it's been stable for a few days
Recent changes: Deploys, config changes, or dependency updates in the last week that could cause issues
Known flaky monitors: Alerts that have been firing spuriously so the incoming engineer doesn't waste time investigating
What to watch: Anything that's been trending in the wrong direction

Keep the handoff async — a Slack message or short doc is fine. It doesn't need to be a meeting.
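
If you want handoffs to follow the same shape every week, a small template helps. The sketch below renders the four sections listed above as plain text you could paste into Slack or a doc. The `HandoffNote` class and its field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HandoffNote:
    outgoing: str
    incoming: str
    ongoing_issues: list = field(default_factory=list)
    recent_changes: list = field(default_factory=list)
    flaky_monitors: list = field(default_factory=list)
    watch_items: list = field(default_factory=list)

    def render(self) -> str:
        """Render the note as plain text, one section per handoff topic."""
        sections = [
            ("Ongoing issues", self.ongoing_issues),
            ("Recent changes", self.recent_changes),
            ("Known flaky monitors", self.flaky_monitors),
            ("What to watch", self.watch_items),
        ]
        lines = [f"On-call handoff: {self.outgoing} -> {self.incoming}"]
        for title, items in sections:
            lines.append(f"{title}:")
            # Write "none" explicitly so an empty section reads as
            # "checked, nothing to report" rather than "forgotten".
            lines.extend(f"  - {item}" for item in (items or ["none"]))
        return "\n".join(lines)


if __name__ == "__main__":
    note = HandoffNote("ana", "ben", ongoing_issues=["queue lag on worker-2"])
    print(note.render())
```

Printing empty sections as "none" is a deliberate choice: it forces the outgoing engineer to consider each topic instead of skipping it.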

Compensating fairly

On-call is work. If engineers are expected to respond to incidents outside business hours, that needs to be reflected in compensation, time off, or both.

Common approaches:

On-call stipend: A flat payment per week on-call, regardless of incidents. Compensates availability, not just incidents.
Incident compensation: Pay per paged incident or per hour worked outside business hours.
Comp time: Time off in lieu of pay. Works better when your team has flexibility in how they use their time.
No incidents this week? No on-call next week: For very small teams, informal agreements can substitute for formal policies early on.

What doesn't work: pretending on-call is just "part of the job" with no acknowledgment. That approach leads to quiet resentment and eventual attrition.

The on-call setup checklist

Before going on-call

Alert channels configured: only actionable alerts page
Confirmation windows set to prevent false positives
Runbooks written for top 3 most likely failure modes
Escalation path defined (primary → secondary → manager)
Status page configured and ready to update

At each handoff

Outgoing engineer sends handoff notes
Any open or ongoing incidents documented
Known flaky monitors flagged

On-call doesn't have to be dreaded. With the right alert hygiene, clear processes, and fair compensation, it becomes a manageable part of operating a reliable product — rather than a source of burnout that drives good engineers away.