How to Set Up an On-Call Rotation That Doesn't Suck
Most on-call setups fail the same way: too many noisy alerts, no clear handoff process, and an engineer who's exhausted by Thursday. Here's how to build a rotation that's actually sustainable.
Why most on-call rotations burn people out
On-call burnout rarely happens because of real incidents. It happens because of bad configuration: alerts that fire for things that don't matter, no clear escalation path, and engineers who have no idea what they're expected to do when something breaks.
Before you build the rotation schedule, you need three things in place:
- Alerts that only fire for things that actually require a human response
- A runbook (even a basic one) so the on-call person isn't starting from zero at 3am
- An escalation path — who to call when the on-call engineer can't fix it alone
Without these, the rotation schedule doesn't matter. You're just spreading misery evenly.
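To make the escalation path concrete, here's a minimal Python sketch of an escalation policy. Each level gets an acknowledgment timeout, and a page that nobody acknowledges moves to the next level. The level names (`primary`, `secondary`, `engineering-manager`) and the 15/15/30-minute timeouts are hypothetical examples. They are not recommendations and not any paging tool's actual configuration format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EscalationLevel:
    target: str           # who gets paged at this level (hypothetical role name)
    ack_timeout_min: int  # minutes to wait for an acknowledgment before escalating


def current_target(levels: List[EscalationLevel], minutes_since_alert: float,
                   acknowledged: bool) -> Optional[str]:
    """Return who should be holding the page right now, or None if it was acknowledged."""
    if acknowledged or not levels:
        return None
    elapsed = 0.0
    for level in levels:
        elapsed += level.ack_timeout_min
        if minutes_since_alert < elapsed:
            return level.target
    # Past the last timeout: keep paging the final level rather than dropping the alert.
    return levels[-1].target


# Hypothetical policy: primary, then secondary, then a manager.
policy = [
    EscalationLevel("primary", 15),
    EscalationLevel("secondary", 15),
    EscalationLevel("engineering-manager", 30),
]
```

With this policy, an unacknowledged alert sits with the primary for 15 minutes and moves to the secondary after that. At the 30-minute mark it reaches the manager.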
Choosing a rotation structure
The most common rotation structures are weekly and follow-the-sun, with a weekday/weekend split as a third option. Each works for different team sizes and geographies.
Weekly rotation
One engineer carries a full week, then hands off. It's simple to schedule, but you need at least 3–4 people so that nobody carries the pager more often than every third or fourth week. Works best when incidents are rare.
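The 3–4 person minimum comes from simple arithmetic. With n people, each engineer is on call one week in every n, or about 52/n weeks a year. Here's a quick sketch, assuming a 52-week year and one-week shifts:

```python
def rotation_load(team_size: int) -> dict:
    """For a one-week-per-shift rotation, return how often each engineer is on call."""
    if team_size < 1:
        raise ValueError("team_size must be at least 1")
    return {
        "on_call_every_n_weeks": team_size,
        "weeks_on_call_per_year": round(52 / team_size, 1),
    }


for n in (2, 3, 4, 6):
    print(n, rotation_load(n))
```

With two people, each engineer is on call 26 weeks a year. With four, that drops to 13.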
Follow-the-sun
Engineers cover business hours in their timezone. Eliminates overnight coverage for any single person. Requires distributed teams — doesn't work if everyone is in the same timezone.
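The core of follow-the-sun is a lookup from "what time is it in UTC?" to "which region holds the pager?". Below is a minimal sketch. The three region names and the even 8-hour UTC blocks are hypothetical, so in practice you'd shift the boundaries to match each team's actual working hours.

```python
from datetime import datetime, timezone

# Hypothetical three-region split of the 24-hour day, as UTC hours [start, end).
SHIFTS = [
    ("apac", 0, 8),
    ("emea", 8, 16),
    ("americas", 16, 24),
]


def on_call_region(now: datetime) -> str:
    """Return the region covering the given moment (must be timezone-aware)."""
    if now.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    hour = now.astimezone(timezone.utc).hour
    for region, start, end in SHIFTS:
        if start <= hour < end:
            return region
    raise RuntimeError("shift table does not cover every hour")
```

Normalizing to UTC first means the table stays correct when a region changes its clocks for daylight saving. However, the local hours each shift covers will move by an hour.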
Weekday/weekend split
Primary on-call covers weekdays; a separate engineer takes weekends. Nobody has to give up both a workweek and a weekend to the pager. Works well when weekend volume is lower than weekday volume.
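A weekday/weekend split comes down to checking the day of the week. This sketch assumes the handoff happens at midnight between Friday and Saturday. That's a simplification, and you may prefer a Friday-evening handoff instead. The engineer names are placeholders.

```python
from datetime import date


def who_is_on_call(day: date, weekday_engineer: str, weekend_engineer: str) -> str:
    """Monday-Friday goes to one engineer, Saturday-Sunday to another."""
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    return weekend_engineer if day.weekday() >= 5 else weekday_engineer
```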
For most early-stage SaaS teams, a weekly rotation with a primary and a secondary (the escalation target) is the right starting point. It's simple and doesn't require complex tooling.
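A weekly primary/secondary schedule can be generated from an ordered list of engineers. In this sketch, each week's secondary is the previous week's primary. That's one reasonable choice, since the escalation target then has the freshest context on recent issues. It isn't the only way to pair people. The engineer names and start date are hypothetical.

```python
from datetime import date, timedelta
from typing import List, Tuple


def weekly_rotation(engineers: List[str], start: date,
                    weeks: int) -> List[Tuple[date, str, str]]:
    """Build a schedule of (week_start, primary, secondary) tuples.

    Each week's secondary is the previous week's primary.
    """
    if len(engineers) < 2:
        raise ValueError("need at least two engineers for primary + secondary")
    n = len(engineers)
    schedule = []
    for i in range(weeks):
        primary = engineers[i % n]
        secondary = engineers[(i - 1) % n]  # Python's % wraps -1 to n - 1
        schedule.append((start + timedelta(weeks=i), primary, secondary))
    return schedule


for week_start, primary, secondary in weekly_rotation(
        ["ana", "ben", "chloe", "dev"], date(2024, 1, 1), 5):
    print(week_start.isoformat(), "primary:", primary, "secondary:", secondary)
```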
Configuring alerts for on-call
Alert fatigue is the on-call killer. If your on-call engineer gets 12 alerts a night, half of which are flapping monitors or non-critical warnings, they stop paying attention — and miss the one real incident buried in the noise.
The rule of thumb: if an alert doesn't require action, it shouldn't page.
Before adding anything to your on-call alert channel:
- Ask: "If I got this alert at 2am, what would I do?" If the answer is "go back to sleep," it shouldn't page.
- Use confirmation windows: require 2–3 consecutive failures before alerting. This filters out most transient blips.
- Set appropriate thresholds. Not every slow response needs to wake someone up.
- Route informational alerts to a separate Slack channel, not the on-call channel.
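Two of the rules above, confirmation windows and routing non-actionable alerts elsewhere, can be combined into one small gate. Here's a sketch of the logic. The class names and the `"page"` / `"info-channel"` route labels are hypothetical. The `actionable` flag stands in for the "what would I do at 2am?" question. Real monitoring and paging tools have their own settings for this, and the sketch only illustrates the idea.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CheckResult:
    monitor: str
    ok: bool
    actionable: bool  # would a human need to act on this at 2am?


class AlertGate:
    """Decide where a failing check goes: "page", "info-channel", or "none".

    Alerts only once a monitor has failed `confirmations` times in a row,
    and only once per failure streak, so blips and repeats don't page.
    """

    def __init__(self, confirmations: int = 3):
        self.confirmations = confirmations
        self.consecutive_failures = defaultdict(int)

    def route(self, result: CheckResult) -> str:
        if result.ok:
            # A passing check ends the streak.
            self.consecutive_failures[result.monitor] = 0
            return "none"
        self.consecutive_failures[result.monitor] += 1
        if self.consecutive_failures[result.monitor] != self.confirmations:
            # Either still inside the confirmation window, or already alerted for this streak.
            return "none"
        return "page" if result.actionable else "info-channel"
```

A single failure followed by a recovery never pages. Three consecutive failures of an actionable check page exactly once, and the same failures on a non-actionable check go to the informational channel instead.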
The handoff process
A rotation without a handoff is just on-call theater. The handoff is where context gets transferred — and without it, the incoming on-call engineer starts cold.
A minimal but effective handoff includes:
- Ongoing issues: Anything that's not fully resolved, even if it's been stable for a few days
- Recent changes: Deploys, config changes, or dependency updates in the last week that could cause issues
- Known flaky monitors: Alerts that have been firing spuriously so the incoming engineer doesn't waste time investigating
- What to watch: Anything that's been trending in the wrong direction
Keep the handoff async — a Slack message or short doc is fine. It doesn't need to be a meeting.
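A small template helps keep async handoffs consistent. This sketch renders the four handoff sections as plain text you could paste into Slack or a doc. The class and field names are hypothetical, and so is the example content.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Handoff:
    outgoing: str
    incoming: str
    ongoing_issues: List[str] = field(default_factory=list)
    recent_changes: List[str] = field(default_factory=list)
    flaky_monitors: List[str] = field(default_factory=list)
    watch_list: List[str] = field(default_factory=list)

    def to_message(self) -> str:
        """Render the handoff as plain text."""
        sections = [
            ("Ongoing issues", self.ongoing_issues),
            ("Recent changes", self.recent_changes),
            ("Known flaky monitors", self.flaky_monitors),
            ("What to watch", self.watch_list),
        ]
        lines = [f"On-call handoff: {self.outgoing} -> {self.incoming}"]
        for title, items in sections:
            lines.append(f"{title}:")
            if items:
                lines.extend(f"- {item}" for item in items)
            else:
                # An explicit "none" shows the section was checked, not forgotten.
                lines.append("- none")
        return "\n".join(lines)


note = Handoff(
    outgoing="ana",
    incoming="ben",
    ongoing_issues=["Billing worker queue backlog; stable since Tuesday but not root-caused"],
    recent_changes=["Database client library upgraded Monday"],
)
print(note.to_message())
```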
Compensating fairly
On-call is work. If engineers are expected to respond to incidents outside business hours, that needs to be reflected in compensation, time off, or both.
Common approaches:
- On-call stipend: A flat payment per week on-call, regardless of incidents. Compensates availability, not just incidents.
- Incident compensation: Pay per paged incident or per hour worked outside business hours.
- Comp time: Time off in lieu of pay. Works better when your team has flexibility in how they use their time.
- Informal arrangements: For very small teams, informal agreements such as "no incidents this week, no on-call next week" can substitute for formal policies early on.
What doesn't work: pretending on-call is just "part of the job" with no acknowledgment. That approach leads to quiet resentment and eventual attrition.
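The difference between paying for availability and paying for incidents shows up in a quick calculation. The stipend and hourly figures used here are placeholders, not benchmarks.

```python
def on_call_pay(stipend_per_week: float, hourly_rate: float,
                out_of_hours_worked: float) -> dict:
    """Compare one on-call week's pay under a flat stipend, per-hour incident pay, or both."""
    incident_pay = hourly_rate * out_of_hours_worked
    return {
        "stipend_only": stipend_per_week,
        "incident_only": incident_pay,
        "stipend_plus_incident": stipend_per_week + incident_pay,
    }


quiet_week = on_call_pay(stipend_per_week=500, hourly_rate=75, out_of_hours_worked=0)
busy_week = on_call_pay(stipend_per_week=500, hourly_rate=75, out_of_hours_worked=4)
```

In a quiet week, incident-only pay comes to zero, even though the engineer stayed reachable all week. A stipend covers that gap, and a hybrid pays for both availability and the hours actually worked.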
The on-call setup checklist
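Before going on-call
- Every paging alert requires a human response; informational alerts go to a separate channel
- Confirmation windows and thresholds are set so transient blips don't page
- A runbook exists, even a basic one
- The escalation path is defined: everyone knows who the secondary is
- Compensation, time off, or both are agreed for on-call work
At each handoff
- Ongoing issues, including ones that have been stable for a few days
- Recent deploys, config changes, or dependency updates
- Known flaky monitors
- Anything trending in the wrong direction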
On-call doesn't have to be dreaded. With the right alert hygiene, clear processes, and fair compensation, it becomes a manageable part of operating a reliable product — rather than a source of burnout that drives good engineers away.