
Incident Communication Best Practices for SaaS

The technical work of fixing an outage is one thing. Communicating with users while it's happening is another. Done badly, even a short incident becomes a lasting trust problem. Done well, it becomes a demonstration of your reliability culture.

Here's the uncomfortable truth about incidents: users often judge you more by how you communicate than by how quickly you fix the problem. A 30-minute outage with regular updates and a clear resolution message leaves a better impression than a 10-minute outage that users only find out about after the fact.

Incident communication is a skill. These are the practices that separate teams that handle it well from teams that make a bad situation worse.


The first rule: acknowledge fast, before you know anything

The most common incident communication mistake is waiting until you know what's wrong before posting anything. This is backwards. The purpose of the first update isn't to explain the problem — it's to show users that you're aware and working on it.

You should post a status update within 5 minutes of declaring an incident, even if all you can say is:

Example first update (2 minutes after incident starts)

"We are aware of elevated error rates affecting the API. Engineers are investigating. Next update in 10 minutes."

That update does three things: it confirms you know about the problem, it tells users something is being done, and it sets an expectation for the next update. Users who see this are much less likely to open a support ticket.

What you should not do: wait 20 minutes before posting anything because you wanted to include the root cause. By then, frustrated users have already emailed support, posted on social media, and formed the impression that you don't know what's happening.


Update frequency: commit to a cadence and keep it

When you post your first update, you're making an implicit promise: "I will keep you informed." The fastest way to lose user trust during an incident is to make that promise and then go quiet.

Recommended update cadence:

Posting on schedule even when you have no new information is the important part. A brief "We continue to investigate. Engineers are working on the issue. Next update in 15 minutes." tells users that the problem hasn't been forgotten. Silence does the opposite.
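
Keeping that promise comes down to simple date arithmetic. Here is a minimal sketch in Python, assuming a 15-minute cadence like the one in the example message. The names `next_update_due` and `is_overdue` are hypothetical helpers for illustration, not part of any PingBase API.

```python
from datetime import datetime, timedelta, timezone

# Assumed cadence; use whatever interval you promised in your last update.
CADENCE_MINUTES = 15


def next_update_due(last_update: datetime, cadence_minutes: int = CADENCE_MINUTES) -> datetime:
    """Return the time by which the next status update was promised."""
    return last_update + timedelta(minutes=cadence_minutes)


def is_overdue(last_update: datetime, now: datetime, cadence_minutes: int = CADENCE_MINUTES) -> bool:
    """True once the promised update time has passed without a new post."""
    return now > next_update_due(last_update, cadence_minutes)


# Hypothetical timestamps for illustration only.
last_post = datetime(2024, 1, 1, 15, 5, tzinfo=timezone.utc)
print(next_update_due(last_post).strftime("%H:%M UTC"))  # prints "15:20 UTC"
print(is_overdue(last_post, datetime(2024, 1, 1, 15, 25, tzinfo=timezone.utc)))  # prints "True"
```

A check like this could drive a reminder for whoever owns communications during the incident, so the cadence does not depend on someone watching the clock.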


What to include in each update

Every status update should answer four questions, adapted to what you know at that moment:

  1. What is affected? Be specific. "Users attempting to log in" is better than "some users." "The API is returning 503 errors" is better than "there are issues."
  2. What is not affected? If you know certain features are unaffected, say so. "Existing sessions are unaffected. Users who are already logged in can continue to use the service normally." This immediately reduces the blast radius in users' minds.
  3. What are you doing about it? Even a vague answer is better than nothing. "Engineers are investigating the root cause." "We have identified the issue and are deploying a fix."
  4. When is the next update? Always commit to the next update time. This prevents the anxiety of "are they still working on it?"

Good incident update — all four elements

"We have identified the root cause: a database configuration change deployed at 14:30 UTC is causing connection pool exhaustion. Users attempting to create new records are experiencing errors. Read operations are unaffected. We are rolling back the configuration change now. We expect full recovery within 10 minutes. Next update at 15:20 UTC."

Poor incident update — vague and unhelpful

"We are aware of issues and working to resolve them. We apologize for any inconvenience."
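
One way to keep updates from drifting toward the vague version is to draft them from a template that refuses to render until all four questions are answered. The following is a sketch under assumptions: the `IncidentUpdate` class and its field names are hypothetical, and the sample text is taken from the good update above.

```python
from dataclasses import dataclass, fields


@dataclass
class IncidentUpdate:
    affected: str      # 1. What is affected?
    unaffected: str    # 2. What is not affected?
    action: str        # 3. What are you doing about it?
    next_update: str   # 4. When is the next update?

    def render(self) -> str:
        """Join the four answers into one status message, refusing to omit any."""
        missing = [f.name for f in fields(self) if not getattr(self, f.name).strip()]
        if missing:
            raise ValueError(f"update is missing: {', '.join(missing)}")
        return f"{self.affected} {self.unaffected} {self.action} Next update at {self.next_update}."


update = IncidentUpdate(
    affected="Users attempting to create new records are experiencing errors.",
    unaffected="Read operations are unaffected.",
    action="We are rolling back the configuration change now.",
    next_update="15:20 UTC",
)
print(update.render())
```

Leaving any field blank raises a `ValueError`, which is the point: the poor update above would fail this check because it names neither what is affected nor when the next update will come.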


Tone: direct and factual, not apologetic and hedged

Incident updates are not the place for corporate hedging language. Phrases like "some users may be experiencing" and "intermittent issues with certain functionality" read as evasion. Users know something is wrong — they're reading your status page because they're affected. Speak clearly.

Things to avoid:

The right tone is direct, factual, and calm. You're a professional handling a technical problem. Acknowledge the impact, explain what's happening in plain language, and communicate what you're doing about it.


The resolution update: close the loop properly

Many teams do the hard work of communicating during an incident and then drop the ball on the resolution update. They mark the incident as resolved without explanation, or they post "Issue resolved" with no context.

The resolution update is the last impression users have of how you handled the incident. Make it count:

Good resolution update

"Resolved. The API is fully operational as of 15:18 UTC. Total incident duration: 48 minutes. Root cause: a database configuration change introduced at 14:30 UTC caused connection pool exhaustion under normal load. We rolled back the configuration at 15:15 UTC and have confirmed full recovery. We are reviewing our deployment process to prevent similar configuration changes from reaching production without additional verification. We'll post a full postmortem within 24 hours."

This tells users: what happened, how long it lasted, what caused it, what fixed it, and what you're doing to prevent recurrence. That final element — "what you're doing to prevent recurrence" — is important. It shows that you learned from the incident and are taking it seriously.
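
The duration figure is worth computing rather than estimating, since users will compare it against their own experience. Here is a minimal sketch that takes the 14:30 UTC configuration change as the incident start and 15:18 UTC as the recovery time, as in the example. The function name is hypothetical and the calendar date is a placeholder.

```python
from datetime import datetime, timezone


def incident_duration_minutes(started: datetime, resolved: datetime) -> int:
    """Whole minutes between incident start and confirmed recovery."""
    if resolved < started:
        raise ValueError("resolved time is before start time")
    return int((resolved - started).total_seconds() // 60)


# The calendar date is a placeholder; only the times come from the example.
started = datetime(2024, 1, 1, 14, 30, tzinfo=timezone.utc)
resolved = datetime(2024, 1, 1, 15, 18, tzinfo=timezone.utc)
print(f"Total incident duration: {incident_duration_minutes(started, resolved)} minutes.")
# prints "Total incident duration: 48 minutes."
```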


Using PingBase's incident timeline

PingBase's status page includes an incident timeline — a chronological log of all updates posted during an incident. When you post updates from your PingBase dashboard, they appear on the public status page in real time with timestamps.

How to use it effectively during an incident:


Preparing before incidents happen

The best time to build your incident communication process is before you have an incident. A few things to set up in advance:


The long-term compounding effect

Every incident is an opportunity. Not a good one — you'd rather not have incidents — but an opportunity nonetheless. A team that communicates well during an outage comes out of it with users who trust them more, not less. "They kept us informed the whole time and explained exactly what happened" is a statement that people make about services they stay with.

Teams that go silent during incidents, or post vague non-updates, lose trust that's hard to rebuild. The technical quality of your service matters. So does the way you treat users when things go wrong.

Set up your incident timeline

PingBase status pages include incident management with a public timeline. Free to get started, no credit card required.

