Incident Management
How incidents are created, managed, and resolved.
An incident represents a period of downtime or a problem detected by one or more monitors. Incidents are created automatically when monitors detect issues and resolve automatically when all affected monitors recover. You cannot create incidents manually.
Incidents differ from individual check failures. A single failed check does not create an incident. Instead, the system waits for confirmation that a problem is real before opening an incident and notifying you.
How Incidents Are Created
Incident creation depends on the type of check.
HTTP Checks
An incident is created after 2 consecutive failed checks. This avoids false alarms from transient network issues or brief server hiccups. If the first check fails but the second succeeds, no incident is created.
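The confirmation rule above can be sketched as a small counter. This is an illustrative sketch only, not the product's implementation; `HttpMonitorState` and `record_check` are hypothetical names, and only the threshold of 2 comes from this page.

```python
from dataclasses import dataclass

# Number of consecutive failed checks required before an incident opens (from the text above).
FAILURE_THRESHOLD = 2


@dataclass
class HttpMonitorState:
    """Hypothetical per-monitor state: how many checks have failed in a row."""
    consecutive_failures: int = 0


def record_check(state: HttpMonitorState, succeeded: bool) -> bool:
    """Record one check result; return True only when this check opens an incident."""
    if succeeded:
        # A success breaks the streak, so a single earlier failure is forgotten.
        state.consecutive_failures = 0
        return False
    state.consecutive_failures += 1
    # True exactly once, at the moment the threshold is reached.
    return state.consecutive_failures == FAILURE_THRESHOLD
```

With this sketch, a fail / success / fail / fail sequence opens an incident only on the fourth check, because the success in between resets the streak.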
SSL Certificate Checks
An incident is created when the certificate first reaches one of the following severity levels:
- Expiring in 21 days - Early warning
- Expiring in 7 days - Urgent warning
- Expiring in 1 day - Critical warning
- Expired - Certificate has expired
- Invalid - Certificate validation failed (e.g., wrong hostname, broken chain)
Each escalation to a more severe level updates the incident’s severity. For example, an incident that starts at “expiring in 21 days” will escalate to “expiring in 7 days” as the certificate gets closer to expiration.
Domain Expiration Checks
Domain checks follow the same pattern as SSL checks, but with their own severity levels (the early warning starts at 30 days instead of 21):
- Expiring in 30 days - Early warning
- Expiring in 7 days - Urgent warning
- Expiring in 1 day - Critical warning
- Expired - Domain registration has expired
- Invalid - Domain lookup failed
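To make the SSL and domain thresholds concrete, here is a minimal sketch that maps the days remaining to a severity level. The function name `classify_expiry`, the level labels, the inclusive boundaries (for example, exactly 7 days counts as urgent), and treating 0 or fewer days as expired are all assumptions for illustration; only the day counts come from the lists above.

```python
from typing import Optional

# Warning thresholds in days (early, urgent, critical), taken from the lists above.
THRESHOLDS = {
    "ssl": (21, 7, 1),
    "domain": (30, 7, 1),
}


def classify_expiry(check_type: str, days_left: int, valid: bool = True) -> Optional[str]:
    """Return a severity label, or None when no warning level has been reached."""
    if not valid:
        return "invalid"      # validation or lookup failed
    if days_left <= 0:
        return "expired"      # assumption: zero or negative days means expired
    early, urgent, critical = THRESHOLDS[check_type]
    if days_left <= critical:
        return "critical"
    if days_left <= urgent:
        return "urgent"
    if days_left <= early:
        return "early"
    return None
```

For example, under these assumptions a domain with 25 days left is already at the early warning level, while a certificate with 25 days left has not yet reached one.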
Multi-Monitor Incidents
When multiple monitors in the same team fail around the same time, they are grouped into a single incident rather than creating separate incidents for each one. This reduces alert noise when an outage affects several of your services at once.
How Grouping Works
When the first monitor reaches its alert threshold (for HTTP checks, 2 consecutive failed checks), a new incident is created with a 5-minute grouping window. Any other monitors in the same team that fail within that window are added to the same incident. Only one notification is sent for the group, not one per monitor.
Grouping applies within the same check type: HTTP failures group with HTTP failures, SSL issues group with SSL issues, and domain issues group with domain issues. An HTTP failure and an SSL issue will create separate incidents.
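The grouping rules can be sketched as follows. This is a simplified illustration, not the actual implementation: the `Incident` fields, the `assign_to_incident` function, and the choice to treat a failure at exactly 5 minutes as inside the window are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List

GROUPING_WINDOW = timedelta(minutes=5)  # from the text above


@dataclass
class Incident:
    """Hypothetical incident record used only for this sketch."""
    team_id: str
    check_type: str          # "http", "ssl", or "domain"
    started_at: datetime
    monitor_ids: List[str] = field(default_factory=list)


def assign_to_incident(open_incidents: List[Incident], team_id: str, check_type: str,
                       monitor_id: str, failed_at: datetime) -> Incident:
    """Add the failing monitor to a matching incident, or open a new one."""
    for incident in open_incidents:
        same_group = incident.team_id == team_id and incident.check_type == check_type
        elapsed = failed_at - incident.started_at
        in_window = timedelta(0) <= elapsed <= GROUPING_WINDOW  # boundary is an assumption
        if same_group and in_window:
            if monitor_id not in incident.monitor_ids:
                incident.monitor_ids.append(monitor_id)
            return incident
    # No match: same team and check type are required, and the window must still be open.
    incident = Incident(team_id, check_type, failed_at, [monitor_id])
    open_incidents.append(incident)
    return incident
```

In this sketch, two HTTP monitors in the same team failing 3 minutes apart share one incident, while an SSL issue at the same moment, or an HTTP failure 6 minutes after the first, opens a separate incident.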
Notifications
- Down notification: Sent once when the incident is first created (i.e., when the first monitor hits the alert threshold).
- Recovery notification: Sent only when all monitors in the incident have recovered.
Incident Lifecycle
Every incident has one of two statuses:
- Ongoing - At least one monitor in the incident is still failing
- Resolved - All monitors have recovered
Severity Escalation
For SSL and domain incidents, the severity can escalate over time. If an incident started because a certificate was expiring in 21 days, and it still hasn’t been renewed by the 7-day mark, the incident’s severity is updated to reflect the more urgent status. This helps you see the current risk level at a glance.
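A minimal sketch of escalation, using the SSL levels as an example. The names `SEVERITY_ORDER`, `SeverityIncident`, and `escalate` are hypothetical. The ranking of "invalid" above "expired" and the choice to ignore less severe observations are assumptions; this page only states that severity is updated when the status becomes more urgent.

```python
from dataclasses import dataclass

# Ranked from least to most severe. The position of "invalid" is an assumption.
SEVERITY_ORDER = [
    "expiring_21_days",
    "expiring_7_days",
    "expiring_1_day",
    "expired",
    "invalid",
]


@dataclass
class SeverityIncident:
    """Hypothetical incident record that tracks only the current severity."""
    severity: str


def escalate(incident: SeverityIncident, observed: str) -> bool:
    """Raise the incident's severity if the observed level is more severe.

    Returns True when the severity was updated, False otherwise.
    """
    if SEVERITY_ORDER.index(observed) > SEVERITY_ORDER.index(incident.severity):
        incident.severity = observed
        return True
    return False
```

An incident opened at "expiring_21_days" would move to "expiring_7_days" once that level is observed, and a later observation of the lower level would leave it unchanged in this sketch.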
Resolution
An incident resolves automatically when every monitor in the incident reports a successful check. At that point:
- The incident status changes to Resolved
- The resolution timestamp and response time are recorded
- A recovery notification is sent
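The resolution rule can be sketched like this: the incident stays Ongoing until the last failing monitor recovers. The `Incident` shape and `record_recovery` function are hypothetical and only illustrate the behavior described above.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Optional


@dataclass
class Incident:
    """Hypothetical incident record used only for this sketch."""
    started_at: datetime
    # Monitor id -> True while that monitor is still failing.
    failing: Dict[str, bool] = field(default_factory=dict)
    resolved_at: Optional[datetime] = None

    @property
    def status(self) -> str:
        return "Resolved" if self.resolved_at is not None else "Ongoing"


def record_recovery(incident: Incident, monitor_id: str, at: datetime) -> bool:
    """Mark one monitor as recovered.

    Returns True only when this recovery resolves the incident, which is the
    point at which a single recovery notification would be sent.
    """
    incident.failing[monitor_id] = False
    if incident.resolved_at is None and not any(incident.failing.values()):
        incident.resolved_at = at  # resolution timestamp
        return True
    return False
```

The incident's duration then follows as `incident.resolved_at - incident.started_at`.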
Viewing Incidents
From a Monitor
Open any monitor and select the Incidents tab. This shows all incidents that affected that monitor, sorted by most recent first. The table displays:
- Started - When the incident began
- Status - Ongoing or Resolved
- Type - HTTP, SSL, or Domain
- Monitors - Number of monitors affected
- Duration - How long the incident lasted (or “Ongoing”)
- Error - The initial error message
Click any row to open the incident detail page.
Incident Detail Page
The incident detail page shows:
- Header - Monitor name (or “X Monitors Affected” for multi-monitor incidents), check type, status badge, start time, resolution time, and duration
- Affected Monitors - A table of all monitors in the incident, their current status (Down or Recovered), and when they recovered. Each monitor name links to its monitor page.
- Acknowledgements - Who has acknowledged the incident and when (see below)
- Notes - Free-text field for documenting the incident (see below)
- Timeline - List of all events during the incident, most recent first (see below)
Incident Timeline
The timeline shows every event that occurred during the incident, sorted with the most recent event first:
- Check Failed - A check returned an error, with the error message or HTTP status code
- Check Succeeded - A check passed, with the response time
- Status Changed - A monitor’s status transitioned (e.g., “up” to “down”)
- Notification Sent - A notification was delivered, showing the channel (email, Slack, Google Chat) and recipient
For multi-monitor incidents, each event shows which monitor it belongs to.
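As an illustration of the timeline ordering, the sketch below sorts events from several monitors newest first. `TimelineEvent`, its fields, and the event kind strings are hypothetical names, not an exported data format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class TimelineEvent:
    """Hypothetical timeline entry used only for this sketch."""
    at: datetime
    kind: str        # e.g. "check_failed", "check_succeeded", "status_changed", "notification_sent"
    monitor: str     # which monitor the event belongs to
    detail: str = ""


def build_timeline(events: List[TimelineEvent]) -> List[TimelineEvent]:
    """Return the events sorted with the most recent first."""
    return sorted(events, key=lambda event: event.at, reverse=True)
```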
Acknowledging Incidents
Acknowledging an incident lets your team know that someone is looking into the problem. It does not change the incident’s status or stop notifications.
To acknowledge an incident:
- Open the incident detail page
- Click the Acknowledge button in the Acknowledgements section
- Optionally add a comment explaining what you’re doing
- Choose whether to Notify Team (enabled by default). When enabled, a notification is sent to the team owner and members
- Click Submit
Each person can only acknowledge an incident once.
All acknowledgements are listed in a table showing who acknowledged, when, and any comments.
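The one-acknowledgement-per-person rule can be sketched by keying acknowledgements on the user. The names below (`Acknowledgement`, `AckIncident`, `acknowledge`) are hypothetical, and rejecting a second attempt by returning False is an assumption about how the rule might be enforced.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict


@dataclass
class Acknowledgement:
    user: str
    at: datetime
    comment: str = ""
    notify_team: bool = True  # Notify Team is enabled by default


@dataclass
class AckIncident:
    """Hypothetical incident record that stores acknowledgements by user."""
    acknowledgements: Dict[str, Acknowledgement] = field(default_factory=dict)


def acknowledge(incident: AckIncident, user: str, at: datetime,
                comment: str = "", notify_team: bool = True) -> bool:
    """Record an acknowledgement; return False if this user already acknowledged."""
    if user in incident.acknowledgements:
        return False
    incident.acknowledgements[user] = Acknowledgement(user, at, comment, notify_team)
    return True
```

Note that nothing in this sketch touches the incident's status, which matches the behavior described above: acknowledging does not resolve an incident.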
Incident Notes
Use the Notes field to document root cause analysis, actions taken, or lessons learned.
To add or update notes:
- Open the incident detail page
- Type in the Notes text area
- Click Save
Notifications
Notifications are sent at two points in an incident’s lifecycle:
- When the incident is created - A down alert is sent to the team owner and members with alert subscriptions
- When the incident is resolved - A recovery alert is sent
Notifications are delivered through whichever channels are configured for the team.
Acknowledgement notifications are separate from down and recovery alerts. They are sent to the team when someone acknowledges an incident with Notify Team enabled.