Incident Management

An incident represents a period of downtime or a problem detected by one or more monitors. Incidents are created automatically when monitors detect issues and resolve automatically when all affected monitors recover. You cannot create incidents manually.

Incidents differ from individual check failures. A single failed check does not create an incident. Instead, the system waits for confirmation that a problem is real before opening an incident and notifying you.

How Incidents Are Created

Incident creation depends on the type of check.

HTTP Checks

An incident is created after 2 consecutive failed checks. This avoids false alarms from transient network issues or brief server hiccups. If the first check fails but the second succeeds, no incident is created.
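The two-failure rule can be pictured as a streak counter that resets on any success. The following is only a minimal sketch of that idea; the function name and the list-of-booleans input are illustrative assumptions, not the service's actual implementation.

```python
from typing import Optional, Sequence

FAILURE_THRESHOLD = 2  # consecutive failed checks, as stated above


def first_incident_index(results: Sequence[bool],
                         threshold: int = FAILURE_THRESHOLD) -> Optional[int]:
    """Return the index of the check that would open an incident, or None.

    `results` is a hypothetical history of check outcomes (True = success).
    """
    streak = 0
    for i, ok in enumerate(results):
        # A success resets the streak, so an isolated failure never alerts.
        streak = 0 if ok else streak + 1
        if streak == threshold:
            return i
    return None
```

With this sketch, a history of fail, success, fail yields no incident, while success, fail, fail opens one on the third check.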

SSL Certificate Checks

An incident is created when the certificate status changes to a new severity level:

Expiring in 21 days - Early warning
Expiring in 7 days - Urgent warning
Expiring in 1 day - Critical warning
Expired - Certificate has expired
Invalid - Certificate validation failed (e.g., wrong hostname, broken chain)

Each escalation to a more severe level updates the incident’s severity. For example, an incident that starts at “expiring in 21 days” will escalate to “expiring in 7 days” as the certificate gets closer to expiration.

Domain Expiration Checks

Domain checks follow the same pattern as SSL checks with their own severity levels:

Expiring in 30 days - Early warning
Expiring in 7 days - Urgent warning
Expiring in 1 day - Critical warning
Expired - Domain registration has expired
Invalid - Domain lookup failed
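As an illustration, the SSL and domain levels listed above can be expressed as one table of warning thresholds per check type. This is a sketch under assumptions the documentation does not confirm: thresholds are treated as inclusive, zero or negative remaining time counts as expired, and the label strings and function name are invented for the example.

```python
from typing import Optional

# Warning thresholds in days, taken from the SSL and domain lists above.
WARNING_DAYS = {
    "ssl": (21, 7, 1),
    "domain": (30, 7, 1),
}


def classify(check_type: str, days_left: Optional[float],
             valid: bool = True) -> Optional[str]:
    """Return a severity label, or None when no warning level applies.

    Assumed behavior: a failed validation or lookup (valid=False or
    days_left=None) maps to "invalid"; thresholds are inclusive.
    """
    if not valid or days_left is None:
        return "invalid"
    if days_left <= 0:
        return "expired"
    # Pick the tightest threshold that the remaining time fits under.
    matching = [d for d in WARNING_DAYS[check_type] if days_left <= d]
    if matching:
        return f"expiring_{min(matching)}d"
    return None
```

Under these assumptions, 25 days remaining produces no SSL warning but does produce the 30-day domain warning.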

Multi-Monitor Incidents

When multiple monitors in the same team fail around the same time, they are grouped into a single incident rather than creating separate incidents for each one. This reduces alert noise when an outage affects several of your services at once.

How Grouping Works

When the first monitor reaches the alert threshold, a new incident is created with a 5-minute grouping window. Any other monitors in the same team that fail within that 5-minute window are added to the same incident. Only one notification is sent for the group, not one per monitor.

Grouping applies within the same check type: HTTP failures group with HTTP failures, SSL issues group with SSL issues, and domain issues group with domain issues. An HTTP failure and an SSL issue will create separate incidents.
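The grouping rules above (same team, same check type, within 5 minutes of the incident's start) can be sketched roughly as follows. The `Incident` class, field names, and the inclusive window boundary are assumptions made for illustration, not the service's real data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Tuple

GROUPING_WINDOW = timedelta(minutes=5)  # window length stated above


@dataclass
class Incident:
    """Hypothetical incident record used only for this sketch."""
    team_id: str
    check_type: str  # "http", "ssl", or "domain"
    started_at: datetime
    monitor_ids: List[str] = field(default_factory=list)


def assign_to_incident(open_incidents: List[Incident], team_id: str,
                       check_type: str, monitor_id: str,
                       failed_at: datetime) -> Tuple[Incident, bool]:
    """Attach a failing monitor to a matching incident, or open a new one.

    Returns (incident, is_new). Only a new incident would trigger a
    down notification.
    """
    for inc in open_incidents:
        same_group = inc.team_id == team_id and inc.check_type == check_type
        # Assumption: the window is measured from the incident's start time.
        in_window = timedelta(0) <= failed_at - inc.started_at <= GROUPING_WINDOW
        if same_group and in_window:
            if monitor_id not in inc.monitor_ids:
                inc.monitor_ids.append(monitor_id)
            return inc, False
    inc = Incident(team_id, check_type, failed_at, [monitor_id])
    open_incidents.append(inc)
    return inc, True
```

In this sketch, a second HTTP monitor failing 3 minutes after the first joins the existing incident, while an SSL issue at the same moment, or an HTTP failure 6 minutes later, opens a separate one.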

Notifications for Grouped Incidents

Down notification: Sent once when the incident is first created (i.e., when the first monitor hits the alert threshold)
Recovery notification: Sent only when all monitors in the incident have recovered

Incident Lifecycle

Every incident has one of two statuses:

Ongoing - At least one monitor in the incident is still failing
Resolved - All monitors have recovered

Severity Escalation

For SSL and domain incidents, the severity can escalate over time. If an incident started because a certificate was expiring in 21 days, and it still hasn’t been renewed by the 7-day mark, the incident’s severity is updated to reflect the more urgent status. This helps you see the current risk level at a glance.
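One way to picture escalation is as a comparison against an ordered list of levels, keeping whichever is more severe. The sketch below covers the SSL expiry levels only and assumes severity never moves back down within an incident; the label strings are illustrative, and "invalid" is left out because the documentation does not say where it ranks.

```python
from typing import Sequence

# Least to most severe, following the order of the SSL list in this document.
SSL_SEVERITY_ORDER = ["expiring_21d", "expiring_7d", "expiring_1d", "expired"]


def escalate(current: str, observed: str,
             order: Sequence[str] = SSL_SEVERITY_ORDER) -> str:
    """Return the more severe of the incident's current and observed levels.

    Raises ValueError if either label is not in `order`.
    """
    return max(current, observed, key=order.index)
```

For example, an incident at the 21-day level that observes the 7-day level moves to the 7-day level, and stays there if an older 21-day observation arrives afterwards.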

Resolution

An incident resolves automatically when every monitor in the incident reports a successful check. At that point:

The incident status changes to Resolved
The resolution timestamp and response time are recorded
A recovery notification is sent
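The resolution rule can be sketched as a per-monitor recovery flag, with the incident resolving only once every flag is set. The class and function names below are hypothetical, and the notification itself is left to the caller.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Optional


@dataclass
class Incident:
    """Hypothetical incident record used only for this sketch."""
    # monitor_id -> True once that monitor has reported a successful check
    recovered: Dict[str, bool] = field(default_factory=dict)
    status: str = "ongoing"
    resolved_at: Optional[datetime] = None


def record_recovery(incident: Incident, monitor_id: str, at: datetime) -> bool:
    """Mark one monitor recovered; return True if this resolves the incident."""
    if incident.status == "resolved" or monitor_id not in incident.recovered:
        return False
    incident.recovered[monitor_id] = True
    if all(incident.recovered.values()):
        incident.status = "resolved"
        incident.resolved_at = at
        return True  # the caller would send the single recovery notification
    return False
```

Returning True exactly once per incident is what keeps the recovery notification from being sent more than once in this sketch.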

Viewing Incidents

From a Monitor

Open any monitor and select the Incidents tab. This shows all incidents that affected that monitor, sorted by most recent first. The table displays:

Started - When the incident began
Status - Ongoing or Resolved
Type - HTTP, SSL, or Domain
Monitors - Number of monitors affected
Duration - How long the incident lasted (or “Ongoing”)
Error - The initial error message

Click any row to open the incident detail page.

Incident Detail Page

The incident detail page shows:

Header - Monitor name (or “X Monitors Affected” for multi-monitor incidents), check type, status badge, start time, resolution time, and duration
Affected Monitors - A table of all monitors in the incident, their current status (Down or Recovered), and when they recovered. Each monitor name links to its monitor page.
Acknowledgements - Who has acknowledged the incident and when (see below)
Notes - Free-text field for documenting the incident (see below)
Timeline - List of all events during the incident, most recent first (see below)

Incident Timeline

The timeline shows every event that occurred during the incident, sorted with the most recent event first:

Check Failed - A check returned an error, with the error message or HTTP status code
Check Succeeded - A check passed, with the response time
Status Changed - A monitor’s status transitioned (e.g., “up” to “down”)
Notification Sent - A notification was delivered, showing the channel (email, Slack, Google Chat) and recipient

For multi-monitor incidents, each event shows which monitor it belongs to.
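As a rough illustration of the timeline ordering, the sketch below models an event as a small record and sorts newest first, tagging each line with its monitor. The field names, event kind strings, and output format are assumptions for the example only.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass(frozen=True)
class TimelineEvent:
    """Hypothetical timeline entry; not the service's real schema."""
    at: datetime
    kind: str      # e.g. "check_failed", "check_succeeded", "status_changed"
    monitor: str   # which monitor the event belongs to
    detail: str    # error message, response time, channel, and so on


def render_timeline(events: Iterable[TimelineEvent]) -> List[str]:
    """Return display lines sorted with the most recent event first."""
    ordered = sorted(events, key=lambda e: e.at, reverse=True)
    return [f"{e.at:%H:%M:%S} [{e.monitor}] {e.kind}: {e.detail}" for e in ordered]
```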

Acknowledging Incidents

Acknowledging an incident lets your team know that someone is looking into the problem. It does not change the incident’s status or stop notifications.

To acknowledge an incident:

Open the incident detail page
Click the Acknowledge button in the Acknowledgements section
Optionally add a comment explaining what you’re doing
Choose whether to Notify Team (enabled by default). When enabled, a notification is sent to the team owner and members
Click Submit

Each person can acknowledge an incident only once.

All acknowledgements are listed in a table showing who acknowledged, when, and any comments.
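The once-per-person rule can be sketched by keying acknowledgements on the user. The class names and the boolean return value below are illustrative assumptions; note that nothing in the sketch touches the incident's status, matching the behavior described above.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict


@dataclass
class Acknowledgement:
    user: str
    at: datetime
    comment: str = ""


@dataclass
class AckLog:
    """Hypothetical per-incident acknowledgement list."""
    by_user: Dict[str, Acknowledgement] = field(default_factory=dict)

    def acknowledge(self, user: str, at: datetime, comment: str = "") -> bool:
        """Record an acknowledgement; return False if the user already has one."""
        if user in self.by_user:
            return False
        self.by_user[user] = Acknowledgement(user, at, comment)
        return True
```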

Incident Notes

Use the Notes field to document root cause analysis, actions taken, or lessons learned.

To add or update notes:

Open the incident detail page
Type in the Notes text area
Click Save

Notifications

Notifications are sent at two points in an incident’s lifecycle:

When the incident is created - A down alert is sent to the team owner and members with alert subscriptions
When the incident is resolved - A recovery alert is sent

Notifications are delivered through whichever channels are configured for the team.

Acknowledgement notifications are sent separately from down and recovery alerts. They go to the team when someone acknowledges an incident with Notify Team enabled.