PagerDuty Services Example: Split High and Low Severity Alerts to Different Escalation Policies

#pagerduty #incidentresponse #devops #cloudops

This post is a follow-up example to an earlier post, "When Can a Service Not Be a Service? Using PagerDuty in Different Contexts."

In that post, we talked about how the service object in PagerDuty is a place where PagerDuty receives information and uses that information to alert users. Your technical services are the first stop most teams make when implementing PagerDuty, and many teams use services to represent other activities in their ecosystems.

A recent question in the PagerDuty Community Forums brought up another good point about how and when to use services.
Arthas asks:

One service uses two policies, and certain conditions determine which policy to use when creating an event. Is it possible? In some cases, such as P1, P2, P3 events, I want to use ESCALATION POLICY A, but other events use ESCALATION POLICY B (depending on the nature of the event).

This is a good question! What should you do if you have two different actions you want to take on a service alert, based on the severity or context of the alert?

A key piece of information to remember here is that each service object in PagerDuty can only be assigned to one escalation policy. So we know right away that if we want to split the alerts between escalation policies we’ll need two services.

That’s OK! Remember from the earlier post that we can use service objects to represent not just technical services but also workflows, actions, or activities. For Arthas, we’d create two services for the same system (we’ll call it “January Service” as an example): one for the high-priority alerts and one for the low-priority alerts.

Now we have two services we can assign to different escalation policies, so different members of the team will be notified.
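As a rough illustration of that constraint, the Python sketch below models each service as holding exactly one escalation policy, so splitting alerts between two policies means two service objects. The `Service` class and the service names are hypothetical and are not PagerDuty's API schema; the policy labels are taken from Arthas's question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """Illustrative model only, not the PagerDuty API schema.

    Each service holds exactly one escalation policy, mirroring the PagerDuty
    constraint, so the field is a single string rather than a list.
    """
    name: str
    escalation_policy: str


# Hypothetical service names; the policy labels follow Arthas's question.
SERVICES = [
    Service("January Service - High Priority", "ESCALATION POLICY A"),
    Service("January Service - Low Priority", "ESCALATION POLICY B"),
]

for service in SERVICES:
    print(f"{service.name} -> {service.escalation_policy}")
```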

So how do we decide which alerts should go to the High Priority service and which should go to the Low Priority service? Event Orchestration!

Event Orchestration will let us route events to the two service endpoints based on data found in the body, or payload, of the alerts. This is an example alert message from a fictional system:

```json
{
  "payload": {
    "summary": "nginx is not running on machine prod-datapipe03.example.com",
    "severity": "critical",
    "source": "prod-datapipe03.example.com",
    "component": "nginx",
    "group": "prod-datapipe",
    "class": "service"
  },
  "routing_key": "$PD_ROUTE_KEY",
  "event_action": "trigger",
  "client": "Sample Monitoring Service",
  "client_url": "https://monitoring.service.com"
}
```

Notice the severity key in the payload. We can use it to decide which service receives the event. There are four standard values for “severity”: critical, error, warning, and info, and Event Orchestration conditions can also match other values if the system generating the messages needs it.
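If the system generating the events uses its own labels (for example "warn" instead of "warning"), a small normalization step before sending can keep the `severity` field within the four standard values. This is a minimal sketch; the alias table and the `normalize_severity` name are hypothetical and should be adapted to whatever your monitoring tool emits.

```python
# The four standard severity values for an Events API v2 payload.
ALLOWED_SEVERITIES = ("critical", "error", "warning", "info")

# Hypothetical aliases a monitoring tool might emit; adjust to your own labels.
ALIASES = {"warn": "warning", "crit": "critical", "err": "error"}


def normalize_severity(raw: str) -> str:
    """Map a tool-specific severity label onto one of the standard values."""
    value = raw.strip().lower()
    value = ALIASES.get(value, value)
    if value not in ALLOWED_SEVERITIES:
        raise ValueError(f"unsupported severity: {raw!r}")
    return value


print(normalize_severity("WARN"))  # warning
```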

To keep my rules simple, I’ve created one rule that routes messages with a severity of critical to the High Priority service. Event Orchestration then lets me choose where to send all the remaining messages that don’t match that rule, and I’ve configured the ruleset to send all of those alerts to the Low Priority service.
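The routing decision itself happens inside PagerDuty, but its logic can be sketched locally to reason about which events land where. This Python simulation assumes the rules are checked in order, the first match wins, and unmatched events fall through to a catch-all; the `RULES` list, the `route` function, and the service names are hypothetical and are not Event Orchestration's configuration format.

```python
from typing import Callable, List, Tuple

# Hypothetical names standing in for the two PagerDuty service objects.
HIGH_PRIORITY = "January Service - High Priority"
LOW_PRIORITY = "January Service - Low Priority"

# Ordered (condition, destination) pairs; the first matching rule wins.
Rule = Tuple[Callable[[dict], bool], str]
RULES: List[Rule] = [
    (lambda event: event.get("payload", {}).get("severity") == "critical", HIGH_PRIORITY),
]

# Destination for every event that matches no rule.
CATCH_ALL = LOW_PRIORITY


def route(event: dict, rules: List[Rule] = RULES, default: str = CATCH_ALL) -> str:
    """Return the destination service for an event under the rules above."""
    for condition, destination in rules:
        if condition(event):
            return destination
    return default


print(route({"payload": {"severity": "critical"}}))  # January Service - High Priority
print(route({"payload": {"severity": "warning"}}))   # January Service - Low Priority
```

In this model, adding another destination amounts to appending another (condition, service) pair to `RULES`.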

To make use of this orchestration, I’ll send events with the “Global Orchestration Key” included in its configuration instead of an integration endpoint on either service object. That way, every event created on the January Service systems goes to one endpoint, and PagerDuty decides where it goes based on its payload. If my application needs it, I can add more rules and even more service destinations.
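On the sending side, a monitoring script only needs that one routing key. The sketch below assumes the Global Orchestration Key is used as the `routing_key` for the Events API v2 endpoint (`https://events.pagerduty.com/v2/enqueue`) and that it is stored in an environment variable named `PD_ROUTE_KEY`, matching the placeholder in the sample alert. `build_event` and `send_event` are illustrative helpers, not a PagerDuty SDK; without the variable set, the script only prints the event.

```python
import json
import os
import urllib.request

# Events API v2 endpoint. Assumption: the routing key sent here is the
# Global Orchestration Key, not a per-service integration key.
EVENTS_API_URL = "https://events.pagerduty.com/v2/enqueue"


def build_event(routing_key: str, summary: str, severity: str, source: str) -> dict:
    """Build a minimal Events API v2 trigger event."""
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {"summary": summary, "severity": severity, "source": source},
    }


def send_event(event: dict) -> dict:
    """POST the event as JSON and return the decoded response body."""
    request = urllib.request.Request(
        EVENTS_API_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical environment variable holding the orchestration's key.
    key = os.environ.get("PD_ROUTE_KEY")
    event = build_event(
        key or "PLACEHOLDER_KEY",
        "nginx is not running on machine prod-datapipe03.example.com",
        "critical",
        "prod-datapipe03.example.com",
    )
    if key:
        print(send_event(event))
    else:
        # No key configured: show the event instead of sending it.
        print(json.dumps(event, indent=2))
```

Because every system sends to the same key, changing where events go is a matter of editing the orchestration rules, not redeploying the senders.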

Learn More

To learn more about Event Orchestration, check out the course on PagerDuty University. If you have questions about PagerDuty, join our community forums!

The Ops Community ⚙️