The Ops Community ⚙️

Mandi Walls for PagerDuty Community

Did My Change Contribute to that Incident?

Capturing change events in PagerDuty adds context to incidents.

Change is how we improve products and services for our users. We should like change! But change also brings with it uncertainty about the reliability of the service after the change has been introduced. If a service experiences an incident, how can we find which changes were recently made and determine if the change contributed to the incident? This was a question asked by attendees at DevOpsDays Charlotte this year during Open Spaces.

It’s not uncommon for teams to use a number of different monitoring tools across an ecosystem - some tools have better support for certain platforms than others - so we want a centralized location to capture all of that information in a useful way.

For this, PagerDuty provides change events.

What are Change Events?

Change events are informational events only - they won’t become an alert or incident on your PagerDuty service. They provide context for any changes that might be coming into the services your team runs and manages with PagerDuty.

Where to Find Change Events

Change events will show up in your PagerDuty account on the main page for a particular service, or you can find all of them under the Incidents->Recent Changes header in the main menu:

PagerDuty web UI. Part of the Recent Changes page is displayed. Recent change events are listed in a table, including a summary, the impacted service, the type of event or integration, the date created, and an optional link to the originating service.

Creating Change Events

Change events have a structure similar to the normal events that integrations send to PagerDuty, but they use a different API endpoint. You can find a number of pre-built change event integrations on our integrations page by selecting the CI/CD and Change category on the left.

You can also send change events yourself! The following code snippet shows an example in bash. The payload for a change event is encoded in JSON. The routing_key will tell PagerDuty where to send the change event, just like any other event. The API endpoint, included in the curl command, indicates that this event should be routed as a change event, and therefore not create an alert. So you can reuse code you might have for regular events to create change events for your services.

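# Integration (routing) key for the target service, read from the environment
KEY=$PD_ROUTE_KEY
# Time of the change as an ISO 8601 UTC timestamp. The original snippet left
# DATE undefined, so it is set here to let the example run on its own.
DATE=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
# BUILD_NUMBER is expected to be provided by the build system.

data=$(cat <<EOF
  {
    "routing_key": "$KEY",
    "payload": {
      "summary": "Build Success: Build has Passed.",
      "timestamp": "$DATE",
      "source": "amazing-build-pipeline-thing",
      "custom_details": {
        "build_state": "passed",
        "build_number": "$BUILD_NUMBER",
        "build_developer": "Jill Developer"
      }
    }
  }
EOF
)
# Quote the variable so the JSON is printed with its line breaks intact
echo "$data"

# The /v2/change/enqueue endpoint routes this as a change event, not an alert
curl -X POST --header 'Content-Type: application/json' \
--url https://events.pagerduty.com/v2/change/enqueue \
--data "$data"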
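
The Recent Changes table has room for an optional link back to the originating system. Here is a minimal sketch of a payload that carries one, assuming the top-level links field of the change event format, plus a check of the HTTP status that comes back. The function names build_change_event and send_change_event, the link text, and the BUILD_URL variable are hypothetical and only there for illustration.

```shell
#!/usr/bin/env bash
# Sketch: build a change event payload that links back to the build.
# build_change_event and its arguments are hypothetical names.
build_change_event() {
  local key="$1" summary="$2" src="$3" url="$4"
  local ts
  ts=$(date -u +"%Y-%m-%dT%H:%M:%SZ")   # ISO 8601 timestamp in UTC
  cat <<EOF
{
  "routing_key": "$key",
  "payload": {
    "summary": "$summary",
    "timestamp": "$ts",
    "source": "$src"
  },
  "links": [
    { "href": "$url", "text": "View build" }
  ]
}
EOF
}

# Send a prepared JSON body and print only the HTTP status code.
# The Events API is expected to answer 202 when it accepts the event.
send_change_event() {
  curl --silent --output /dev/null --write-out '%{http_code}' \
    -X POST --header 'Content-Type: application/json' \
    --url https://events.pagerduty.com/v2/change/enqueue \
    --data "$1"
}

# Example call (not run here; BUILD_URL is an assumed variable from your CI):
#   body=$(build_change_event "$PD_ROUTE_KEY" "Build Success" "amazing-build-pipeline-thing" "$BUILD_URL")
#   status=$(send_change_event "$body")
#   [ "$status" = "202" ] || echo "change event was not accepted: $status" >&2
```

The heredoc does no JSON escaping, so this only works for values without double quotes or backslashes. A tool like jq is a safer way to build the body if your summaries come from free-form text.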

Setting up a service to receive change events is similar to setting one up for alerting events! In the Integrations tab of the service, add an integration and choose Events API V2 as the integration type. The integration key shown in the configuration is the value you send as the routing_key, and the same key works for both alerting events and change events. Only the API endpoint differs:

The PagerDuty web UI. Part of a service integration page is shown, displaying an overview of the Events API v2 integration type, as well as the configuration pieces included in this integration: the name, the key, and target URLs for alerting events and change events.
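
To make the "one key, two endpoints" idea concrete, here is a small sketch that picks the target URL based on the kind of event. The helper names endpoint_for and post_event are hypothetical. The two URLs are the Events API v2 targets for alerting events and change events.

```shell
#!/usr/bin/env bash
# Sketch: one integration key, two Events API v2 endpoints.
ALERT_URL="https://events.pagerduty.com/v2/enqueue"
CHANGE_URL="https://events.pagerduty.com/v2/change/enqueue"

# endpoint_for KIND: map an event kind to its URL (hypothetical helper).
endpoint_for() {
  case "$1" in
    alert)  echo "$ALERT_URL" ;;
    change) echo "$CHANGE_URL" ;;
    *)      echo "unknown event kind: $1" >&2; return 1 ;;
  esac
}

# post_event KIND JSON: send a prepared JSON body to the matching endpoint.
post_event() {
  local url
  url=$(endpoint_for "$1") || return 1
  curl -X POST --header 'Content-Type: application/json' \
    --url "$url" --data "$2"
}

# Example call (not run here), reusing the $data body from the snippet above:
#   post_event change "$data"
```

The body itself still has to match the endpoint. An alerting event also needs an event_action, and a severity inside the payload, which a change event does not use.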

What are Useful Change Events?

Now that you know about this amazing power, what changes should you send to your services? It can be tempting to send all kinds of things, but tracking too many events that don’t actually impact your production environment can be confusing. Focus on events that create change on the systems:

  • Deployment of your application code. How or where you create this event might vary, depending on how you deploy new code.
  • Installation of non-application packages on your system. For long-lived systems in virtual machines or containers that aren’t rebuilt on every deploy, any new library could have an impact on your service. Even security updates might have a negative impact on the way your services run, so look at creating change events for how new packages are installed.
  • Other system changes created by your configuration management system.
  • Scaling events, like a scale-up or scale-down, that might impact the performance of your service.
  • Anything else you can think of! Let us know in the comments!
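
One way to cover the kinds of changes listed above is a single payload builder that your deploy script, package hook, configuration management run, and autoscaling hook can all call. This is a sketch under assumptions: change_payload and the change_type detail are hypothetical names, not PagerDuty fields, and the routing key is read from the PD_ROUTE_KEY variable used earlier.

```shell
#!/usr/bin/env bash
# Sketch: one payload builder shared by deploy, package, config and scaling hooks.
# change_payload and the change_type detail are hypothetical, not PagerDuty names.
change_payload() {
  local change_type="$1" summary="$2"
  local ts
  case "$change_type" in
    deploy|package|config|scale) ;;   # the categories from the list above
    *) echo "unsupported change type: $change_type" >&2; return 1 ;;
  esac
  ts=$(date -u +"%Y-%m-%dT%H:%M:%SZ")   # ISO 8601 timestamp in UTC
  cat <<EOF
{
  "routing_key": "${PD_ROUTE_KEY:-}",
  "payload": {
    "summary": "$summary",
    "timestamp": "$ts",
    "source": "${HOSTNAME:-unknown-host}",
    "custom_details": { "change_type": "$change_type" }
  }
}
EOF
}

# Example call (not run here), as the last step of a deploy script:
#   change_payload deploy "myapp 1.4.2 deployed to production" |
#     curl -X POST --header 'Content-Type: application/json' \
#       --url https://events.pagerduty.com/v2/change/enqueue --data @-
```

Rejecting unknown types is a deliberate choice here. It keeps the list of things that count as a change short, which helps with the noise problem described above.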

Change Events and Incidents

Where change events can really save your team time is when your service is experiencing an incident. Change events will appear in PagerDuty on the main page for your service, so you can see what’s been going on recently:

The PagerDuty web UI. Part of an incident page is displayed. The incident appears at the top while a second section shows "Recent Changes" that were recorded to this service. A note in the web UI states "This incident occurred 1 minute after this change".

In this example, a change event reported that the build had passed in some build system but the service is now experiencing an incident. The team can now investigate if the change caused the incident and either proceed with a rollback or fix-forward, or determine that the change didn’t contribute to the incident and troubleshoot other factors. They won’t be guessing what changes have been processed into the service; they’ll know at a glance and can fix or discard right away.
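
If you would rather pull the same information from a script during an incident, the following sketch lists a service's change events for a time window. It assumes the PagerDuty REST API's change events listing for a service with since and until parameters, and a REST API token (not the integration key) in a PD_API_TOKEN variable. The function names and the service ID in the example are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list change events recorded for a service in a time window.
# Assumes the REST API path /services/{id}/change_events; check the API
# reference for your account before relying on it.

# recent_changes_url SERVICE_ID SINCE UNTIL: build the request URL.
# SINCE and UNTIL are ISO 8601 UTC timestamps, e.g. 2024-01-01T00:00:00Z.
recent_changes_url() {
  local service_id="$1" since_ts="$2" until_ts="$3"
  echo "https://api.pagerduty.com/services/${service_id}/change_events?since=${since_ts}&until=${until_ts}"
}

# list_recent_changes SERVICE_ID SINCE UNTIL: fetch the change events as JSON.
list_recent_changes() {
  curl --silent --request GET \
    --header "Authorization: Token token=${PD_API_TOKEN}" \
    --header 'Accept: application/vnd.pagerduty+json;version=2' \
    --url "$(recent_changes_url "$@")"
}

# Example call (not run here; PABC123 is a made-up service ID):
#   list_recent_changes PABC123 2024-01-01T00:00:00Z 2024-01-01T01:00:00Z
```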

Find Out More

Want to see more change events in action? Check out these sessions from PagerDuty Summit:

Try it out for yourself and let us know what you think. Email us at community-team@pagerduty.com or join our community forums at https://community.pagerduty.com/!
