Capturing change events in PagerDuty adds context to incidents.
Change is how we improve products and services for our users. We should like change! But change also brings uncertainty about how reliable a service will be once the change is in place. If a service experiences an incident, how can we find which changes were made recently and determine whether any of them contributed to the incident? Attendees asked exactly this question during the Open Spaces sessions at DevOpsDays Charlotte this year.
It’s not uncommon for teams to use several different monitoring tools across an ecosystem, because some tools support certain platforms better than others. So we want a centralized location that captures all of that information in a useful way.
For this, PagerDuty provides change events.
Change events are informational only: they won’t become an alert or incident on your PagerDuty service. Instead, they provide context about changes coming into the services your team runs and manages with PagerDuty.
Change events show up in your PagerDuty account on the main page for each service, and you can find all of them under Incidents > Recent Changes in the main menu.
Change events have a structure similar to that of the alerting events integrations normally send to PagerDuty, but they use a different API endpoint. You can find a number of pre-built change event integrations on our integrations page by selecting the CI/CD and Change category on the left.
You can also send change events yourself! The following snippet shows an example in bash. The payload for a change event is encoded in JSON. The routing_key tells PagerDuty where to send the change event, just like any other event. The API endpoint in the curl command marks the event as a change event, so it will not create an alert. That means you can reuse code you might already have for regular events to create change events for your services.

# YOUR_ROUTING_KEY is a placeholder for your service's integration key.
curl -X POST --header 'Content-Type: application/json' \
  --url https://events.pagerduty.com/v2/change/enqueue \
  --data '{"routing_key": "YOUR_ROUTING_KEY",
    "payload": {"summary": "Build Success: Build has Passed.",
      "custom_details": {"build_developer": "Jill Developer"}}}'
Setting up a service to receive change events is similar to setting it up for alerting events, too! In the Integrations tab of the service, add an integration and choose Events API V2 as the integration type. The Integration Key shown in its configuration is the routing_key, and it can be used to send both alerting events and change events; only the endpoint differs. Alerting events go to https://events.pagerduty.com/v2/enqueue, and change events go to https://events.pagerduty.com/v2/change/enqueue.
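Because the same key works for both event types, a script that already posts alerting events only needs a different URL to send change events. The sketch below assumes bash and curl; the function names pd_endpoint and pd_post and the PD_ROUTING_KEY variable are hypothetical, not part of any PagerDuty tooling.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: one Integration Key (routing_key), two Events API v2 endpoints.

PD_ALERT_URL="https://events.pagerduty.com/v2/enqueue"
PD_CHANGE_URL="https://events.pagerduty.com/v2/change/enqueue"

# pd_endpoint KIND: print the endpoint for an "alert" or a "change" event.
pd_endpoint() {
  case "$1" in
    alert)  printf '%s\n' "$PD_ALERT_URL" ;;
    change) printf '%s\n' "$PD_CHANGE_URL" ;;
    *)      printf 'unknown event kind: %s\n' "$1" >&2; return 1 ;;
  esac
}

# pd_post KIND JSON: POST a JSON body to the endpoint that matches KIND.
pd_post() {
  local url
  url="$(pd_endpoint "$1")" || return 1
  curl --silent --show-error -X POST \
    --header 'Content-Type: application/json' \
    --url "$url" \
    --data "$2"
}

# Example (PD_ROUTING_KEY must hold your Integration Key; version is hypothetical):
#   pd_post change '{"routing_key": "'"$PD_ROUTING_KEY"'", "payload": {"summary": "Deployed v1.4.2"}}'
```

Only the URL changes between the two calls; the body does not. An alerting (trigger) event sent with pd_post alert additionally needs event_action plus payload source and severity fields, which a change event does not require.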
Now that you know about this amazing power, which changes should you send to your services? It can be tempting to send all kinds of things, but tracking too many events that don’t actually affect your production environment can be confusing. Focus on events that change your systems:
- Deployment of your application code. How or where you create this event might vary, depending on how you deploy new code.
- Installation of non-application packages on your system. For long-lived systems in virtual machines or containers that aren’t rebuilt on every deploy, any new library could have an impact on your service. Even security updates might have a negative impact on the way your services run, so consider creating change events whenever new packages are installed.
- Other system changes created by your configuration management system.
- Scaling events, like a scale-up or scale-down, that might impact the performance of your service.
- Anything else you can think of! Let us know in the comments!
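Deploy scripts, package-install wrappers, and scaling hooks can all share one small sender. The bash sketch below is hypothetical: the function names, the PD_ROUTING_KEY variable, and using the host name as the default source are assumptions for illustration. It fills in the optional source and timestamp payload fields alongside the required summary. Its JSON escaping is deliberately minimal and assumes values contain no newlines or other control characters.

```bash
#!/usr/bin/env bash
# Hypothetical helper for sending PagerDuty change events from deploy,
# package-install, or scaling hooks. Requires bash, sed, date, and curl.

# json_escape STRING: escape backslashes and double quotes for a JSON string.
# Assumption: values contain no newlines or other control characters.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# change_event_json KEY SUMMARY SOURCE: build a change event body.
change_event_json() {
  local ts
  ts="$(date -u +%Y-%m-%dT%H:%M:%SZ)"  # ISO 8601 timestamp in UTC
  printf '{"routing_key": "%s", "payload": {"summary": "%s", "source": "%s", "timestamp": "%s"}}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")" "$ts"
}

# send_change_event SUMMARY [SOURCE]: POST a change event using $PD_ROUTING_KEY.
# SOURCE defaults to this machine's host name (an illustrative choice).
send_change_event() {
  if [[ -z "${PD_ROUTING_KEY:-}" ]]; then
    echo "PD_ROUTING_KEY is not set" >&2
    return 1
  fi
  local body
  body="$(change_event_json "$PD_ROUTING_KEY" "$1" "${2:-$(hostname)}")"
  curl --silent --show-error -X POST \
    --header 'Content-Type: application/json' \
    --url https://events.pagerduty.com/v2/change/enqueue \
    --data "$body"
}

# Examples (hypothetical values):
#   send_change_event "Deployed checkout-service v2.3.1" "ci-runner"
#   send_change_event "apt upgrade: openssl"
#   send_change_event "Web autoscaling group scaled from 4 to 8 instances"
```

Calling the helper as the last step of each hook keeps the event timestamp close to the moment the system actually changed.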
Where change events can really save your team time is when your service is experiencing an incident. Because change events appear in PagerDuty on the main page for your service, you can see at a glance what has changed recently.
In this example, a change event reported that a build had passed in some build system, and the service is now experiencing an incident. The team can investigate whether the change caused the incident and either roll back or fix forward, or determine that the change didn’t contribute and troubleshoot other factors. They won’t be guessing which changes have reached the service; they’ll know at a glance and can rule each change in or out right away.
Want to see more change events in action? Check out these sessions from PagerDuty Summit:
- Speed Up Distribution by Monitoring Change Events with JFrog and PagerDuty
- PD Summit21: Buildkite: Increase Your Reliability with PagerDuty Change Events in CI