These are the resources listed in my talk "Tackling Alert Fatigue with SLOs, Automation, and Machine Learning".
Have any other tips? Let us know in the comments!
- Understanding Alert Fatigue and How to Prevent It https://www.pagerduty.com/resources/learn/alert-fatigue/
- 8 Ways to Reduce Alert Fatigue https://www.pagerduty.com/blog/reduce-alert-fatigue/
- What’s the Difference Between SLAs, SLOs, and SLIs? https://www.pagerduty.com/resources/learn/what-is-slo-sla-sli/
- What is AIOps? https://www.pagerduty.com/resources/learn/what-is-aiops/
- Automated Remediation https://autoremediation.pagerduty.com/
- Google SRE Workbook https://sre.google/workbook/implementing-slos/
- Nobl9 https://www.nobl9.com/
Top comments
One tip I’d add is building in regular “alert reviews” with the team: basically pruning rules that generate noise and fine-tuning thresholds based on real incidents.