We’ve formalised a lot over the past few months. However, Tech:Incidents on Meta is still only a set of guidelines: it doesn’t clearly define what is an incident, what isn’t, what must be reported, and what can be left to discretion.
|Status||Assigned||Task|
|Open||John||T8793 Create a formal Incident Response/Management Process|
|Resolved||John||T8843 Implement documentation for all monitoring checks|
|Resolved||John||T8844 Allow defining an alert as critical|
|Open||John||T8845 Allow Icinga to generate Phabricator tasks for Critical alerts|