ALERTS NEED OWNERS
I purposefully did not include a severity of email or chat in my examples. To explain why, let me tell you a story.
I was once on a team that had to create a new team mailing list every few months. There was a mailing list for email alerts, but alerts sent there didn’t always get the attention they needed: there were just too many of them and responsibility was diffuse, which is to say it wasn’t actually anyone’s job to take care of them. Some alerts were considered important, but not important enough to page the oncall engineer, so they were sent to the main team mailing list in the hope that someone would take a look. Fast-forward a bit and the exact same thing happened to the team mailing list, which now had regular automated alerts coming in. At some point it got bad enough that a new team mailing list was created, and the story repeated itself, at which point the team had three email alert lists.
Based on this experience and that of others, I strongly discourage email alerts and alerts that go to a team.4 Instead, I advocate sending alert notifications to a ticketing system of some form, where each one is assigned to a specific person whose job it is to handle it. I have also seen it work well to send the oncall a daily email that lists all currently firing alerts.
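To make the ownership idea concrete, here is a minimal sketch in Python of routing by severity so that every notification ends up with exactly one named owner. It is only an illustration under stated assumptions: the Alert and Notification types, the route and daily_digest functions, the "page" and "ticket" severity values, and the channel names are all hypothetical and do not come from any particular alerting or ticketing system.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    name: str       # hypothetical alert name
    severity: str   # assumed values: "page" or "ticket"


@dataclass
class Notification:
    channel: str    # hypothetical destinations: "pager" or "ticket_queue"
    owner: str      # the one person responsible for acting on the alert
    alert: Alert


def route(alert: Alert, oncall: str, ticket_owner: str) -> Notification:
    """Map an alert to a destination that always has a single owner.

    There is deliberately no "email" or "chat" branch: a severity without
    an owned destination is rejected instead of going to a shared inbox.
    """
    if alert.severity == "page":
        return Notification("pager", oncall, alert)
    if alert.severity == "ticket":
        return Notification("ticket_queue", ticket_owner, alert)
    raise ValueError(f"no owned destination for severity {alert.severity!r}")


def daily_digest(firing: list[Alert], oncall: str) -> str:
    """Build the text of a daily summary addressed to one person, the oncall."""
    lines = [f"To: {oncall}", "Currently firing alerts:"]
    lines += [f"- {a.name} ({a.severity})" for a in firing]
    return "\n".join(lines)
```

With this shape, an alert labeled with any other severity fails loudly at routing time, which is one way to stop unowned notification channels from creeping back in.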
After an outage it is everyone’s fault for not looking at the email alerts,5 but it was still not anyone’s responsibility. The key point is that there needs to be ownership, rather than merely using email as a form of logging.
The same applies to chat messages for alerts, with messaging systems such as IRC, Slack, and HipChat. Having your pages duplicated to your messaging system is handy, and since pages are rare it adds little noise. Having nonpages duplicated has the same issues as email alerts, and is worse as it tends to be more distracting. You can’t filter chat messages away to a folder you ignore, as you can with emails.