What’s the difference between group_interval, group_wait, and repeat_interval?

In this blog post we try to clear up some confusion by outlining the key differences between three commonly confused Alertmanager configuration options: group_wait, group_interval, and repeat_interval.
Before digging into these three options, let's recap some Prometheus alerting basics.
Prometheus itself has two global clocks: scrape_interval and evaluation_interval.

The scrape_interval is the time between each Prometheus scrape (i.e. when Prometheus pulls data from exporters and other targets), and the evaluation_interval is the time between each evaluation of Prometheus' alerting rules.
When a rule is evaluated, the state of each of its alerts can change to inactive, pending, or firing.
Following evaluation, firing alerts (and those that have just resolved) are sent to the connected Alertmanager, which then decides whether to start or stop sending alert notifications.
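
As a rough sketch of where these two clocks and the alert states come from, the fragment below shows the global block of a prometheus.yml next to a small rule file (two separate files, shown in one block). The interval values, the rule name InstanceDown, the file name alerts.yml, and the Alertmanager address are illustrative assumptions, not settings taken from this post.

```yaml
# prometheus.yml (illustrative values)
global:
  scrape_interval: 15s       # time between scrapes of each target
  evaluation_interval: 15s   # time between evaluations of alerting rules

rule_files:
  - alerts.yml               # hypothetical rule file, shown below

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']   # assumed Alertmanager address

# ---- alerts.yml (separate file, hypothetical rule) ----
# groups:
#   - name: example
#     rules:
#       - alert: InstanceDown
#         expr: up == 0
#         for: 5m            # pending until the expression has held for 5m, then firing
#         labels:
#           severity: page
```

With a `for` clause like this, an alert sits in the pending state until the expression has been true for the whole duration, and only then becomes firing.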

This is where group_by comes into play.
To avoid continuously sending notifications for similar alerts (like the same process failing on multiple instances, nodes, and data centres), the Alertmanager can be configured to group these related alerts into a single notification:
group_by: ['alertname', 'job']
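Once a group has been notified, new alerts joining it do not each trigger an immediate notification. Instead, we wait for the group_interval since the last notification was sent to the group, and then send all firing alerts (and any resolved alerts) to the receiver.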
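
To make the grouping concrete, here is a minimal sketch of where group_by sits in an alertmanager.yml and which alerts would end up together. The receiver name team-pager and the example label sets are hypothetical.

```yaml
# alertmanager.yml (fragment)
route:
  receiver: team-pager             # hypothetical receiver name
  group_by: ['alertname', 'job']

receivers:
  - name: team-pager               # notification integrations omitted

# With this grouping, alerts such as
#   {alertname="InstanceDown", job="node", instance="a:9100"}
#   {alertname="InstanceDown", job="node", instance="b:9100"}
# fall into the same group and share one notification, while
#   {alertname="InstanceDown", job="api", instance="c:8080"}
# starts a separate group because its job label differs.
```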

group_wait sets how long to initially wait to send a notification for a particular group of alerts.
This allows the Alertmanager to wait for an inhibiting alert to arrive or to collect more initial alerts for the same group. It essentially buffers alerts sent from Prometheus to the Alertmanager that share the same values for the group_by labels:
group_by: ['alertname', 'job']
group_wait: 45s # Usually set between ~0s and a few minutes.
While this reduces noisy alerts and saves the people receiving them some headache, it may delay the first notification for a group by up to the group_wait duration.
Another issue we must consider is what happens after that first notification: Prometheus keeps re-sending firing alerts to the Alertmanager as the rules are re-evaluated, and new alerts may join the group, so we need to control how soon the group notifies again.

This is where we use group_interval.
group_interval dictates how long to wait before sending notifications about new alerts that are added to a group of alerts that have been alerted on before:
group_by: ['alertname', 'job']
group_wait: 45s
group_interval: 10m # Usually ~5 mins or more.

So where does repeat_interval fit into all of this?
Simply put, repeat_interval sets how long to wait before a notification for alerts that are still firing, and that has already been successfully sent to the receiver, is sent again.
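
Putting the three timers together, a route might look like the sketch below. The receiver name team-pager and the repeat_interval value of 4h are assumptions for illustration, and the timeline in the comments is an approximate reading of how the timers interact for a single group rather than an exact trace.

```yaml
# alertmanager.yml (fragment)
route:
  receiver: team-pager             # hypothetical receiver name
  group_by: ['alertname', 'job']
  group_wait: 45s
  group_interval: 10m
  repeat_interval: 4h              # assumed value for this example

receivers:
  - name: team-pager               # notification integrations omitted

# Rough timeline for one group (hh:mm:ss):
#   00:00:00   alert A arrives; a new group is created
#   00:00:45   group_wait elapses: first notification (A)
#   00:03:00   alert B joins the same group
#   00:10:45   group_interval after the last notification: notification (A, B)
#   ~04:10:45  nothing has changed and repeat_interval has elapsed:
#              the notification (A, B) is sent again, possibly on the
#              following group_interval tick
```

Because the group is only re-checked once per group_interval, a repeat_interval that is not a multiple of group_interval will in practice be stretched to the next check.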

To summarise:

group_wait
How long to wait to buffer alerts of the same group before sending initially.

group_interval
How long to wait before sending a notification about new alerts added to a group that has already been notified.

repeat_interval
How long to wait before re-sending a given alert that has already been sent.

