
High Availability Prometheus Alerting and Notification

Prometheus is architected for reliable alerting. How do you set it up?
For a setup that can gracefully handle any single machine failing, we'll need to run two Prometheus servers and two Alertmanagers. First we'll run the Alertmanagers on different machines, and set up a mesh between them:
# On a machine named "am-1":
wget https://github.com/prometheus/alertmanager/releases/download/v0.15.3/alertmanager-0.15.3.linux-amd64.tar.gz
tar -xzf alertmanager-*.linux-amd64.tar.gz
cd alertmanager-*
./alertmanager --cluster.peer=am-2:9094


# On a machine named "am-2":
wget https://github.com/prometheus/alertmanager/releases/download/v0.15.3/alertmanager-0.15.3.linux-amd64.tar.gz
tar -xzf alertmanager-*.linux-amd64.tar.gz
cd alertmanager-*
./alertmanager --cluster.peer=am-1:9094
To verify that the Alertmanager mesh is working correctly, create a silence in one Alertmanager. If it shows up in the other, then all is well.
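The silence check can also be scripted. The following is a minimal sketch, assuming the `amtool` binary that ships in the Alertmanager tarball and the default API port 9093; the function name `check_mesh`, the `MeshCheck` matcher, and the two-second wait for gossip are illustrative choices, not part of the original setup.

```shell
# Sketch: create a short-lived silence on one Alertmanager and look for it on the other.
# AMTOOL can be overridden; by default it points at the binary in the current directory.
AMTOOL="${AMTOOL:-./amtool}"

# check_mesh SRC DST: returns 0 if a silence created on SRC is visible on DST.
check_mesh() {
  src="$1"
  dst="$2"
  # Create a five-minute silence on the source Alertmanager.
  "$AMTOOL" --alertmanager.url="http://$src:9093" silence add \
    --author="mesh-check" --comment="mesh check" --duration="5m" \
    alertname=MeshCheck >/dev/null || return 1
  # Give the mesh a moment to replicate the silence (assumed delay).
  sleep 2
  # Look for the same silence on the other Alertmanager.
  "$AMTOOL" --alertmanager.url="http://$dst:9093" silence query alertname=MeshCheck \
    | grep -q MeshCheck
}
```

Running `check_mesh am-1 am-2` from the Alertmanager directory should exit with status 0 once the silence has propagated. The test silence expires on its own after five minutes.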

Next we configure the Prometheus servers to talk to both Alertmanagers:
# On a machine named "prom-1":
wget https://github.com/prometheus/prometheus/releases/download/v2.5.0/prometheus-2.5.0.linux-amd64.tar.gz
tar -xzf prometheus-*.linux-amd64.tar.gz
cd prometheus-*
cat > prometheus.yml << EOF
global:
  external_labels:
    dc: europe1
alerting:
  alert_relabel_configs:
    - source_labels: [dc]
      regex: (.+)\d+
      target_label: dc
  alertmanagers:
    - static_configs:
      - targets: ['am-1:9093', 'am-2:9093']
# The rest of your Prometheus config goes here as usual.
EOF
./prometheus


# On a machine named "prom-2":
wget https://github.com/prometheus/prometheus/releases/download/v2.5.0/prometheus-2.5.0.linux-amd64.tar.gz
tar -xzf prometheus-*.linux-amd64.tar.gz
cd prometheus-*
cat > prometheus.yml << EOF
global:
  external_labels:
    dc: europe2   # Note that this is different only by the trailing number.
alerting:
  alert_relabel_configs:
    - source_labels: [dc]
      regex: (.+)\d+
      target_label: dc
  alertmanagers:
    - static_configs:
      - targets: ['am-1:9093', 'am-2:9093']
# The rest of your Prometheus config goes here as usual.
EOF
./prometheus
The key point here is that both Prometheus servers talk to both Alertmanagers.
In addition, the two Prometheus servers have slightly different external labels, so their data does not conflict if remote storage is in use. We then use alert relabelling to strip the trailing number from the dc label, so both servers still send identically labelled alerts, which the Alertmanager will automatically de-duplicate.
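The effect of the relabel rule can be checked outside Prometheus. The following is a minimal sketch that mimics the `(.+)\d+` regex with `sed`; the function name `strip_dc_suffix` is hypothetical, and `sed -E` only approximates the fully anchored matching that Prometheus applies to relabel regexes.

```shell
# Hypothetical helper: emulate the alert relabel rule with sed.
# Prometheus anchors relabel regexes at both ends, hence the explicit ^ and $.
# If the value does not match, it is printed unchanged.
strip_dc_suffix() {
  printf '%s\n' "$1" | sed -E 's/^(.+)[0-9]+$/\1/'
}

strip_dc_suffix europe1
strip_dc_suffix europe2
```

Both calls print `europe`, which is why alerts from prom-1 and prom-2 reach the Alertmanagers with the same dc label.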

As long as one Prometheus and one Alertmanager are working and can talk to each other, alerts and notifications will get through!
