• Prometheus + Alertmanager alerting


    Install Prometheus

    Create prometheus.yml. The default configuration below comes from https://prometheus.io/docs/prometheus/latest/getting_started/

    mkdir -p /opt/prometheus
    cd /opt/prometheus/
    vim prometheus.yml
    

    Default prometheus.yml

    global:
      scrape_interval:     15s # By default, scrape targets every 15 seconds.
    
      # Attach these labels to any time series or alerts when communicating with
      # external systems (federation, remote storage, Alertmanager).
      external_labels:
        monitor: 'codelab-monitor'
    
    # A scrape configuration containing exactly one endpoint to scrape:
    # Here it's Prometheus itself.
    scrape_configs:
      # The job name is added as a label `job=` to any timeseries scraped from this config.
      - job_name: 'prometheus'
    
        # Override the global default and scrape targets from this job every 5 seconds.
        scrape_interval: 5s
    
        static_configs:
          - targets: ['localhost:9090']
    

    Pull the image

    docker pull prom/prometheus
    

    Start the container

    docker run -d \
        -p 9090:9090 \
        -v /opt/prometheus:/etc/prometheus \
        --name prometheus \
        --restart=always \
        prom/prometheus
    

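    -p maps a host port to a container port
    -v mounts a host directory into the container
    --name sets the container name
    --restart=always restarts the container automatically, for example after the Docker daemon restarts

    Open http://<server-ip>:9090 in a browser to reach the Prometheus UI.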
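
    A typo in prometheus.yml keeps the container from starting, so it can help to validate the file first. The following is a minimal sketch, assuming the official prom/prometheus image ships promtool at /bin/promtool and that the configuration lives in /opt/prometheus. The function name check_prometheus_config is made up for illustration.

```shell
# Validate prometheus.yml with promtool from the same image, without starting a server.
# $1: host directory that holds prometheus.yml (assumed default: /opt/prometheus)
check_prometheus_config() {
    conf_dir="${1:-/opt/prometheus}"
    # Mount the directory read-only and run promtool instead of the server binary.
    docker run --rm \
        -v "${conf_dir}:/etc/prometheus:ro" \
        --entrypoint /bin/promtool \
        prom/prometheus \
        check config /etc/prometheus/prometheus.yml
}
```

    Calling check_prometheus_config /opt/prometheus should exit with a non-zero status when the file is invalid. promtool also checks any rule files that the configuration references.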

     

    Install Alertmanager

    Create alertmanager.yml. The configuration format is documented at https://prometheus.io/docs/alerting/latest/configuration/

    mkdir /opt/alertmanager
    cd /opt/alertmanager/
    vim alertmanager.yml
    

    Default alertmanager.yml

    global:
      resolve_timeout: 5m
    
    route:
      group_by: ['alertname']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 1h
      receiver: 'web.hook'
    receivers:
    - name: 'web.hook'
      webhook_configs:
      - url: 'http://127.0.0.1:5001/'
    inhibit_rules:
      - source_match:
          severity: 'critical'
        target_match:
          severity: 'warning'
        equal: ['alertname', 'dev', 'instance']
    

    Pull the image

    docker pull bitnami/alertmanager:latest
    

    Start the container

    docker run -d \
        -p 9093:9093 \
        --name alertmanager \
        -v /opt/alertmanager/alertmanager.yml:/opt/bitnami/alertmanager/conf/config.yml \
        --restart=always \
        bitnami/alertmanager:latest
    

    Open http://<server-ip>:9093 in a browser to reach the Alertmanager UI.
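
    To confirm that Alertmanager accepts and routes alerts before Prometheus is wired up, a test alert can be posted by hand. This is a sketch, assuming the v2 HTTP API at /api/v2/alerts and an instance on localhost:9093. The function names and the TestAlert label values are hypothetical.

```shell
# Build the JSON body for POST /api/v2/alerts: an array holding one alert.
# $1: value for the alertname label, $2: value for the severity label
build_test_alert() {
    printf '[{"labels":{"alertname":"%s","severity":"%s"},"annotations":{"description":"manual test alert"}}]' "$1" "$2"
}

# Post the test alert to Alertmanager.
# $1: Alertmanager base URL (assumed default: http://localhost:9093)
send_test_alert() {
    base_url="${1:-http://localhost:9093}"
    # --data-binary @- reads the request body from stdin
    build_test_alert "TestAlert" "warning" |
        curl -sS -X POST -H 'Content-Type: application/json' \
            --data-binary @- "${base_url}/api/v2/alerts"
}
```

    After running send_test_alert, the alert should appear in the Alertmanager UI and be delivered to whichever receiver the route selects.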

     

    Configure Prometheus

    Example prometheus.yml

    global:
      scrape_interval:     15s
      external_labels:
        monitor: 'codelab-monitor'
    
    scrape_configs:
      - job_name: 'prometheus'
    
        scrape_interval: 5s
    
        static_configs:
          - targets: ['localhost:9090']
    
      - job_name: 'service'
        static_configs:    
        - targets: ['192.168.xxx.0:9100']
        - targets: ['192.168.xxx.1:9100']
    
    alerting:
      alertmanagers:
      - static_configs:
        - targets:
          - '<server-ip>:9093'
          
    rule_files:
        - "rules.yml"
    

    alerting.alertmanagers.static_configs.targets: the address (ip:port) of the server where Alertmanager is deployed
    rule_files: create rules.yml in the same directory as prometheus.yml
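
    The service job in the example scrapes port 9100, which is the default port of node_exporter, and the node_* metrics used in the alert rules come from that exporter. If it is not yet running on the target hosts, one way to start it is sketched below. It assumes the prom/node-exporter image; the function name and container name are illustrative.

```shell
# Start node_exporter so that host metrics such as node_cpu_seconds_total
# are exposed on port 9100 of the target machine.
start_node_exporter() {
    docker run -d \
        -p 9100:9100 \
        --name node-exporter \
        --restart=always \
        prom/node-exporter
}
```

    Run in a plain container like this, the exporter mostly sees the container's view of the system. For accurate host metrics, the node_exporter documentation describes additional options such as host networking and mounting the host root filesystem.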

     
    Example rules.yml

    groups:
    - name: CPU-rule
      rules:
      # CPU usage in percent = 100 - idle percentage, averaged per instance.
      # The job label must match a job_name defined in prometheus.yml.
      - alert: High-CPU-80
        expr: 100 - avg by (instance) (irate(node_cpu_seconds_total{job="service",mode="idle"}[5m])) * 100 > 80
        for: 1m
        labels:
          severity: warning
        annotations:
          description: "{{ $labels.instance }}: CPU usage is above 80% (current value: {{ $value }})"
      - alert: High-CPU-90
        expr: 100 - avg by (instance) (irate(node_cpu_seconds_total{job="service",mode="idle"}[5m])) * 100 > 90
        for: 1m
        labels:
          severity: warning
        annotations:
          description: "{{ $labels.instance }}: CPU usage is above 90% (current value: {{ $value }})"

    - name: Memory-rule
      rules:
      # Memory usage in percent = (1 - available / total) * 100.
      - alert: High-Memory-80
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 80
        for: 1m
        labels:
          severity: warning
        annotations:
          description: "jobname: {{ $labels.job }}, instance: {{ $labels.instance }}, memory usage is above 80%, current value: {{ $value }}"
      - alert: High-Memory-90
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
        for: 1m
        labels:
          severity: warning
        annotations:
          description: "jobname: {{ $labels.job }}, instance: {{ $labels.instance }}, memory usage is above 90%, current value: {{ $value }}"

    - name: jvm-rule
      rules:
      - alert: High-jvm-80
        expr: jvm_memory_usage_after_gc_percent * 100 > 80
        for: 1m
        labels:
          severity: warning
        annotations:
          description: "jobname: {{ $labels.job }}, application: {{ $labels.application }}, JVM memory usage after GC is above 80%, current value: {{ $value }}"
      - alert: High-jvm-90
        expr: jvm_memory_usage_after_gc_percent * 100 > 90
        for: 1m
        labels:
          severity: warning
        annotations:
          description: "jobname: {{ $labels.job }}, application: {{ $labels.application }}, JVM memory usage after GC is above 90%, current value: {{ $value }}"
    

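    name: the rule group name; the example uses one group each for CPU, memory, and JVM
    alert: the alert name, chosen freely
    expr: the PromQL expression that triggers the alert
    for: how long the condition must hold before the alert fires
    labels: custom labels attached to the alert
    annotations: custom annotations, such as a description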
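
    Before relying on a rule, it can be useful to evaluate its expr by hand and look at the current values. The sketch below uses the instant-query endpoint of the Prometheus HTTP API (/api/v1/query) and assumes Prometheus is reachable on localhost:9090. The function name prom_query is hypothetical.

```shell
# Evaluate a PromQL expression through the Prometheus HTTP API.
# $1: PromQL expression
# $2: Prometheus base URL (assumed default: http://localhost:9090)
prom_query() {
    expr="$1"
    base_url="${2:-http://localhost:9090}"
    # -G sends the url-encoded data as a query string instead of a POST body
    curl -sS -G "${base_url}/api/v1/query" --data-urlencode "query=${expr}"
}
```

    For example, prom_query 'up' returns a JSON document listing each scrape target with value 1 (reachable) or 0 (unreachable). Passing the left-hand side of an alert expression, without the comparison, shows how close each instance is to the threshold.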

    Rule status: [screenshot not preserved]
    Scrape target status is shown under Status > Targets: [screenshot not preserved]

     

    Configure Alertmanager

    Example alertmanager.yml

    global:
      smtp_smarthost: 'smtp.exmail.qq.com:465'
      smtp_from: 'fat@qq.com'
      smtp_auth_username: 'fat@qq.com'
      smtp_auth_password: '111'
      smtp_require_tls: false
    
    route:
      receiver: mail
      
    receivers:
    - name: 'mail'
      email_configs:
      - to: 'fat1@qq.com'
      - to: 'fat2@qq.com'
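
    The edited file can be checked before restarting the container. This is a sketch under a stated assumption: it uses the official prom/alertmanager image, which ships amtool at /bin/amtool, rather than the bitnami image used above, and the function name is made up.

```shell
# Validate alertmanager.yml with amtool, without starting Alertmanager.
# $1: path to the config file on the host (assumed default: /opt/alertmanager/alertmanager.yml)
check_alertmanager_config() {
    conf_file="${1:-/opt/alertmanager/alertmanager.yml}"
    # Mount the single file read-only and run amtool instead of the server binary.
    docker run --rm \
        -v "${conf_file}:/tmp/alertmanager.yml:ro" \
        --entrypoint /bin/amtool \
        prom/alertmanager \
        check-config /tmp/alertmanager.yml
}
```

    If the check passes, docker restart alertmanager makes the running container pick up the new configuration.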
    


  • Original article: https://blog.csdn.net/weixin_42555971/article/details/126013904