• Using David's k8s monitoring dashboard (k8s + Prometheus + Grafana)


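    Problem

    This picks up from the previous post, where we enabled AMP (Amazon Managed Service for Prometheus) monitoring plus AMG (Amazon Managed Grafana) for EKS (AWS managed k8s). Last time we only got the EKS + AMP + AMG monitoring path working end to end. This time we use a Grafana dashboard published by David, available at: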
    https://grafana.com/grafana/dashboards/15757-kubernetes-views-global/

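    Installing kube-state-metrics

    To expose useful metrics to Prometheus (the state of k8s objects such as pods and deployments), kube-state-metrics has to be installed in the k8s cluster.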

    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    helm repo update
    helm install kube-state-metrics prometheus-community/kube-state-metrics -n kube-system
    

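    Verify the installation by forwarding the service port to your machine; the metrics are then served at http://localhost:8080/metrics: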

    kubectl port-forward svc/kube-state-metrics -n kube-system 8080:8080
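
While the port-forward is running, the endpoint can be checked from a second terminal. The following is only a minimal sketch using the Python standard library. `fetch_metrics` and `metric_names` are hypothetical helper names introduced here for illustration, and the URL assumes the port-forward shown above.

```python
import urllib.request


def fetch_metrics(url="http://localhost:8080/metrics", timeout=5.0):
    """Download the raw Prometheus text exposition (assumes the port-forward is active)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def metric_names(exposition_text):
    """Collect the metric names that appear in a text exposition payload."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample looks like `name{labels} value` or `name value`.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        names.add(name)
    return names
```

With the port-forward active, `"kube_pod_status_ready" in metric_names(fetch_metrics())` should evaluate to true if kube-state-metrics is exporting pod status metrics.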
    

    Once Prometheus is scraping the target, test with PromQL:

    count(kube_pod_status_ready{condition="false"}) by (namespace, pod)
    

    Prometheus configuration

    scrape_configs:
    - job_name: kube-state-metrics
      honor_timestamps: true
      scrape_interval: 1m
      scrape_timeout: 1m
      metrics_path: /metrics
      scheme: http
      static_configs:
      - targets:
        - kube-state-metrics.kube-system.svc.cluster.local:8080
    

    Installing prometheus-node-exporter

    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    helm repo update
    helm install prometheus-node-exporter prometheus-community/prometheus-node-exporter -n kube-system
    

    Test:

    export POD_NAME=$(kubectl get pods --namespace kube-system -l "app.kubernetes.io/name=prometheus-node-exporter,app.kubernetes.io/instance=prometheus-node-exporter" -o jsonpath="{.items[0].metadata.name}")
    kubectl port-forward --namespace kube-system $POD_NAME 9100
    

    Prometheus configuration

    scrape_configs:
    - job_name: 'node-exporter'
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: replace
        source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9100'
        target_label: __address__
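
The relabel rule above takes the node address discovered by `role: node` (the kubelet port, 10250) and rewrites it to the node-exporter port 9100. Below is a rough Python sketch of that `replace` step, assuming the documented Prometheus behaviour that relabel regexes are anchored at both ends. `relabel_replace` is a hypothetical helper written for illustration, not part of Prometheus, and the IP address is made up.

```python
import re


def relabel_replace(value, regex, replacement):
    """Mimic a Prometheus `replace` relabel step on a single label value."""
    # Relabel regexes are fully anchored, so the whole value must match.
    match = re.fullmatch(regex, value)
    if match is None:
        # No match: the target label is left untouched.
        return value
    # Translate Go-style `$1` / `${1}` references into Python's `\g<1>`.
    template = re.sub(r"\$\{?(\d+)\}?", r"\\g<\1>", replacement)
    return match.expand(template)


# Made-up node address as discovered on the kubelet port.
print(relabel_replace("10.0.0.1:10250", r"(.*):10250", "${1}:9100"))
```

An address that does not end in `:10250` does not match the regex and is passed through unchanged, which is why the rule is safe to apply to every discovered node.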
    

    Full Prometheus configuration

    global:
      scrape_interval: 30s
      # external_labels:
        # clusterArn: 
    scrape_configs:
      # pod metrics
      - job_name: pod_exporter
        kubernetes_sd_configs:
          - role: pod
      # container metrics
      - job_name: cadvisor
        scheme: https
        authorization:
          credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - replacement: kubernetes.default.svc:443
            target_label: __address__
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
      # apiserver metrics
      - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        job_name: kubernetes-apiservers
        kubernetes_sd_configs:
        - role: endpoints
        relabel_configs:
        - action: keep
          regex: default;kubernetes;https
          source_labels:
          - __meta_kubernetes_namespace
          - __meta_kubernetes_service_name
          - __meta_kubernetes_endpoint_port_name
        scheme: https
      # kube proxy metrics
      - job_name: kube-proxy
        honor_labels: true
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        - action: keep
          source_labels:
          - __meta_kubernetes_namespace
          - __meta_kubernetes_pod_name
          separator: '/'
          regex: 'kube-system/kube-proxy.+'
        - source_labels:
          - __address__
          action: replace
          target_label: __address__
          # strip an optional ":port" suffix so that :10249 can be appended
          regex: '(.+?)(:\d+)?'
          replacement: $1:10249
      # kube-state-metrics
      - job_name: kube-state-metrics
        honor_timestamps: true
        scrape_interval: 1m
        scrape_timeout: 1m
        metrics_path: /metrics
        scheme: http
        static_configs:
        - targets:
          - kube-state-metrics.kube-system.svc.cluster.local:8080
      # node-exporter
      - job_name: 'node-exporter'
        kubernetes_sd_configs:
        - role: node
        relabel_configs:
        - action: replace
          source_labels: [__address__]
          regex: '(.*):10250'
          replacement: '${1}:9100'
          target_label: __address__
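
The `keep` rules in the configuration above (for the API server and kube-proxy jobs) join the values of `source_labels` with the separator, `;` unless another one is set, and drop every target whose joined string does not fully match `regex`. The sketch below illustrates that logic in Python. `keep_target` is a hypothetical helper, and the label values are made-up examples rather than output from a real cluster.

```python
import re


def keep_target(labels, source_labels, regex, separator=";"):
    """Return True if a target with these labels survives a `keep` rule."""
    # Labels that are missing count as empty strings.
    joined = separator.join(labels.get(name, "") for name in source_labels)
    # Relabel regexes are anchored: the whole joined string must match.
    return re.fullmatch(regex, joined) is not None


APISERVER_SOURCES = [
    "__meta_kubernetes_namespace",
    "__meta_kubernetes_service_name",
    "__meta_kubernetes_endpoint_port_name",
]
KUBE_PROXY_SOURCES = [
    "__meta_kubernetes_namespace",
    "__meta_kubernetes_pod_name",
]

# Made-up discovery labels for illustration.
apiserver = {
    "__meta_kubernetes_namespace": "default",
    "__meta_kubernetes_service_name": "kubernetes",
    "__meta_kubernetes_endpoint_port_name": "https",
}
proxy_pod = {
    "__meta_kubernetes_namespace": "kube-system",
    "__meta_kubernetes_pod_name": "kube-proxy-abcde",
}

print(keep_target(apiserver, APISERVER_SOURCES, "default;kubernetes;https"))
print(keep_target(proxy_pod, KUBE_PROXY_SOURCES, "kube-system/kube-proxy.+", separator="/"))
```

Both calls print `True`; any endpoint or pod outside those namespace and name combinations is dropped before it is ever scraped.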
    
    

    For this configuration to take effect, the scraper has to be created again.

    Result

    [Figure: global monitoring view]


  • Original article: https://blog.csdn.net/fxtxz2/article/details/138069041