Summary: Prometheus Storage

I. Introduction

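Prometheus ships with a local time series database (TSDB) for storage. Since version 2.0 its compression has improved considerably: each sample takes only about 3.5 bytes by the figure quoted here (the official documentation puts the average at 1 to 2 bytes). A single node is enough for most users, but local storage stands in the way of clustering Prometheus, so in a clustered setup the data should go to another time series database, such as InfluxDB, instead.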
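The per-sample figure can be turned into a rough disk budget with the rule of thumb from the Prometheus documentation: needed disk space = retention time in seconds * ingested samples per second * bytes per sample. The sketch below is only an illustration; the function name, the default of 2 bytes per sample, and the example workload are assumptions, not measurements.

```python
def estimate_disk_bytes(retention_days: float,
                        samples_per_second: float,
                        bytes_per_sample: float = 2.0) -> float:
    """Rough local-storage estimate: retention * ingest rate * sample size."""
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample


if __name__ == "__main__":
    # Hypothetical workload: 100,000 samples/s kept for 30 days.
    size = estimate_disk_bytes(30, 100_000)
    print(f"{size / 1024**3:.1f} GiB")
```

Passing bytes_per_sample=3.5 reproduces the more conservative figure quoted above.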

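Prometheus can be divided into three parts: scraping data, storing data, and querying data.

TSDB used to be a separate project. In one of the 2.x releases it stopped being maintained separately and was merged directly into the main Prometheus repository.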

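II. Example

Our Prometheus setup currently has three NFS file systems provisioned: the managed cluster uses one file system and the QKE cluster uses another, but all of them store data in the cloud (through a StorageClass).

Storage layout: each Prometheus instance maps to one directory, so multiple Prometheus instances map to multiple directories.

Also note that each file system corresponds to one StorageClass resource object.

Taking the managed Prometheus cluster as an example, the StorageClass is created as follows.

There are two key files:

Creating the StorageClass resource object: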


kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: prometheus-data-03
provisioner: prometheus-data-03/nfs
reclaimPolicy: Retain

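After the StorageClass is created, a Kubernetes storage plugin is still needed to do the actual provisioning:

nfs-client-provisioner dynamically provisions PVs for Kubernetes. It is a simple external NFS provisioner for Kubernetes; it does not provide NFS itself and needs an existing NFS server to supply the storage. Persistent volume directories are named ${namespace}-${pvcName}-${pvName}.

Reference: "Installing nfs-client-provisioner on a Raspberry Pi k8s cluster" (崔一凡's tech blog, 51CTO).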
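To make the directory naming rule concrete, here is a small sketch that builds the directory name a replica of the Prometheus StatefulSet from this article would get on the NFS export. It assumes the usual StatefulSet convention that PVCs created from volumeClaimTemplates are named <template>-<statefulset>-<ordinal>; the PV name used in the example is a made-up placeholder, since real PV names are generated by the provisioner.

```python
def pv_directory(namespace: str, pvc_name: str, pv_name: str) -> str:
    """Directory created on the NFS export: ${namespace}-${pvcName}-${pvName}."""
    return f"{namespace}-{pvc_name}-{pv_name}"


def statefulset_pvc_name(claim_template: str, statefulset: str, ordinal: int) -> str:
    """PVCs from volumeClaimTemplates are named <template>-<statefulset>-<ordinal>."""
    return f"{claim_template}-{statefulset}-{ordinal}"


if __name__ == "__main__":
    # "pvc-0000" is a hypothetical placeholder for the generated PV name.
    pvc = statefulset_pvc_name("prometheus-storage", "prometheus", 0)
    print(pv_directory("example-nfs", pvc, "pvc-0000"))
```

With two replicas this yields two separate directories, which matches the one-directory-per-instance layout described earlier.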


kind: Deployment
apiVersion: apps/v1
metadata:
  name: nfs-provisioner-03
spec:
  selector:
    matchLabels:
      app: nfs-provisioner-03
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: nfs-provisioner-03
    spec:
      serviceAccountName: nfs-provisioner
      containers:
        - name: nfs-provisioner
          image: docker-registry.xxx.virtual/hubble/nfs-client-provisioner:v3.1.0-k8s1.11
          volumeMounts:
            - name: nfs-client-root
              mountPath: /persistentvolumes
          env:
            - name: PROVISIONER_NAME
              value: prometheus-data-03/nfs
            - name: NFS_SERVER
              value: hubble-587ceb02-5e85e802.cnhz1.qfs.xxx.storage
            - name: NFS_PATH
              value: /hubble-wuhan-lkg
      volumes:
        - name: nfs-client-root
          nfs:
            server: hubble-587ceb02-5e85e802.cnhz1.qfs.xxx.storage
            path: /hubble-wuhan-lkg

The server address above is simply the address provided by the cloud storage service.

Deploying Prometheus:


kind: Service
apiVersion: v1
metadata:
  name: prometheus-headless
  namespace: example-nfs
  labels:
    app.kubernetes.io/name: prometheus
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app.kubernetes.io/name: prometheus
  ports:
  - name: web
    protocol: TCP
    port: 9090
    targetPort: web
  - name: grpc
    port: 10901
    targetPort: grpc
---
 
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: example-nfs
 
---
 
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-example-nfs
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: example-nfs
roleRef:
  kind: ClusterRole
  name: prometheus
  apiGroup: rbac.authorization.k8s.io
---
 
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: example-nfs
  labels:
    app.kubernetes.io/name: prometheus
spec:
  serviceName: prometheus-headless
  podManagementPolicy: Parallel
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: prometheus
  template:
    metadata:
      labels:
        app.kubernetes.io/name: prometheus
    spec:
      serviceAccountName: prometheus
      securityContext:
        fsGroup: 1000
        runAsUser: 0
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                - prometheus
            topologyKey: kubernetes.io/hostname
      containers:
      - name: prometheus
        image: docker-registry.xxx.virtual/hubble/prometheus:v2.34.0
        args:
        - --config.file=/etc/prometheus/config_out/prometheus.yaml
        - --storage.tsdb.path=/prometheus
        - --storage.tsdb.retention.time=30d
        - --web.external-url=/example-nfs/prometheus
        - --web.enable-lifecycle
        - --storage.tsdb.no-lockfile
        - --storage.tsdb.min-block-duration=2h
        - --storage.tsdb.max-block-duration=1d
        ports:
        - containerPort: 9090
          name: web
          protocol: TCP
        livenessProbe:
          failureThreshold: 6
          httpGet:
            path: /example-nfs/prometheus/-/healthy
            port: web
            scheme: HTTP
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 3
        readinessProbe:
          failureThreshold: 120
          httpGet:
            path: /example-nfs/prometheus/-/ready
            port: web
            scheme: HTTP
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 3
        resources:
          requests:
            memory: 1Gi
          limits:
            memory: 30Gi
        volumeMounts:
        - mountPath: /etc/prometheus/config_out
          name: prometheus-config-out
          readOnly: true
        - mountPath: /prometheus
          name: prometheus-storage
        - mountPath: /etc/prometheus/rules
          name: prometheus-rules
      - name: thanos
        image: docker-registry.xxx.virtual/hubble/thanos:v0.25.1
        args:
        - sidecar
        - --tsdb.path=/prometheus
        - --prometheus.url=http://127.0.0.1:9090/example-nfs/prometheus
        - --reloader.config-file=/etc/prometheus/config/prometheus.yaml.tmpl
        - --reloader.config-envsubst-file=/etc/prometheus/config_out/prometheus.yaml
        - --reloader.rule-dir=/etc/prometheus/rules/
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        ports:
        - name: http-sidecar
          containerPort: 10902
        - name: grpc
          containerPort: 10901
        livenessProbe:
          httpGet:
            port: 10902
            path: /-/healthy
        readinessProbe:
          httpGet:
            port: 10902
            path: /-/ready
        volumeMounts:
        - name: prometheus-config-tmpl
          mountPath: /etc/prometheus/config
        - name: prometheus-config-out
          mountPath: /etc/prometheus/config_out
        - name: prometheus-rules
          mountPath: /etc/prometheus/rules
        - name: prometheus-storage
          mountPath: /prometheus
      volumes:
      - name: prometheus-config-tmpl
        configMap:
          defaultMode: 420
          name: prometheus-config-tmpl
      - name: prometheus-config-out
        emptyDir: {}
      - name: prometheus-rules
        configMap:
          name: prometheus-rules
  volumeClaimTemplates:
  - metadata:
      name: prometheus-storage
      labels:
        app.kubernetes.io/name: prometheus
    spec:
      storageClassName: prometheus-data-03
      accessModes:
      - ReadWriteOnce
      volumeMode: Filesystem
      resources:
        requests:
          storage: 100Gi
        limits:
          storage: 300Gi

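--storage.tsdb.min-block-duration: the minimum time span a block covers before it is persisted to disk; here 2 hours.

--storage.tsdb.max-block-duration: the maximum time span a block on disk may cover after compaction; here 1 day.

--storage.tsdb.retention.time: keeps 30 days of data locally to reduce disk usage.

As shown above, volumeClaimTemplates defines a PVC for each replica. How is the PV created? It is provisioned dynamically from the StorageClass named in storageClassName.

So how does Prometheus query the data?

From Prometheus's point of view, data is simply queried through the API that the Prometheus instance exposes. As with any container, whether the backing store is cloud storage, node storage, or something else is transparent: reads go to the mounted directory, so remote storage looks to Prometheus like files in a local directory.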
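As an illustration of "querying through the instance's API", the sketch below runs an instant query against the /api/v1/query endpoint using only the Python standard library. The base URL is an assumption: it combines the pod-local listener with the /example-nfs/prometheus prefix from --web.external-url in the manifest above, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the /api/v1/query endpoint."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def instant_query(base_url: str, promql: str, timeout: float = 10.0) -> list:
    """Run the query and return the 'result' list from the JSON response."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body}")
    return body["data"]["result"]


# Example call (needs a reachable Prometheus, so it is left commented out):
# for series in instant_query("http://127.0.0.1:9090/example-nfs/prometheus", "up"):
#     print(series["metric"], series["value"])
```

Nothing in the request refers to NFS or the StorageClass; where the blocks physically live is invisible at this level.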

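III. Storage Internals

Prometheus keeps scraped samples in memory as time series (in the TSDB) and periodically persists them to disk.

Unlike Zabbix, which keeps all data, Prometheus local storage keeps 15 days by default and deletes anything older. To keep data permanently, there are two options:

Option 1: set the Prometheus flag "--storage.tsdb.retention.time=10000d";
Option 2: ship the data to InfluxDB for storage.

Prometheus stores data in blocks, each covering a 2-hour window. Samples are first held in memory and are written to disk automatically once the 2-hour window is complete.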
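The 2-hour windows can be pictured with a little arithmetic. The sketch below assumes that windows are aligned to multiples of the block range counted from the Unix epoch, which is my understanding of how the TSDB cuts blocks; the function and constant names are illustrative.

```python
from datetime import datetime, timezone
from typing import Tuple

# 2h in milliseconds, matching --storage.tsdb.min-block-duration=2h
BLOCK_RANGE_MS = 2 * 60 * 60 * 1000


def block_window(timestamp_ms: int, block_range_ms: int = BLOCK_RANGE_MS) -> Tuple[int, int]:
    """Return the [start, end) window, in ms since the epoch, containing timestamp_ms."""
    start = (timestamp_ms // block_range_ms) * block_range_ms
    return start, start + block_range_ms


def to_iso(ms: int) -> str:
    """Format a millisecond timestamp as an ISO 8601 string in UTC."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).isoformat()


if __name__ == "__main__":
    sample_ts = int(datetime(2022, 7, 27, 9, 30, tzinfo=timezone.utc).timestamp() * 1000)
    start, end = block_window(sample_ts)
    print(to_iso(start), "->", to_iso(end))
```

Under that assumption, a sample scraped at 09:30 UTC falls into the 08:00 to 10:00 UTC window.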

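To avoid losing data when the process exits abnormally, Prometheus uses a write-ahead log (WAL): samples from the current window are held in memory and also appended to a log in the wal directory under the data directory. When the process starts again, it replays the WAL to rebuild the in-memory data, which is later written out as a block, so the data is recovered.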
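The WAL idea itself is simple enough to show with a toy store: log first, then apply to memory, and replay the log on startup. This is a conceptual sketch only; the class, the JSON-lines file format, and the file name are invented for illustration and have nothing to do with Prometheus's actual WAL encoding.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy in-memory store with a write-ahead log (not Prometheus's WAL format)."""

    def __init__(self, wal_path: str):
        self.wal_path = wal_path
        self.samples = []  # in-memory data: list of (series, timestamp, value)
        self._replay()

    def _replay(self) -> None:
        """Rebuild in-memory state from the log left by a previous run."""
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as wal:
            for line in wal:
                record = json.loads(line)
                self.samples.append((record["series"], record["t"], record["v"]))

    def append(self, series: str, t: int, v: float) -> None:
        """Write the sample to the log durably first, then apply it to memory."""
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps({"series": series, "t": t, "v": v}) + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        self.samples.append((series, t, v))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "wal.log")
        store = TinyStore(path)
        store.append("up", 1000, 1.0)
        store.append("up", 2000, 0.0)
        del store                    # simulate a crash: the in-memory data is gone
        recovered = TinyStore(path)  # a restart replays the log
        print(recovered.samples)
```

The ordering in append is the point: a sample only counts as accepted once it is in the log, so a crash between the two steps cannot lose it.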

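When data is deleted, the deletion is recorded in a tombstones file instead of being applied immediately.

This storage scheme is called "time sharding": every block is an independent database. The benefit is faster queries, because a query only needs to open the blocks covering the requested time range and nothing else.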
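Block selection by time range can be sketched as follows. Each block directory carries a meta.json whose minTime and maxTime fields give the covered range in milliseconds; the sketch reads those files and keeps only the overlapping blocks. The function name and the half-open treatment of both ranges are assumptions made for this illustration, not a copy of the real query path.

```python
import json
from pathlib import Path
from typing import List


def blocks_for_range(data_dir: str, query_start_ms: int, query_end_ms: int) -> List[str]:
    """Return names of block directories overlapping [query_start_ms, query_end_ms)."""
    selected = []
    for meta_path in sorted(Path(data_dir).glob("*/meta.json")):
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
        # Treat a block as covering [minTime, maxTime), in ms since the epoch.
        if meta["minTime"] < query_end_ms and meta["maxTime"] > query_start_ms:
            selected.append(meta_path.parent.name)
    return selected
```

A query over the last hour would therefore touch at most a block or two, no matter how many days of history sit in the data directory.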

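IV. Data Backup

1. Full backup

Backing up the Prometheus data directory gives a full backup, but it is inefficient.

2. Snapshot backup

Prometheus can back up data quickly through its API. The steps:

First, change the Prometheus startup flags by adding the following two: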


--storage.tsdb.path=/usr/local/share/prometheus/data \
--web.enable-admin-api

Then restart Prometheus.
Finally, call the API to take the backup:


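# Do not skip in-memory data, i.e. include it in the backup
curl -XPOST 'http://127.0.0.1:9090/api/v1/admin/tsdb/snapshot?skip_head=false'
# Skip in-memory data
curl -XPOST 'http://127.0.0.1:9090/api/v1/admin/tsdb/snapshot?skip_head=true'

skip_head: whether to skip data that is still in memory (the head block) and has not been written to disk yet. The default is false.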
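For scripted backups it helps to capture the snapshot name from the response. The sketch below does the same POST with the Python standard library and parses the JSON reply, which has the shape {"status": "success", "data": {"name": "..."}}. The helper names and the default timeout are illustrative assumptions, and the call only works when --web.enable-admin-api is set.

```python
import json
import urllib.request


def snapshot_url(base_url: str, skip_head: bool = False) -> str:
    """Build the snapshot endpoint URL of the TSDB admin API."""
    flag = "true" if skip_head else "false"
    return f"{base_url.rstrip('/')}/api/v1/admin/tsdb/snapshot?skip_head={flag}"


def parse_snapshot_name(response_body: str) -> str:
    """Extract the snapshot directory name from the JSON response."""
    body = json.loads(response_body)
    if body.get("status") != "success":
        raise RuntimeError(f"snapshot failed: {body}")
    return body["data"]["name"]


def take_snapshot(base_url: str, skip_head: bool = False, timeout: float = 60.0) -> str:
    """POST to the admin API and return the new snapshot's name."""
    request = urllib.request.Request(snapshot_url(base_url, skip_head), method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return parse_snapshot_name(resp.read().decode("utf-8"))


# Example call (needs a running Prometheus, so it is left commented out):
# name = take_snapshot("http://127.0.0.1:9090")
```

The returned name identifies a directory under snapshots/ inside the data directory, which is what gets copied away as the backup.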

V. Data Restore

After a snapshot has been taken through the API, restore it by copying the files in the snapshot over the data directory and restarting Prometheus.
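The copy step can be scripted. This is a minimal sketch, assuming the snapshot directory contains one subdirectory per block and that Prometheus is stopped while the files are copied; the function name and both paths are hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def restore_snapshot(snapshot_dir: str, data_dir: str) -> List[str]:
    """Copy every block directory from a snapshot into the data directory.

    Intended to run while Prometheus is stopped. Returns the copied block names.
    """
    restored = []
    for block in sorted(Path(snapshot_dir).iterdir()):
        if block.is_dir():
            # dirs_exist_ok (Python 3.8+) overwrites files of an existing block.
            shutil.copytree(block, Path(data_dir) / block.name, dirs_exist_ok=True)
            restored.append(block.name)
    return restored


# Example call with hypothetical paths:
# restore_snapshot("/backup/snapshots/<name>", "/usr/local/share/prometheus/data")
```

After the copy finishes, starting Prometheus again makes the restored blocks queryable.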

Add a scheduled backup job (every Sunday at 3:00):


crontab -e                              # Mind the time zone; after changing it, restart cron: systemctl restart cron

0 3 * * 7 sudo /usr/bin/curl -XPOST http://127.0.0.1:9090/api/v1/admin/tsdb/snapshot >> /home/bill/prometheusbackup.log

Original article: https://blog.csdn.net/w2009211777/article/details/126014725