#Case study: gradually migrating production to a k8s cluster - registering pods in consul
#Project background
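- There are several business systems; every node registers in the consul cluster so that everything can be managed in one place
- consul DNS is in use, so every node's hostname resolves and can be pinged
- consul health checks are in use; an instance is added to a service only after its health check passes
- Some services call each other directly through the consul service address, i.e. service-name.service.datacenter.consul
- prometheus monitoring uses consul-template to add nodes automatically
- The runtime environment is Alibaba Cloud; k8s container IPs and cloud host IPs can reach each other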
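
The service address mentioned above follows Consul's standard DNS lookup forms: `<service>.service.<datacenter>.<domain>` for services and `<node>.node.<datacenter>.<domain>` for nodes. A minimal sketch of how such names are composed; the helper names are hypothetical, the `consul` domain is assumed to be left at its default, and the `qa` datacenter and `qa-consul-demo` service are taken from the demo later in this post:

```python
def service_dns_name(service, datacenter, domain="consul"):
    """Build the DNS name that resolves to healthy instances of a service."""
    return f"{service}.service.{datacenter}.{domain}"


def node_dns_name(node, datacenter, domain="consul"):
    """Build the DNS name that resolves to a single registered node."""
    return f"{node}.node.{datacenter}.{domain}"


if __name__ == "__main__":
    print(service_dns_name("qa-consul-demo", "qa"))
    # "web01" is a made-up node name used only for illustration
    print(node_dns_name("web01", "qa"))
```

Because only instances with passing health checks are returned for the service form, callers outside the k8s cluster can keep using the same name after a service moves into a pod.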
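#1.1 Problem to solve
- After some services move into the k8s cluster, services outside the cluster still need to reach them directly by pod IP
#1.2 Solution
- Add a consul-agent container to the pod so that the pod registers in the consul cluster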
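#2.1 New problems caused by registering pods in consul
- When a pod exits or is deleted, it should be removed from the consul cluster
- The prometheus monitoring template rendered by consul-template needs to exclude pods
#2.2 Solution
- The consul container uses a preStop hook that runs consul leave before it exits, so the agent leaves the consul cluster on its own
- Exclude pods in consul-template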
  - Register pods in the consul cluster with a node-name prefix such as `k8s-`
  - In consul-template, use regexMatch to skip nodes whose names start with `k8s-`
#Demo

    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: consul-demo-config
      namespace: default
    data:
      consul.json: |-
        {
          "datacenter": "qa",
          "acl_datacenter": "qa",
          "data_dir": "/tmp/consul",
          "bind_addr": "0.0.0.0",
          "client_addr": "0.0.0.0",
          "start_join": ["10.10.100.100"],
          "retry_join": ["10.10.100.100"],
          "retry_interval": "5s",
          "disable_host_node_id": true,
          "enable_script_checks": true,
          "disable_update_check": true,
          "leave_on_terminate": true,
          "log_level": "WARN",
          "server": false,
          "service": {
            "name": "qa-consul-demo",
            "port": 80,
            "tags": ["k8s", "qa", "consul-demo"],
            "checks": [
              {
                "id": "consul-demo-HealthCheck",
                "name": "Health Check",
                "notes": "Health Check",
                "args": [
                  "sh",
                  "-c",
                  "[ $(curl -s 127.0.0.1 -I |grep 'nginx' |wc -l) -eq 1 ] && { echo 'Health check successful'; exit 0 ; } || { echo 'check error' ; exit 2 ; }"
                ],
                "interval": "10s"
              }
            ]
          }
        }
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: consul-demo
      namespace: default
    spec:
      selector:
        matchLabels:
          app: consul-demo
      replicas: 2
      template:
        metadata:
          labels:
            app: consul-demo
        spec:
          imagePullSecrets:
          - name: docker-image-key
          containers:
          # Sidecar that registers the pod in the consul cluster
          - name: consul-agent
            image: consul:1.0.8
            imagePullPolicy: IfNotPresent
            # The k8s- node-name prefix lets consul-template exclude pod nodes
            command:
            - sh
            - -c
            - |
              consul agent -config-dir=/opt/consul -node=k8s-qa-$HOSTNAME -rejoin
            # Leave the consul cluster before the container stops
            lifecycle:
              preStop:
                exec:
                  command:
                  - sh
                  - -c
                  - |
                    consul leave
            resources:
              requests:
                cpu: 10m
                memory: 16Mi
              limits:
                cpu: 50m
                memory: 32Mi
            readinessProbe:
              tcpSocket:
                port: 8500
            livenessProbe:
              tcpSocket:
                port: 8500
            # consul.json from the ConfigMap is read via -config-dir=/opt/consul
            volumeMounts:
            - name: consul-config
              mountPath: "/opt/consul"
          # Application container checked by the consul health check
          - name: nginx-node
            image: alivv/nginx:node
            imagePullPolicy: IfNotPresent
          volumes:
          - name: consul-config
            configMap:
              name: consul-demo-config
              items:
              - key: consul.json
                path: consul.json

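
The `checks` entry in the ConfigMap relies on Consul's script-check convention: exit code 0 means passing, 1 means warning, and any other code means critical. The shell one-liner passes only when exactly one response-header line from `curl -I` contains `nginx`. A rough Python equivalent of that decision, for illustration only; `evaluate_check` is a hypothetical helper and the sample headers are made up:

```python
def evaluate_check(header_text):
    """Mirror the shell check: pass only if exactly one header line contains 'nginx'.

    Returns (exit_code, message) the way the script would exit and print.
    """
    # grep 'nginx' | wc -l counts the lines that contain the substring
    matches = sum(1 for line in header_text.splitlines() if "nginx" in line)
    if matches == 1:
        return 0, "Health check successful"  # exit 0: consul marks the check passing
    return 2, "check error"  # exit 2: consul marks the check critical


if __name__ == "__main__":
    sample = "HTTP/1.1 200 OK\r\nServer: nginx/1.14.0\r\nContent-Type: text/html\r\n"
    print(evaluate_check(sample))
```

Since the agent and nginx share the pod's network namespace, the check can reach nginx on 127.0.0.1 even though it runs in the consul-agent container.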
The original monitoring template for consul-template:

    - job_name: 'node'
      static_configs:
      {{range nodes}}
      - targets: ['{{.Node}}:9100']
        labels:
          instance: {{.Node}}{{end}}

After the change, regexMatch excludes node names that start with k8s-:

    - job_name: 'node'
      static_configs:
      {{range nodes}}{{if .Node | regexMatch "^k8s-.*" }}{{else}}
      - targets: ['{{.Node}}:9100']
        labels:
          instance: {{.Node}}{{end}}{{end}}

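
The effect of the regexMatch filter can be checked offline. The following is a minimal Python sketch that imitates what the template renders; it is not consul-template itself, `render_node_job` is a hypothetical helper, and the node names are invented for illustration:

```python
import re

# Same pattern as in the template: node names that start with "k8s-"
K8S_NODE = re.compile(r"^k8s-.*")


def render_node_job(node_names):
    """Render the 'node' scrape job, skipping nodes registered by pods."""
    lines = ["- job_name: 'node'", "  static_configs:"]
    for name in node_names:
        if K8S_NODE.search(name):
            continue  # pod nodes carry the k8s- prefix and are left out
        lines.append(f"  - targets: ['{name}:9100']")
        lines.append("    labels:")
        lines.append(f"      instance: {name}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_node_job(["web01", "k8s-qa-consul-demo-abc", "db01"]))
```

Only `web01` and `db01` end up as scrape targets; the pod's node is dropped because its name matches the prefix pattern.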