ClickHouse Proxy: Chproxy in Practice

Introduction

Chproxy is an HTTP proxy and load balancer for the ClickHouse database. It has the following features (see the official Chproxy site for full details):

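Proxies requests to multiple ClickHouse clusters depending on the input user. For example, requests from the appserver user can be proxied to the stats-raw cluster, and requests from the reportserver user to the stats-aggregate cluster.
Maps each input user to an actual ClickHouse user, which avoids exposing the real user names and passwords of the ClickHouse cluster. Chproxy also allows several input users to be mapped to a single ClickHouse user.
Accepts incoming requests over HTTP and HTTPS.
Restricts HTTP and HTTPS access by a list of IPs or IP masks.
Restricts per-user access by a list of IPs or IP masks.
Limits per-user query duration; queries that time out or are cancelled are forcibly killed via KILL QUERY.
Limits the per-user request rate.
Limits the per-user number of concurrent requests.
All limits can be set independently for each input user and for each cluster user.
Can automatically delay requests until they fit within the user's limits.
Supports per-user response caching.
The response cache has built-in protection against the thundering herd problem, also known as the dogpile effect.
Balances load among replicas and nodes using the least loaded and round robin techniques.
Checks node health and avoids sending requests to unhealthy nodes.
Supports automatic HTTPS certificate issuing and renewal via Let's Encrypt.
Can proxy requests to each configured cluster over either HTTP or HTTPS.
Before proxying a request to ClickHouse, prepends the remote/local address and the input/output user names to the User-Agent request header, so this information can be queried from system.query_log.http_user_agent.
Exposes various useful metrics in the Prometheus text format.
Supports hot configuration reload without a restart: just send a SIGHUP signal to the chproxy process.
Easy to manage and run: just pass a config file path to chproxy.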

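Installation (see the official site)

Installation is simple: download the latest release package, unpack it, and start the binary.

Unpack the files into the /opt/module/chproxy/ directory.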

Configuration file `config.yml`

# Whether to print debug logs.
# 
# By default debug logs are disabled.
log_debug: true

# Whether to ignore security checks during config parsing.
#
# By default security checks are enabled.
hack_me_please: true

# Optional response cache configs.
# 
# Multiple distinct caches with different settings may be configured.
caches:
    # Cache name, which may be passed into `cache` option on the `user` level.
    #
    # Multiple users may share the same cache.
  - name: "longterm"

    # Cache mode, either `file_system` or `redis`.
    mode: "file_system"
    
    # Applicable for cache mode: file_system
    file_system:
      # Path to directory where cached responses will be stored.
      dir: "/opt/module/chproxy/longterm/cachedir"
    
      # Maximum cache size.
      # `Kb`, `Mb`, `Gb` and `Tb` suffixes may be used.
      max_size: 512Mb

    # Expiration time for cached responses.
    expire: 1h

# Named list of parameters to apply to each query.
# The params are sent to ClickHouse with each request and override
# ClickHouse's own settings.
param_groups:
  # Group name, which may be passed into `params` option on the `user` level.
  - name: "default_param_setting"
    # List of key-value params to send
    params: 
      - key: "replication_alter_partitions_sync"
        value: "2"
      - key: "max_memory_usage"
        value: "3000000000"
      - key: "max_bytes_before_external_group_by"
        value: "3000000000"
      - key: "max_bytes_before_external_sort"
        value: "3000000000"

# Settings for `chproxy` input interfaces.
server:
  # Configs for input http interface.
  # The interface works only if this section is present.
  http:
    # TCP address to listen to for http.
    # May be in the form IP:port . IP part is optional.
    listen_addr: ":9090"

    # List of allowed networks or network_groups.
    # Each item may contain IP address, IP subnet mask or a name
    # from `network_groups`.
    # By default requests are accepted from all the IPs.
    # allowed_networks: ["0.0.0.0"]

    # ReadTimeout is the maximum duration for the proxy to read the entire
    # request, including the body.
    # Default value is 1m
    read_timeout: 5m

    # WriteTimeout is the maximum duration for proxy before timing out writes of the response.
    # Default is largest MaxExecutionTime + MaxQueueTime value from Users or Clusters
    write_timeout: 10m

    # IdleTimeout is the maximum amount of time for proxy to wait for the next request.
    # Default is 10m
    idle_timeout: 20m

# Configs for input users.
users:
    # Name and password are used to authorize access via BasicAuth or
    # via `user`/`password` query params.
    # Password is optional. By default empty password is used.
  - name: "default"
    password: "123456"
    to_cluster: "my_cluster"
    to_user: "default"
    params: "default_param_setting"

  - name: "writer"
    password: "123456"

    # Requests from the user are routed to this cluster.
    to_cluster: "my_cluster"

    # Input user is substituted by the given output user from `to_cluster`
    # before proxying the request.
    to_user: "default"

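    # Maximum number of concurrent queries for the user.
    #max_concurrent_queries: 1
    # Maximum duration of query execution for the user.
    # By default there is no limit on query duration.
    # Chproxy automatically kills queries that exceed max_execution_time.
    #max_execution_time: 2s

    # Requests per minute limit for the given input user.
    # If this limit is also configured at another level, the smallest value takes effect.
    # By default there is no per-minute limit.
    #requests_per_minute: 6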

    # Used together with max_queue_time below: max_queue_size caps how many
    # requests may wait, and max_queue_time caps how long each may wait.
    # The maximum number of requests that may wait for their chance
    # to be executed because they cannot run now due to the current limits.
    #
    # This option may be useful for handling request bursts from `tabix`
    # or `clickhouse-grafana`.
    #
    # By default all the requests are immediately executed without
    # waiting in the queue.
    max_queue_size: 1

    # The maximum duration the queued requests may wait for their chance
    # to be executed.
    # This option makes sense only if max_queue_size is set.
    # By default requests wait for up to 10 seconds in the queue.
    max_queue_time: 35s
    
    # Optional name of a param group from `param_groups` to send to ClickHouse
    # with each proxied request; these params override ClickHouse's own settings.
    # By default no additional params are sent to ClickHouse.
    params: "default_param_setting" 
    
    # Response cache config name to use.
    # By default responses aren't cached
    #cache: "longterm"



# Configs for ClickHouse clusters.
clusters:
    # The cluster name is used in `to_cluster`.
  - name: "my_cluster"

    # Protocol to use for communicating with cluster nodes.
    # Currently supported values are `http` or `https`.
    # By default `http` is used.
    scheme: "http"
    replicas:
    - name: "replica1"
      nodes: ["172.26.20.120:8123", "172.26.20.121:8123"]
    - name: "replica2"
      nodes: ["172.26.20.122:8123", "172.26.20.123:8123"]

    # User configuration for heart beat requests.
    # Credentials of the first user in clusters.users will be used for heart beat requests to clickhouse.
    heartbeat:
      # An interval for checking all cluster nodes for availability
      # By default each node is checked for every 5 seconds.
      interval: 5s

      # Timeout for waiting on a response from cluster nodes
      # By default 3s
      timeout: 10s

      # The parameter to set the URI to request in a health check
      # By default "/?query=SELECT%201"
      request: "/?query=SELECT%201%2B1"

      # Reference response from clickhouse on health check request
      # By default "1\n"
      response: "2\n"

    # Timed out queries are killed using this user.
    # By default `default` user is used.
    kill_query_user:
      name: "default"
      password: "123456"

    # Configuration for cluster users.
    users:
        # The user name is used in `to_user`.
      - name: "default"
        password: "123456"
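        # Maximum number of concurrent queries for the user.
        #max_concurrent_queries: 1
        # Maximum duration of query execution for the user.
        #max_execution_time: 5s
        # Maximum number of requests per minute for the user.
        # If this limit is also configured at another level, the smallest value takes effect.
        #requests_per_minute: 5
        # Maximum number of requests that may wait in the queue for execution.
        max_queue_size: 1
        # Maximum duration a request may wait in the queue.
        max_queue_time: 10s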
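
The `heartbeat` settings in the config can be reproduced by hand to see roughly what chproxy checks on each node. The following is a minimal sketch, not chproxy's own code: `check_node` is a hypothetical helper, and it assumes `curl` is installed and that the node accepts the `default` cluster user's credentials over plain HTTP. It sends the configured `request` and compares the body with the configured `response`.

```shell
#!/bin/bash
# Hypothetical helper, not part of chproxy: mimic one heartbeat probe.
# Arguments: node address (host:port), request URI, expected response body.
check_node() {
  local node="$1" request="$2" expected="$3" body
  # --max-time mirrors the heartbeat `timeout: 10s` from the config.
  body=$(curl --silent --max-time 10 --user "default:123456" \
              "http://${node}${request}") || return 1
  # Command substitution strips the trailing newline, so the expected
  # value is given without its final "\n".
  [ "$body" = "$expected" ]
}

# Usage (address taken from the `replicas` section):
# check_node "172.26.20.120:8123" "/?query=SELECT%201%2B1" "2" && echo healthy
```

A node that returns anything other than `2`, or does not answer within the timeout, makes the function return a non-zero status, which corresponds to the case where chproxy would treat the node as unhealthy.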


Startup

/opt/module/chproxy/chproxy -config=/opt/module/chproxy/config.yml
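
Once chproxy is running, a quick way to confirm that it forwards requests is to send a query through it. The sketch below assumes `curl` is installed and that the proxy listens on `localhost:9090` with the `writer` input user from the config above; `chproxy_query` is a hypothetical helper name, not a chproxy command.

```shell
#!/bin/bash
# Hypothetical helper: run one SQL statement through chproxy over HTTP.
# Arguments: SQL text, then optional user, password, and proxy address.
chproxy_query() {
  local sql="$1" user="${2:-writer}" password="${3:-123456}" addr="${4:-localhost:9090}"
  # --get with --data-urlencode places the SQL in the `query` URL parameter
  # with proper escaping; --fail makes curl return non-zero on HTTP errors.
  curl --silent --show-error --fail --get \
       --user "${user}:${password}" \
       --data-urlencode "query=${sql}" \
       "http://${addr}/"
}

# Usage: chproxy_query "SELECT 1"
```

Because the helper issues a GET request, ClickHouse treats the query as read-only, which is enough for a smoke test with a `SELECT`.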

Wrapping the start command: `start.sh`

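# Resolve the script's directory, then start chproxy in the background and record its PID.
baseDir=$(cd "$(dirname "$0")" && pwd)
mkdir -p "$baseDir/logs"
nohup "$baseDir/chproxy" -config="$baseDir/config.yml" > "$baseDir/logs/chproxy.log" 2>&1 & echo $! > "$baseDir/pid"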

Wrapping the stop command: `shutdown.sh`

# Resolve the script's directory, then force-kill the PID recorded by start.sh.
baseDir=$(cd "$(dirname "$0")" && pwd)
kill -9 "$(cat "$baseDir/pid")"
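
`kill -9` gives the process no chance to finish in-flight requests. A gentler variant, sketched here, sends SIGTERM first and falls back to SIGKILL only if the process is still alive after a grace period. `stop_gracefully` is a hypothetical helper, and the sketch assumes chproxy exits on SIGTERM, which this article does not confirm.

```shell
#!/bin/bash
# Hypothetical helper: stop the process recorded in a PID file, politely first.
# Arguments: path to the PID file, optional number of seconds to wait.
stop_gracefully() {
  local pid_file="$1" wait_secs="${2:-10}" pid
  [ -f "$pid_file" ] || { echo "no PID file: $pid_file" >&2; return 1; }
  pid=$(cat "$pid_file")
  # Ask the process to exit (SIGTERM); ignore the error if it is already gone.
  kill "$pid" 2>/dev/null
  # `kill -0` sends no signal; it only tests whether the process still exists.
  while [ "$wait_secs" -gt 0 ] && kill -0 "$pid" 2>/dev/null; do
    sleep 1
    wait_secs=$((wait_secs - 1))
  done
  # Fall back to SIGKILL only if the process ignored SIGTERM.
  if kill -0 "$pid" 2>/dev/null; then
    kill -9 "$pid"
  fi
  rm -f "$pid_file"
}

# Usage: stop_gracefully /opt/module/chproxy/pid 10
```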

Wrapping the restart command: `restart.sh`

#!/bin/bash
# Force-kill the running chproxy, then start it again and record the new PID.
baseDir=$(cd "$(dirname "$0")" && pwd)
kill -9 "$(cat "$baseDir/pid")"
nohup "$baseDir/chproxy" -config="$baseDir/config.yml" > "$baseDir/logs/chproxy.log" 2>&1 & echo $! > "$baseDir/pid"
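
For a pure configuration change, a full restart is not required: as noted in the feature list, chproxy reloads its config when it receives SIGHUP. The sketch below shows one way to send that signal using the PID file written by `start.sh`; `reload_config` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: ask a running chproxy to re-read its config file.
# Argument: path to the PID file written by start.sh.
reload_config() {
  local pid_file="$1" pid
  [ -f "$pid_file" ] || { echo "no PID file: $pid_file" >&2; return 1; }
  pid=$(cat "$pid_file")
  # SIGHUP asks chproxy to reload its config; the process keeps running.
  kill -HUP "$pid"
}

# Usage: reload_config /opt/module/chproxy/pid
```

Checking the log after a reload is a reasonable way to confirm that the new config was accepted.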

Viewing the log

tail -f /opt/module/chproxy/logs/chproxy.log
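
Besides the log, the Prometheus-format metrics mentioned in the feature list give another view of what the proxy is doing. The sketch below assumes the metrics are served at the `/metrics` path on the same `localhost:9090` listener and that the caller's IP is allowed to read them; both are assumptions not stated in this article. `chproxy_metrics` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: print metric lines whose name starts with a prefix.
# Arguments: metric name prefix, optional proxy address.
chproxy_metrics() {
  local prefix="$1" addr="${2:-localhost:9090}"
  # Lines starting with "#" are HELP/TYPE comments in the Prometheus text format.
  curl --silent --fail "http://${addr}/metrics" | grep -v '^#' | grep "^${prefix}"
}

# Usage: chproxy_metrics <metric-name-prefix>
```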

Original article: https://blog.csdn.net/guaoran/article/details/126858764
