[Big Data] A Look at ClickHouse Deployment Options
Contents
- Part One: ClickHouse Deployment
  - I. Single-Node Deployment
    - 1. Installation Prerequisites
    - 2. Directory Layout
    - 3. Core Configuration
    - 4. Starting the Service
  - II. Cluster Deployment
    - 1. Cluster Topology Design
    - 2. Distributed Configuration
    - 3. Table Engine Selection
  - III. Security Hardening
    - 1. Authentication
    - 2. SSL Encryption
  - IV. Performance Tuning
    - 1. Core Parameter Tuning
    - 2. Resource Isolation
  - V. Monitoring and Maintenance
    - 1. Prometheus Integration
    - 2. Key Metrics
  - VI. Backup and Recovery
    - 1. Cold Backup
    - 2. Cross-Cluster Sync
  - VII. Best Practices
- Part Two: ClickHouse Cross-Datacenter Cluster Deployment
  - 1. Architecture Design
  - 2. Network Tuning
  - 3. Data Synchronization Strategy
  - 4. Disaster Failover
  - 5. Performance Tuning Parameters
  - 6. Monitoring Metrics
  - 7. Operations and Change Management
  - 8. Cost Optimization
- Related Reading
Part One: ClickHouse Deployment
The following is a step-by-step guide to production-grade ClickHouse deployment, covering single-node setup, distributed clusters, security hardening, and performance tuning.
I. Single-Node Deployment
1. Installation Prerequisites
System requirements:
• 64-bit Linux (CentOS 7+ / Ubuntu 18.04+ recommended)
• At least 4 CPU cores and 8 GB RAM
• SSD storage (NVMe recommended)
Installation steps:
# Ubuntu/Debian
sudo apt-get install -y apt-transport-https ca-certificates dirmngr
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E0C56BD4
echo "deb https://packages.clickhouse.com/deb stable main" | sudo tee /etc/apt/sources.list.d/clickhouse.list
sudo apt-get update
sudo apt-get install -y clickhouse-server clickhouse-client

# CentOS/RHEL
sudo yum install -y yum-utils
sudo rpm --import https://repo.clickhouse.com/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.clickhouse.com/rpm/clickhouse.repo
sudo yum install -y clickhouse-server clickhouse-client
2. Directory Layout
sudo mkdir -p /data/clickhouse/{data,metadata,logs,conf}
sudo chown -R clickhouse:clickhouse /data/clickhouse

Move the configuration file to the new location:
sudo mv /etc/clickhouse-server/config.xml /data/clickhouse/conf/
sudo ln -s /data/clickhouse/conf/config.xml /etc/clickhouse-server/config.xml
3. Core Configuration
Key parameters in config.xml:
<yandex>
  <path>/data/clickhouse/data/</path>
  <tmp_path>/data/clickhouse/tmp/</tmp_path>
  <user_files_path>/data/clickhouse/user_files/</user_files_path>
  <format_schema_path>/data/clickhouse/format_schemas/</format_schema_path>
  <listen_host>0.0.0.0</listen_host>
  <http_port>8123</http_port>
  <tcp_port>9000</tcp_port>
  <max_memory_usage>10000000000</max_memory_usage>
  <max_concurrent_queries>100</max_concurrent_queries>
  <logger>
    <level>information</level>
    <log>/data/clickhouse/logs/clickhouse-server.log</log>
    <errorlog>/data/clickhouse/logs/clickhouse-server.err.log</errorlog>
    <size>1000M</size>
    <count>10</count>
  </logger>
</yandex>
4. Starting the Service
sudo systemctl start clickhouse-server
sudo systemctl enable clickhouse-server
II. Cluster Deployment
1. Cluster Topology Design
2. Distributed Configuration
remote_servers configuration:
<remote_servers>
  <cluster_3s2r>
    <shard>
      <weight>1</weight>
      <internal_replication>true</internal_replication>
      <replica>
        <host>ch-node1</host>
        <port>9000</port>
        <user>admin</user>
        <password>SecurePass123!</password>
      </replica>
      <replica>
        <host>ch-node2</host>
        <port>9000</port>
        <user>admin</user>
        <password>SecurePass123!</password>
      </replica>
    </shard>
  </cluster_3s2r>
</remote_servers>
ZooKeeper integration:
<zookeeper>
  <node>
    <host>zk1.cluster</host>
    <port>2181</port>
  </node>
  <node>
    <host>zk2.cluster</host>
    <port>2181</port>
  </node>
  <node>
    <host>zk3.cluster</host>
    <port>2181</port>
  </node>
  <session_timeout_ms>30000</session_timeout_ms>
  <operation_timeout_ms>10000</operation_timeout_ms>
</zookeeper>
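Once remote_servers and zookeeper are in place, the standard system tables can confirm the cluster is wired up. A quick sanity-check sketch (cluster name matches the configuration above; requires a running server):

```sql
-- List every shard/replica ClickHouse knows about for this cluster
SELECT cluster, shard_num, replica_num, host_name, port
FROM system.clusters
WHERE cluster = 'cluster_3s2r';

-- Confirm the server can reach the ZooKeeper ensemble
SELECT name
FROM system.zookeeper
WHERE path = '/';
```

If the second query errors out, fix ZooKeeper connectivity before creating any Replicated tables.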
3. Table Engine Selection
Distributed table example:
CREATE TABLE db1.metrics_local ON CLUSTER 'cluster_3s2r'
(
    event_time DateTime,
    metric_name String,
    value Float64
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/metrics_local', '{replica}')
PARTITION BY toYYYYMM(event_time)
ORDER BY (metric_name, event_time)
TTL event_time + INTERVAL 30 DAY;

CREATE TABLE db1.metrics_distributed ON CLUSTER 'cluster_3s2r'
AS db1.metrics_local
ENGINE = Distributed('cluster_3s2r', 'db1', 'metrics_local', rand());
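Reads and writes normally go through the Distributed table, which fans out to the ReplicatedMergeTree shards. A minimal usage sketch against the tables defined above (metric names and values are illustrative):

```sql
-- Insert via the Distributed table; rand() sharding spreads rows across shards
INSERT INTO db1.metrics_distributed VALUES
    (now(), 'cpu_usage', 73.5),
    (now(), 'mem_usage', 41.2);

-- Query the Distributed table; ClickHouse merges results from all shards
SELECT metric_name, avg(value) AS avg_value
FROM db1.metrics_distributed
WHERE event_time >= now() - INTERVAL 1 HOUR
GROUP BY metric_name;
```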
III. Security Hardening
1. Authentication
users.xml example (note: ClickHouse expects password hashes in a password_sha256_hex element, not a plain password element):
<users>
  <admin>
    <!-- hex SHA-256 digest of the plaintext password -->
    <password_sha256_hex>abcd1234...</password_sha256_hex>
    <networks>
      <ip>::/0</ip>
    </networks>
    <profile>default</profile>
    <quota>default</quota>
    <access_management>1</access_management>
  </admin>
  <readonly>
    <password_sha256_hex>def5678...</password_sha256_hex>
    <networks>
      <ip>192.168.1.0/24</ip>
    </networks>
    <profile>readonly</profile>
    <quota>default</quota>
  </readonly>
</users>
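Generating the digest for password_sha256_hex is a one-liner. A sketch ('MySecretPass' is a placeholder password, not a recommendation):

```shell
# Hex-encoded SHA-256 digest of the plaintext password, for password_sha256_hex.
# 'MySecretPass' is a placeholder; substitute your real password.
PASSWORD='MySecretPass'
HASH=$(printf '%s' "$PASSWORD" | sha256sum | awk '{print $1}')
echo "$HASH"
```

Paste the 64-character hex string into users.xml; the plaintext never needs to appear in any config file.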
2. SSL Encryption
Configuration steps:
# Generate a self-signed certificate
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout /etc/clickhouse-server/server.key \
  -out /etc/clickhouse-server/server.crt

Then enable HTTPS in config.xml:
<https_port>8443</https_port>
<openSSL>
  <server>
    <certificateFile>/etc/clickhouse-server/server.crt</certificateFile>
    <privateKeyFile>/etc/clickhouse-server/server.key</privateKeyFile>
  </server>
</openSSL>
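Before pointing config.xml at the certificate, it is worth sanity-checking what openssl produced. A sketch using a throwaway directory (the CN and paths here are examples, not the production values):

```shell
# Generate a throwaway self-signed certificate and inspect its subject and expiry.
DIR=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -subj "/CN=ch-node1" \
  -keyout "$DIR/server.key" \
  -out "$DIR/server.crt" 2>/dev/null
# Print the certificate subject and expiry date for a quick eyeball check
openssl x509 -noout -subject -enddate -in "$DIR/server.crt"
```

The same x509 inspection works on the real /etc/clickhouse-server/server.crt after deployment.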
IV. Performance Tuning
1. Core Parameter Tuning
<merge_tree>
  <max_suspicious_broken_parts>5</max_suspicious_broken_parts>
  <parts_to_delay_insert>300</parts_to_delay_insert>
  <parts_to_throw_insert>600</parts_to_throw_insert>
</merge_tree>
<compression>
  <case>
    <method>zstd</method>
    <level>3</level>
  </case>
</compression>
2. Resource Isolation
Resource profile configuration:
<profiles>
  <default>
    <max_threads>16</max_threads>
    <max_memory_usage_for_all_queries>100000000000</max_memory_usage_for_all_queries>
  </default>
  <batch>
    <max_threads>32</max_threads>
    <priority>10</priority>
  </batch>
</profiles>
V. Monitoring and Maintenance
1. Prometheus Integration
Exporter setup:
docker run -d -p 9116:9116 \
  -e CLICKHOUSE_USER="monitor" \
  -e CLICKHOUSE_PASSWORD="MonitorPass123!" \
  prom/clickhouse-exporter \
  -scrape_uri=http://ch-node1:8123/
2. Key Metrics
VI. Backup and Recovery
1. Cold Backup
# Full backup
clickhouse-backup create full_backup_$(date +%Y%m%d)
# Incremental backup
clickhouse-backup create incremental_backup_$(date +%Y%m%d) --diff-from=full_backup_20230801
# Restore
clickhouse-backup restore full_backup_20230801
2. Cross-Cluster Sync
CREATE TABLE db1.metrics_restore AS db1.metrics_local
ENGINE = Distributed('backup_cluster', 'db1', 'metrics_local', rand());

INSERT INTO db1.metrics_restore
SELECT * FROM remote('backup_node', db1.metrics_local);
VII. Best Practices
- Sharding key: prefer high-cardinality columns (such as a user ID).
- Cache warm-up: after startup, run SYSTEM DROP MARK CACHE.
- Schema changes: treat ALTER TABLE ... UPDATE and other structural changes with caution.
- Slow-query analysis: enable log_queries=1 and review query_log regularly.
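With query logging enabled, a periodic slow-query report can come straight out of system.query_log. A sketch using the standard columns of that table (the 1-second threshold is an example cutoff):

```sql
-- Top 20 slowest finished queries from today
SELECT
    query_duration_ms,
    read_rows,
    memory_usage,
    substring(query, 1, 120) AS query_head
FROM system.query_log
WHERE event_date = today()
  AND type = 'QueryFinish'
  AND query_duration_ms > 1000
ORDER BY query_duration_ms DESC
LIMIT 20;
```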
Following these steps yields a production ClickHouse cluster capable of millions of writes per second with sub-second query latency, suitable for real-time analytics and time-series workloads. A monthly end-to-end load test is recommended to keep the configuration tuned.
Part Two: ClickHouse Cross-Datacenter Cluster Deployment
1. Architecture Design
Goals: high availability, low latency, geographic data redundancy.
Topology:
• Three datacenters (Beijing, Shanghai, Guangzhou), each hosting a full set of ClickHouse shards
• Sharding: each shard has 3 replicas, spread across datacenters
• ZooKeeper: each datacenter runs 3 ZooKeeper nodes, forming one cross-datacenter ensemble
(Topology diagram: Beijing hosts the primary of shard 1 plus replicas of shards 2 and 3; Shanghai hosts the primary of shard 2 plus replicas of shards 1 and 3; Guangzhou hosts the primary of shard 3 plus replicas of shards 1 and 2; each datacenter also runs 3 ZooKeeper nodes.)
2. Network Tuning
Key parameters:
<yandex>
  <interserver_http_port>9009</interserver_http_port>
  <interserver_http_host>bj01-node1</interserver_http_host>
  <listen_host>0.0.0.0</listen_host>
  <remote_servers>
    <cluster1>
      <shard>
        <weight>1</weight>
        <internal_replication>true</internal_replication>
        <replica>
          <host>bj01-node1</host>
          <port>9000</port>
          <interserver_http_port>9009</interserver_http_port>
          <zone>bj</zone>
        </replica>
        <replica>
          <host>sh01-node1</host>
          <port>9000</port>
          <interserver_http_port>9009</interserver_http_port>
          <zone>sh</zone>
        </replica>
      </shard>
    </cluster1>
  </remote_servers>
  <network>
    <compression>true</compression>
    <send_timeout>300</send_timeout>
    <receive_timeout>300</receive_timeout>
    <keep_alive_timeout>600</keep_alive_timeout>
  </network>
</yandex>
3. Data Synchronization Strategy
Three-tier synchronization mechanism:
Replica configuration example:
CREATE TABLE metrics
(
    event_date Date,
    metric_id UInt32,
    value Float64
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/metrics', '{replica}')
PARTITION BY toYYYYMM(event_date)
ORDER BY (metric_id, event_date)
SETTINGS max_replicated_mutations_in_queue = 1000,
         replicated_can_become_leader = 1,
         replicated_max_parallel_fetches = 16;
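Cross-datacenter replication makes replica lag the first thing to watch. The standard system.replicas table exposes it directly; a monitoring sketch (the 10-second threshold is an example alert cutoff):

```sql
-- Replicas that have fallen more than 10 seconds behind, with queue depth
SELECT
    database,
    table,
    is_leader,
    absolute_delay,
    queue_size,
    inserts_in_queue
FROM system.replicas
WHERE absolute_delay > 10;
```

An empty result means every replica is within the lag budget; rows here are candidates for SYSTEM SYNC REPLICA or a deeper network investigation.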
4. Disaster Failover
Failure detection matrix:
Failover command examples:
# Force a replica restart on the standby node
echo "SYSTEM RESTART REPLICA metrics" | clickhouse-client -h standby-node
# Datacenter-level switchover via VIP migration
# (keepalivedctl here stands for site-specific keepalived tooling)
keepalivedctl switchover --new-master sh02-node1
5. Performance Tuning Parameters
Core settings:
<yandex>
  <profiles>
    <default>
      <max_memory_usage>10000000000</max_memory_usage>
      <max_execution_time>300</max_execution_time>
      <distributed_product_mode>local</distributed_product_mode>
      <prefer_localhost_replica>0</prefer_localhost_replica>
      <use_hedged_requests>1</use_hedged_requests>
      <async_socket_for_remote>1</async_socket_for_remote>
      <load_balancing>first_or_random</load_balancing>
      <priority>
        <bj>3</bj>
        <sh>2</sh>
        <gz>1</gz>
      </priority>
    </default>
  </profiles>
</yandex>
6. Monitoring Metrics
Key monitoring items:
Prometheus configuration example:
scrape_configs:
  - job_name: 'clickhouse'
    static_configs:
      - targets:
          - bj01-node1:9363
          - sh01-node1:9363
          - gz01-node1:9363
    metrics_path: '/metrics'
  - job_name: 'zk'
    static_configs:
      - targets:
          - zk-bj1:7000
          - zk-sh1:7000
          - zk-gz1:7000
7. Operations and Change Management
Change management process:
- Canary rollout: apply new configuration in one datacenter first and observe for 24 hours.
- Rolling upgrades: proceed in 3 batches, 2 hours apart.
- Data migration: use ALTER TABLE ... MOVE PARTITION.
- Backup policy:
  • Daily incremental backups (retained 7 days)
  • Weekly full backups (retained 4 weeks)
  • Off-site cold backups (retained 1 year)
Automation script example:
#!/bin/bash
# Rebalance data across datacenters
for shard in {1..3}
do
  clickhouse-client -n --query "
    SYSTEM SYNC REPLICA metrics_shard${shard};
    OPTIMIZE TABLE metrics_shard${shard} FINAL;
    ALTER TABLE metrics_shard${shard} MOVE PARTITION '202308' TO SHARD '/bj0${shard}';
  "
done
8. Cost Optimization
Resource allocation strategy:
Tiered storage configuration:
ALTER TABLE metrics MODIFY TTL
    event_date + INTERVAL 3 MONTH TO VOLUME 'hot',
    event_date + INTERVAL 6 MONTH TO VOLUME 'warm',
    event_date + INTERVAL 12 MONTH TO VOLUME 'cold';
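The TTL ... TO VOLUME moves only work if the table's storage policy actually defines those volumes. A sketch of the matching server-side storage_configuration (disk names, paths, and the policy name are illustrative):

```xml
<storage_configuration>
  <disks>
    <disk_hot><path>/data/nvme/clickhouse/</path></disk_hot>
    <disk_warm><path>/data/ssd/clickhouse/</path></disk_warm>
    <disk_cold><path>/data/hdd/clickhouse/</path></disk_cold>
  </disks>
  <policies>
    <tiered>
      <volumes>
        <hot><disk>disk_hot</disk></hot>
        <warm><disk>disk_warm</disk></warm>
        <cold><disk>disk_cold</disk></cold>
      </volumes>
    </tiered>
  </policies>
</storage_configuration>
```

The table must also reference the policy, e.g. created with SETTINGS storage_policy = 'tiered', before the TTL clauses can move parts between volumes.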
Through layered redundancy, intelligent routing, and tiered storage, this design targets an RTO under 5 minutes and an RPO under 10 seconds, with cross-datacenter query latency kept within 150 ms. Load-test against real business traffic to tune shard counts and replica placement.
Related Reading
[Big Data] Getting Started with ClickHouse
[Ops Tools] Ansible, a Handy Automation Tool
[Databases] Row-Oriented vs. Column-Oriented Storage
[ClickHouse official documentation]