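Recently I had to upgrade a ZooKeeper 3.4.14 cluster to 3.6+. The requirements were a seamless upgrade, no data loss, and, as far as possible, no need to notify any users. After looking into the ZooKeeper release line, I found that 3.6+ ships a metrics module, which fits our needs, so the plan is to upgrade from 3.4.14 to 3.6.4.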

  • 3.5+ supports dynamic reconfiguration
  • 3.6.0+ includes a built-in metrics module
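One way to check whether a server already has the built-in metrics module is to read its version from `srvr` output and compare it against 3.6.0. This is a minimal sketch that assumes bash and GNU `sort -V`. `zk_version` and `version_ge` are hypothetical helpers, not part of ZooKeeper:

```bash
# Hypothetical helpers (not shipped with ZooKeeper).

# Read stat/srvr output on stdin and print the bare version number, e.g. "3.4.14".
zk_version() {
  # The line looks like: "Zookeeper version: 3.4.14-<git sha>, built on ..."
  sed -n 's/^Zookeeper version: \([0-9][0-9.]*\).*/\1/p'
}

# Succeed if version $1 >= version $2 (relies on GNU sort -V ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example (needs a reachable server; srvr is whitelisted by default on 3.5+):
#   v=$(echo srvr | nc localhost 2181 | zk_version)
#   version_ge "$v" 3.6.0 && echo "built-in metrics available"
```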

Existing cluster layout

| Cluster IP | Current directory | New version directory |
| --- | --- | --- |
| 192.240.16.18 | /usr/local/zookeeper-3.4.14/ | /usr/local/apache-zookeeper-3.6.4-bin/ |
| 192.240.16.21 | /usr/local/zookeeper-3.4.14/ | /usr/local/apache-zookeeper-3.6.4-bin/ |
| 192.240.16.28 | /usr/local/zookeeper-3.4.14/ | /usr/local/apache-zookeeper-3.6.4-bin/ |
| 192.240.16.147 | /usr/local/zookeeper-3.4.14/ | /usr/local/apache-zookeeper-3.6.4-bin/ |
| 192.240.16.202 | /usr/local/zookeeper-3.4.14/ | /usr/local/apache-zookeeper-3.6.4-bin/ |

Download the package

Find the matching package in the official Apache archive.

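Starting with ZooKeeper 3.5, each release is published as two tarballs, one with "bin" in its name and one without:

  • the "bin" tarball contains the required jars
  • the tarball without "bin" is the source release and must be compiled yourself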
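A server that fails at startup with `Could not find or load main class org.apache.zookeeper.server.quorum.QuorumPeerMain` was often extracted from the source tarball instead of the "bin" one. The sketch below is a quick check for this. It assumes the binary release keeps its server jar under `lib/` as `zookeeper-<version>.jar`. `is_bin_dist` is a hypothetical helper:

```bash
# Hypothetical helper: succeed if directory $1 looks like a binary ("bin") release.
is_bin_dist() {
  local dir="$1"
  # A binary release ships the server jar(s) under lib/; a source release does not.
  compgen -G "$dir/lib/zookeeper-*.jar" > /dev/null
}

# Example:
#   is_bin_dist /usr/local/apache-zookeeper-3.6.4-bin && echo "binary package"
```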
```bash
wget https://archive.apache.org/dist/zookeeper/zookeeper-3.6.4/apache-zookeeper-3.6.4-bin.tar.gz
```
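Before extracting, it is worth checking the download against the SHA-512 digest that Apache publishes alongside each release file. For this release it should be `apache-zookeeper-3.6.4-bin.tar.gz.sha512` in the same archive directory.

This is a minimal sketch that assumes GNU coreutils `sha512sum`. `verify_sha512` is a hypothetical helper that expects only the hex digest, because the layout of `.sha512` files varies between projects:

```bash
# Hypothetical helper: compare the SHA-512 digest of file $1 with the hex digest $2.
# Whitespace in $2 is removed and hex letters are lower-cased before comparing.
verify_sha512() {
  local file="$1" expected actual
  expected=$(printf '%s' "$2" | tr -d '[:space:]' | tr 'A-F' 'a-f')
  actual=$(sha512sum "$file" | cut -d ' ' -f 1)
  [ "$actual" = "$expected" ]
}

# Example (paste the hex digest from the .sha512 file as the second argument):
#   verify_sha512 apache-zookeeper-3.6.4-bin.tar.gz "<hex digest>" && echo OK
```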

Extract

```bash
tar -xzf apache-zookeeper-3.6.4-bin.tar.gz -C /usr/local/ && cd /usr/local
```

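Upgrade

Note: the following steps must be performed on every ZooKeeper server, one server at a time.

Check status

Check the role of each ZooKeeper instance. The leader must be upgraded last, so that the ensemble goes through only one leader election.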

```bash
$ echo stat | nc localhost 2181

Zookeeper version: 3.4.14-4c25d480e66aadd371de8bd2fd8da255ac140bcf, built on 03/06/2019 16:18 GMT
Clients:
 /10.222.16.147:58936[1](queued=0,recved=24290,sent=24434)
 /10.222.16.18:56736[1](queued=0,recved=24213,sent=24363)
 /127.0.0.1:60822[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/77
Received: 48505
Sent: 48798
Connections: 3
Outstanding: 0
Zxid: 0x5c000027ee
Mode: leader
Node count: 440526
Proposal sizes last/min/max: 139/32/1039
```
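Checking five servers by hand is error-prone. The `Mode:` line can be extracted from each server's output and the hosts sorted so that the leader comes last. This is a minimal sketch. `zk_mode` and `upgrade_order` are hypothetical helpers, and the host list in the example comment is taken from the `server.N` entries used later:

```bash
# Hypothetical helper: print the Mode value (leader/follower/standalone)
# from stat or srvr output read on stdin.
zk_mode() {
  sed -n 's/^Mode: //p'
}

# Hypothetical helper: read "host mode" pairs on stdin and print the hosts
# in upgrade order, non-leaders first and the leader last.
upgrade_order() {
  awk '$2 == "leader" { leader = $1; next } { print $1 } END { if (leader != "") print leader }'
}

# Example:
#   for h in 10.240.16.18 10.240.16.21 10.240.16.28 10.240.16.147 10.240.16.202; do
#     printf '%s %s\n' "$h" "$(echo srvr | nc "$h" 2181 | zk_mode)"
#   done | upgrade_order
```

The example uses `srvr` rather than `stat` because `srvr` is the one command that stays enabled by default on 3.5+, so the same loop also works for servers that have already been upgraded.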

Prepare the configuration file

Generate the default configuration file

```bash
cp apache-zookeeper-3.6.4-bin/conf/zoo_sample.cfg apache-zookeeper-3.6.4-bin/conf/zoo.cfg
```

Enable the Prometheus metrics settings

```bash
sed -i "s@#metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider@metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider@g" apache-zookeeper-3.6.4-bin/conf/zoo.cfg
sed -i "s@#metricsProvider.httpPort=7000@metricsProvider.httpPort=7000@g" apache-zookeeper-3.6.4-bin/conf/zoo.cfg
sed -i "s@#metricsProvider.exportJvmInfo=true@metricsProvider.exportJvmInfo=true@g" apache-zookeeper-3.6.4-bin/conf/zoo.cfg
```
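The three targeted `sed` commands can also be collapsed into one that strips the leading `#` from every `metricsProvider.` line. This is a minimal sketch that assumes GNU `sed -i`. `enable_metrics` is a hypothetical helper, and unlike the original commands it would also uncomment any other `#metricsProvider.` line present in the file:

```bash
# Hypothetical helper: uncomment every "#metricsProvider.*" line in config file $1.
enable_metrics() {
  sed -i 's/^#\(metricsProvider\.\)/\1/' "$1"
}

# Example:
#   enable_metrics apache-zookeeper-3.6.4-bin/conf/zoo.cfg
```

Once the upgraded server is running, the metrics should be available over HTTP on port 7000, for example with `curl -s http://localhost:7000/metrics`.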

Check the configuration

```bash
tail -5 apache-zookeeper-3.6.4-bin/conf/zoo.cfg
```

Append the old configuration

```bash
# Append the settings carried over from the 3.4.14 cluster
cat <<EOF >> apache-zookeeper-3.6.4-bin/conf/zoo.cfg
dataDir=/usr/local/zookeeper/data
dataLogDir=/usr/local/zookeeper/logs
# the port at which the clients will connect
clientPort=2181
server.1=10.240.16.18:2888:3888
server.2=10.240.16.21:2888:3888
server.3=10.240.16.28:2888:3888
server.4=10.240.16.147:2888:3888
server.5=10.240.16.202:2888:3888
autopurge.snapRetainCount=3
autopurge.purgeInterval=24
EOF
```
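Each `server.N` id must match the `myid` file in that host's dataDir, so the list has to keep the same host-to-id mapping as the old cluster. The sketch below generates the entries from an ordered host list. It assumes the default 2888/3888 ports used above. `server_lines` is a hypothetical helper:

```bash
# Hypothetical helper: print server.N entries for the given hosts, numbering
# them from 1 in argument order (N must match each host's myid file).
server_lines() {
  local i=1 host
  for host in "$@"; do
    printf 'server.%d=%s:2888:3888\n' "$i" "$host"
    i=$((i + 1))
  done
}

# Example: reproduces the five server.N lines of the heredoc above.
#   server_lines 10.240.16.18 10.240.16.21 10.240.16.28 10.240.16.147 10.240.16.202
```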

Check the configuration

```bash
tail -13 apache-zookeeper-3.6.4-bin/conf/zoo.cfg
```
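`zoo_sample.cfg` already contains `dataDir=/tmp/zookeeper` and `clientPort=2181`, so after the append some keys appear twice. As far as I know, ZooKeeper reads `zoo.cfg` as a Java properties file, in which a later assignment overrides an earlier one, so the appended values take effect.

The sketch below prints the value that wins for a given key. It assumes simple `key=value` lines and does not cover the full `java.util.Properties` syntax. `effective_value` is a hypothetical helper:

```bash
# Hypothetical helper: print the last value assigned to key $1 in file $2
# (simple key=value lines only; the last assignment wins).
effective_value() {
  awk -F '=' -v key="$1" '$1 == key { val = substr($0, index($0, "=") + 1) } END { print val }' "$2"
}

# Example:
#   effective_value dataDir apache-zookeeper-3.6.4-bin/conf/zoo.cfg
```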

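Reuse the existing data files

Instead of copying the data, link the new release's data and logs directories to the old ones:

```bash
ln -svnf /usr/local/zookeeper-3.4.14/data /usr/local/apache-zookeeper-3.6.4-bin/data
ln -svnf /usr/local/zookeeper-3.4.14/logs /usr/local/apache-zookeeper-3.6.4-bin/logs
ls -l /usr/local/apache-zookeeper-3.6.4-bin/
```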
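Before stopping anything, it helps to confirm that both links resolve to the old directories and that the node's `myid` is visible through them. With `dataDir=/usr/local/zookeeper/data`, `myid` lives in `data/`. This is a minimal sketch that assumes GNU `readlink -f`. `check_data_links` is a hypothetical helper:

```bash
# Hypothetical helper: check that $1/data and $1/logs resolve to the same places
# as $2/data and $2/logs, and that a non-empty myid is visible in $1/data.
check_data_links() {
  local new="$1" old="$2" d
  for d in data logs; do
    [ "$(readlink -f "$new/$d")" = "$(readlink -f "$old/$d")" ] || { echo "mismatch: $d" >&2; return 1; }
  done
  [ -s "$new/data/myid" ] || { echo "missing myid" >&2; return 1; }
}

# Example:
#   check_data_links /usr/local/apache-zookeeper-3.6.4-bin /usr/local/zookeeper-3.4.14
```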

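Modify the new version's startup script

This step makes the stat command usable again. Since 3.5.3, four-letter-word commands other than srvr are disabled unless they are whitelisted, so the script is patched to whitelist all of them.

```bash
# Insert the whitelist system property after line 77 of zkServer.sh (line number as used for the 3.6.4 script)
sed -i '77a\ZOOMAIN="-Dzookeeper.4lw.commands.whitelist=* ${ZOOMAIN}"' apache-zookeeper-3.6.4-bin/bin/zkServer.sh
# Print lines 70-78 to confirm the new ZOOMAIN line is in place
tail -n +70 apache-zookeeper-3.6.4-bin/bin/zkServer.sh | head -n 9
```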
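The same whitelist can also be set with the `4lw.commands.whitelist` option in `zoo.cfg`, which is the config-file form of the `zookeeper.4lw.commands.whitelist` property. This avoids depending on a fixed line number in `zkServer.sh`; use one approach or the other, not both.

This is a minimal sketch. `whitelist_4lw` is a hypothetical helper, and the default list `stat, ruok, srvr, mntr` is only an example:

```bash
# Hypothetical helper: append a 4lw.commands.whitelist line to config file $1
# unless one is already present; $2 is the command list (example default below).
whitelist_4lw() {
  local cfg="$1" cmds="${2:-stat, ruok, srvr, mntr}"
  grep -q '^4lw\.commands\.whitelist=' "$cfg" || printf '4lw.commands.whitelist=%s\n' "$cmds" >> "$cfg"
}

# Example:
#   whitelist_4lw apache-zookeeper-3.6.4-bin/conf/zoo.cfg
```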

Stop the old version

```bash
zookeeper/bin/zkServer.sh stop
```

Point the zookeeper symlink at the new version

```bash
ln -svnf apache-zookeeper-3.6.4-bin zookeeper
```

Start the service

```bash
zookeeper/bin/zkServer.sh start
```

Verify the service

```bash
echo stat | nc localhost 2181
```
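A freshly started node may need a moment to rejoin the ensemble, so a single `stat` right after `start` can fail. The sketch below polls until the node reports `leader` or `follower`. A node that keeps reporting `standalone` has not picked up the `server.N` lines. `wait_for_quorum` is a hypothetical helper that takes the status command as its remaining arguments:

```bash
# Hypothetical helper: run the status command given as arguments up to $1 times,
# one second apart, until its "Mode:" line reports leader or follower.
# Prints the mode on success; returns 1 if the node never joined the quorum.
wait_for_quorum() {
  local tries="$1" mode i
  shift
  for ((i = 1; i <= tries; i++)); do
    mode=$("$@" 2>/dev/null | sed -n 's/^Mode: //p')
    case "$mode" in
      leader|follower) echo "$mode"; return 0 ;;
    esac
    [ "$i" -lt "$tries" ] && sleep 1
  done
  return 1
}

# Example:
#   wait_for_quorum 30 sh -c 'echo srvr | nc localhost 2181'
```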

Check the data

```bash
zookeeper/bin/zkCli.sh
# then, at the zkCli prompt:
ls /clickhouse/tables/01-01
```
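Besides spot-checking a path, the `Node count:` line of `srvr` output can be compared between the upgraded node and the leader. On a busy cluster the two reads may differ by a few znodes, so the comparison allows a tolerance. This is a minimal sketch. `node_count` and `counts_close` are hypothetical helpers, and the tolerance is an arbitrary choice:

```bash
# Hypothetical helper: print the "Node count:" value from stat/srvr output on stdin.
node_count() {
  sed -n 's/^Node count: //p'
}

# Hypothetical helper: succeed if counts $1 and $2 differ by at most $3 (default 0).
counts_close() {
  local a="$1" b="$2" tol="${3:-0}" delta
  delta=$((a > b ? a - b : b - a))
  [ "$delta" -le "$tol" ]
}

# Example (<leader-host> is a placeholder):
#   mine=$(echo srvr | nc localhost 2181 | node_count)
#   leader=$(echo srvr | nc <leader-host> 2181 | node_count)
#   counts_close "$mine" "$leader" 10 && echo "node counts match"
```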

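Roll back to the old version

If something goes wrong, stop the new server (zookeeper/bin/zkServer.sh stop), point the symlink back at the old release, and start it again with zookeeper/bin/zkServer.sh start:

```bash
ln -svnf zookeeper-3.4.14 zookeeper
```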
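The three rollback steps can be wrapped into one function so that none of them is forgotten under pressure. This is a minimal sketch that assumes it runs in the directory holding the `zookeeper` symlink (here `/usr/local`). `rollback_node` is a hypothetical helper, and the old release name defaults to `zookeeper-3.4.14`:

```bash
# Hypothetical helper: stop the running server, point the "zookeeper" symlink
# back at the old release ($1, default zookeeper-3.4.14) and start it again.
# Must be run from the directory that contains the symlink.
rollback_node() {
  local old="${1:-zookeeper-3.4.14}"
  zookeeper/bin/zkServer.sh stop || return 1
  ln -snf "$old" zookeeper || return 1
  zookeeper/bin/zkServer.sh start
}

# Example:
#   cd /usr/local && rollback_node
```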
