Features of the rclone tool

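  • Supports incremental transfers, is simple to configure, and exposes parameters for tuning throughput (different throughput settings use different amounts of memory and give different transfer performance)
  • copy copies files from the source (src) to the destination (dst); it does not delete anything on dst
  • sync makes dst identical to src: it compares dst against the contents of src and deletes files on dst that do not exist on src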
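
Because sync can delete objects on dst, it can help to preview either mode before running it for real. The following is a minimal sketch, assuming rclone's standard --dry-run flag; the helper name preview_transfer and the RCLONE override variable are hypothetical and exist only so the command can be stubbed out when testing.

```bash
# Preview what "copy" or "sync" would do without changing anything.
# preview_transfer and RCLONE are illustrative names, not part of rclone.
preview_transfer() {
    local mode="$1"
    local src="$2"
    local dst="$3"

    # Only the two modes discussed above are accepted.
    case "$mode" in
        copy|sync) ;;
        *) echo "mode must be copy or sync" >&2; return 1 ;;
    esac

    # --dry-run reports the planned actions but performs none of them.
    "${RCLONE:-rclone}" "$mode" --dry-run "$src" "$dst"
}
```

For example, preview_transfer sync ceph-1:hk-im ceph-2:hk-im would list the objects that sync would copy or delete, while the same call with copy would never report deletions.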

Below is a sync script I wrote:

bash
#!/bin/bash

# Function to perform full sync (all files) for a specific bucket
sync_full() {
    local bucket_name=$1
    local batch_id=$2
    local differ_log="/tmp/${bucket_name}_differ_${batch_id}.log"
    local combined_log="/tmp/${bucket_name}_${batch_id}.log"
    local missing_dst="/tmp/${bucket_name}_md_${batch_id}.log"

    echo "Performing full sync from $bucket_name (ceph-1 to ceph-220)..."
    # Test the rclone exit status directly instead of inspecting $? afterwards
    if rclone --config=/root/.config/rclone/rclone.conf sync --fast-list \
        --combined="$combined_log" --differ="$differ_log" --missing-on-dst="$missing_dst" \
        --multi-thread-streams=20 \
        "ceph-1:$bucket_name" "ceph-220:$bucket_name"; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') - Full sync for $bucket_name successful." >> "$log_file"
    else
        echo "$(date '+%Y-%m-%d %H:%M:%S') - Full sync for $bucket_name failed." >> "$log_file"
    fi
}

# Function to perform partial sync based on --max-age for a specific bucket
sync_partial() {
    local bucket_name=$1
    local max_age=$2
    local batch_id=$3
    local differ_log="/tmp/${bucket_name}_differ_${batch_id}.log"
    local combined_log="/tmp/${bucket_name}_${batch_id}.log"
    local missing_dst="/tmp/${bucket_name}_md_${batch_id}.log"
    
    echo "Performing partial sync from $bucket_name (ceph-1 to ceph-220) with --max-age=$max_age..."
    # Test the rclone exit status directly instead of inspecting $? afterwards
    if rclone --config=/root/.config/rclone/rclone.conf sync --fast-list \
        --combined="$combined_log" --differ="$differ_log" --missing-on-dst="$missing_dst" \
        --max-age "$max_age" --multi-thread-streams=20 \
        "ceph-1:$bucket_name" "ceph-220:$bucket_name"; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') - Partial sync for $bucket_name successful." >> "$log_file"
    else
        echo "$(date '+%Y-%m-%d %H:%M:%S') - Partial sync for $bucket_name failed." >> "$log_file"
    fi
}

# Main function to perform sync based on input parameters
main() {
    local bucket_name=$1
    local sync_type=$2
    local max_age=$3
    local batch_id=$(date '+%Y%m%d%H%M%S')  # Unique batch ID based on timestamp

    # Start time
    local start_time=$(date '+%Y-%m-%d %H:%M:%S')
    local start_timestamp=$(date +%s)

    # Log file path
    local log_file="/tmp/rclone_sync_${batch_id}.log"

    echo "Script started at: $start_time" >> "$log_file"

    # Perform sync based on type
    case "$sync_type" in
        full)
            sync_full "$bucket_name" "$batch_id" >> "$log_file" 2>&1
            ;;
        partial)
            if [ -z "$max_age" ]; then
                echo "Error: max_age is required for partial sync." >> "$log_file"
                exit 1
            fi
            sync_partial "$bucket_name" "$max_age" "$batch_id" >> "$log_file" 2>&1
            ;;
        *)
            echo "Error: Invalid sync type. Choose 'full' or 'partial'." >> "$log_file"
            echo "Usage: $0 <bucket_name> <sync_type> [max_age]" >> "$log_file"
            exit 1
            ;;
    esac

    # End time
    local end_time=$(date '+%Y-%m-%d %H:%M:%S')
    local end_timestamp=$(date +%s)

    # Calculate duration
    local duration=$((end_timestamp - start_timestamp))

    echo "Script finished at: $end_time" >> "$log_file"
    echo "Total execution time: ${duration}s" >> "$log_file"
    echo "Log file saved at: $log_file"
}

# Check arguments count
if [ "$#" -lt 2 ]; then
    echo "Usage: $0 <bucket_name> <sync_type> [max_age]"
    echo "<sync_type>: full or partial"
    echo "[max_age]: Required for partial sync (e.g., 24h, 7d)"
    exit 1
fi

main "$@"

Throughput test for this transfer

Data types in the transfer environment

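| File type | File count | Bucket size | Transfer time |
|---|---|---|---|
| Mix of small and large files | 326338 | 58.263 GiB | First transfer: about 25 minutes (1538 s); --size-only reduces this to 26 s |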

Execution output

bash
Script started at: 2024-09-24 08:35:04
Performing full sync from hk-im (ceph-1 to ceph-2)...
2024-09-24 09:00:41 - Full sync for hk-im successful.
Script finished at: 2024-09-24 09:00:42
Total execution time: 1538s

$ wc -l hk-im_20240924083504.log
326338 hk-im_20240924083504.log
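
The line count of the --combined report matches the object count of the bucket, since the report has one line per path. As documented for rclone's --combined output, each line starts with a status symbol: = (identical on both sides), + (only on the source), - (only on the destination), * (present on both but different) and ! (error). A rough sketch for breaking the report down by symbol follows; the function name summarize_combined is a hypothetical helper, not part of rclone.

```bash
# Count the lines of an rclone --combined report by leading status symbol.
# summarize_combined is an illustrative helper name.
summarize_combined() {
    # The first character of each line is the status symbol.
    awk '{ count[substr($0, 1, 1)]++ }
         END { for (s in count) print s, count[s] }' "$1" | LC_ALL=C sort
}
```

Calling summarize_combined /tmp/hk-im_20240924083504.log would print one line per symbol with its count, which makes it easy to see how many objects were actually transferred versus already identical.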

Figure: network traffic on the cluster during the transfer

Check the total size and object count of the bucket

bash
$ rclone size ceph-1:hk-im
Total objects: 326.338k (326338)
Total size: 58.263 GiB (62559469310 Byte)
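
Combining the byte and object counts reported by rclone size with the 1538 s run time of the first full sync gives the average throughput. The sketch below just does that arithmetic with awk; avg_throughput is a hypothetical helper name, and the result is an average over the whole run, not a peak rate.

```bash
# Average throughput from total bytes, elapsed seconds and object count.
# avg_throughput is an illustrative helper name.
avg_throughput() {
    awk -v bytes="$1" -v secs="$2" -v objs="$3" 'BEGIN {
        # 1 MiB = 1048576 bytes
        printf "%.2f MiB/s, %.1f objects/s\n", bytes / secs / 1048576, objs / secs
    }'
}

# Figures taken from the rclone size output and the script log above.
avg_throughput 62559469310 1538 326338
# prints: 38.79 MiB/s, 212.2 objects/s
```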

CPU and memory usage during the transfer

bash
$ ps aux --sort=-%mem|head -10
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     24405  150  2.9 1696216 239824 pts/0  Sl   10:51   6:00 rclone --config=/root/.config/rclone/rclone.conf sync --fast-list --combined=/tmp/hk-im_20240924105109.log --differ=/tmp/hk-im_differ_20240924105109.log --missing-on-dst=/tmp/hk-im_md_20240924105109.log ceph-1:hk-im ceph-2:hk-im --multi-thread-streams=200 --multi-thread-chunk-size=512Mi
root      1031  0.0  0.2 574296 23704 ?        Ssl  Sep23   0:16 /usr/bin/python2 -Es /usr/sbin/tuned -l -P
polkitd    740  0.0  0.2 612376 18040 ?        Ssl  Sep23   0:02 /usr/lib/polkit-1/polkitd --no-debug
root       538  0.0  0.2  47652 17360 ?        Ss   Sep23   0:15 /usr/lib/systemd/systemd-journald

With --size-only the throughput is much higher, but memory usage is somewhat higher as well:

bash
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     25387  137  5.6 1694040 464452 pts/0  Sl   11:00   0:30 rclone --config=/root/.config/rclone/rclone.conf sync --size-only --fast-list --combined=/tmp/hk-im_20240924110024.log --di

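Without --size-only, IOPS is somewhat higher. With --size-only, rclone compares only object sizes and skips the modification-time and checksum comparison, which is presumably why it runs faster.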
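
One way to isolate the cost of the comparison step from the cost of moving data is to time a dry run with and without --size-only. This is only a sketch under assumptions: time_dry_run and the RCLONE override variable are hypothetical names, and the numbers are only meaningful when dst is already largely in sync, so that listing and comparison dominate the run.

```bash
# Time one "rclone sync --dry-run" pass and print the elapsed seconds.
# time_dry_run and RCLONE are illustrative names, not part of rclone.
time_dry_run() {
    local start
    local end
    start=$(date +%s)
    # Extra flags (for example --size-only) and the two remotes are passed through.
    "${RCLONE:-rclone}" sync --dry-run --fast-list "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo "$((end - start))"
}
```

Running time_dry_run ceph-1:hk-im ceph-2:hk-im and then time_dry_run --size-only ceph-1:hk-im ceph-2:hk-im would give two figures in seconds that can be compared directly.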

Figure: Ceph cluster (dst)

Figure: Ceph cluster (src)

Reference