• CEPH-1: Offline deployment of a Ceph cluster with ceph-deploy, plus an FAQ of errors encountered


    Deploying a Ceph cluster with ceph-deploy

    Environment

    Hostname    IP address     OS          Roles                                  Notes
    ceph-node1  10.153.204.13  CentOS 7.6  mon, osd, mds, mgr, rgw, ceph-deploy   chronyd time sync (server)
    ceph-node2  10.130.22.45   CentOS 7.6  mon, osd, mds, mgr, rgw                chronyd time sync
    ceph-node3  10.153.204.28  CentOS 7.3  mon, osd                               chronyd time sync

    This environment has three machines. Clocks must be synchronized before starting; node1 is the ceph-deploy admin node, and each machine has four partitions that will be used as OSD disks.

    Ceph components

    Name  Role
    osd   Object Storage Device: stores, replicates, rebalances and recovers data. OSDs exchange heartbeats and report state changes to the Ceph Monitor.
    mon   Monitor: watches the Ceph cluster, maintains its health state and the various cluster maps (OSD Map, Monitor Map, PG Map and CRUSH Map), collectively called the Cluster Map; together with the object id, these maps determine where data is ultimately stored.
    mgr   Manager: tracks runtime metrics and the current state of the cluster, including storage utilization, performance metrics and system load.
    mds   MetaData Server: stores the metadata of the CephFS file service; it is only needed when CephFS is used, not for object storage or block devices.
    rgw   radosgw: a gateway based on the popular RESTful protocol and the entry point for Ceph object storage; it is not needed unless object storage is enabled.

    Every component needs to be made highly available:
    1. The more OSDs there are, the higher the availability at a given replica count.
    2. Three mons are usually deployed for high availability.
    3. Two mgrs are usually deployed for high availability.
    4. Two mds sets are usually deployed for high availability, each running as an active/standby pair.
    5. Two rgw instances are usually deployed for high availability.

    Ceph release versioning

    The first Ceph release, 0.1, dates back to January 2008. The version numbering scheme stayed unchanged for years until April 2015: after 0.94.1 (the first Hammer point release) shipped, a new policy was adopted to avoid 0.99 (and 0.100 or 1.00):

    • x.0.z - development releases (for early testers and the brave)
    • x.1.z - release candidates (for test clusters and power users)
    • x.2.z - stable, bug-fix releases (for users)

    This article uses ceph version 15.2.9 and ceph-deploy 2.0.1.

    Preparation before installing Ceph

    1. Upgrade the kernel to the 4.x series or newer

    I upgraded to 4.17 here; the detailed upgrade steps are omitted (a hedged sketch is given below).
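
    A hedged sketch of one common way to do this on CentOS 7, via the ELRepo repository (this needs internet access or a mirrored repo and is not necessarily how it was done here; the installed version is simply whatever kernel-ml currently provides):

    ## Import the ELRepo key and repository, then install the mainline kernel
    rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
    yum -y install https://www.elrepo.org/elrepo-release-7.el7.elrepo.noarch.rpm
    yum -y --enablerepo=elrepo-kernel install kernel-ml
    ## The newly installed kernel is normally the first GRUB menu entry
    grub2-set-default 0
    grub2-mkconfig -o /boot/grub2/grub.cfg
    reboot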

    2. Disable firewalld, iptables and SELinux

    ## Firewall
    systemctl stop firewalld.service
    systemctl disable firewalld.service
    
    ## SELinux
    setenforce 0
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
    

    3. chronyd time synchronization

    Here node1 acts as the time server and the other nodes are its clients.

    [on the server (node1)]
    vim /etc/chrony.conf
    ...
    ## The key directives:
    server 10.153.204.13 iburst #the time source (here the node's own address)
    allow 10.0.0.0/8 #serve time to clients in this range
    ...
    
    [on the clients]
    vim /etc/chrony.conf
    ...
    server 10.153.204.13 iburst #use node1 as the time server
    ...
    
    ## Then restart the service and check its status
    systemctl enable chronyd
    systemctl restart chronyd
    timedatectl
    chronyc sources -v
    

    4. Write a temporary hosts file on the ceph-deploy node

    # cat /etc/hosts
    10.153.204.13  ceph-node1
    10.130.22.45 ceph-node2
    10.153.204.28 ceph-node3
    
    

    5. Create a regular user, grant it sudo, and set up passwordless SSH from the ceph-deploy node to the other nodes

    ## Use ansible to create the cephadmin user on every machine
    ansible all -m shell -a 'groupadd -r -g 2022 cephadmin && useradd -r -m -s /bin/bash -u 2022 -g 2022 cephadmin && echo cephadmin:123456 | chpasswd'
    
    ## Grant it passwordless sudo
    ansible node -m shell -a 'echo "cephadmin    ALL=(ALL)    NOPASSWD:ALL" >> /etc/sudoers'
    
    ## Set up passwordless SSH from the deploy node
    su - cephadmin
    ssh-keygen 
    ssh-copy-id ceph-node2
    ssh-copy-id ceph-node3
    

    6. Prepare the OSD disks; ideally one whole disk per OSD, but resources are tight here, so I use one partition per OSD

    [root@ceph-node1 ~]$ lsblk 
    NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
    nvme0n1     259:0    0 931.5G  0 disk 
    ├─nvme0n1p5 259:7    0   100G  0 part 
    ├─nvme0n1p3 259:5    0   100G  0 part 
    ├─nvme0n1p6 259:8    0   100G  0 part 
    ├─nvme0n1p4 259:6    0   100G  0 part 
    

    All OSD machines have the same disk layout. Just create the partitions; do not create LVM volumes or format them yet (a hedged parted sketch follows).
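
    For reference, a hypothetical partitioning sketch with parted; the device name comes from this environment, and the 500GiB-600GiB offsets are placeholders that must be pointed at actual free space before running:

    ## Carve one more 100G partition out of the free space (adjust start/end first)
    parted -s /dev/nvme0n1 mkpart primary 500GiB 600GiB
    partprobe /dev/nvme0n1
    lsblk /dev/nvme0n1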

    7. If the machines have no internet access, build a local Ceph yum repository

    (1) On a machine with internet access, run this script; adjust the version and URLs as needed

    #!/usr/bin/env bash
    
    URL_REPO=https://mirrors.tuna.tsinghua.edu.cn/ceph/rpm-15.2.9/el7/x86_64/
    URL_REPODATA=https://mirrors.tuna.tsinghua.edu.cn/ceph/rpm-15.2.9/el7/x86_64/repodata/
    
    function get_repo()
    {
        # download every rpm listed in the directory index into ceph_repo/
        mkdir -p ceph_repo
        for i in $(curl -s "$URL_REPO" | awk -F '"' '{print $4}' | grep rpm); do
            curl -o "ceph_repo/$i" "$URL_REPO/$i"
        done
    }
    
    function get_repodata()
    {
        # download the repodata (xml) files into ceph_repo/repodata/
        mkdir -p ceph_repo/repodata
        for i in $(curl -s "$URL_REPODATA" | awk -F '"' '{print $4}' | grep xml); do
            curl -o "ceph_repo/repodata/$i" "$URL_REPODATA/$i"
        done
    }
    
    if [ "$1" == 'repo' ]; then
        get_repo
    elif [ "$1" == 'repodata' ]; then
        get_repodata
    elif [ "$1" == 'all' ]; then
        get_repo
        get_repodata
    else
        echo 'Please pass one of [ repo | repodata | all ]'
    fi
    

    (2) Upload the repo to an internal server, then install and configure nginx

    yum -y install nginx
    
    ## Mainly change the fields below; point root at the directory that contains your uploaded ceph_repo (here /home/ceph, so the packages are served under /ceph_repo).
    vim /etc/nginx/nginx.conf
        server {
            listen       8080;
            listen       [::]:8080;
            server_name  _;
            root         /home/ceph;
    
            # Load configuration files for the default server block.
            include /etc/nginx/default.d/*.conf;
    
            location / {
               autoindex on;
            }
    
        }
    
    systemctl start nginx 
    

    (3) Configure the yum repository (on every node)

    cat > /etc/yum.repos.d/ceph-http.repo << EOF
    [local-ceph]
    name=local-ceph
    baseurl=http://ceph-node1:8080/ceph_repo
    gpgcheck=0
    enabled=1
    [noarch-ceph]
    name=noarch-ceph
    baseurl=http://ceph-node1:8080/noarch_repo
    gpgcheck=0
    enabled=1
    EOF
    

    Then:

    yum makecache
    
    ## Check that the repository is usable
    yum list | grep ceph 
    

    Installing ceph-deploy

    1. Check the available ceph-deploy versions

    # yum list ceph-deploy --showduplicates
    Loaded plugins: fastestmirror, langpacks, priorities
    Loading mirror speeds from cached hostfile
    Available Packages
    ceph-deploy.noarch                                     1.5.25-1.el7                                     epel       
    ceph-deploy.noarch                                     1.5.29-0                                         noarch-ceph
    ceph-deploy.noarch                                     1.5.30-0                                         noarch-ceph
    ceph-deploy.noarch                                     1.5.31-0                                         noarch-ceph
    ceph-deploy.noarch                                     1.5.32-0                                         noarch-ceph
    ceph-deploy.noarch                                     1.5.33-0                                         noarch-ceph
    ceph-deploy.noarch                                     1.5.34-0                                         noarch-ceph
    ceph-deploy.noarch                                     1.5.35-0                                         noarch-ceph
    ceph-deploy.noarch                                     1.5.36-0                                         noarch-ceph
    ceph-deploy.noarch                                     1.5.37-0                                         noarch-ceph
    ceph-deploy.noarch                                     1.5.38-0                                         noarch-ceph
    ceph-deploy.noarch                                     1.5.39-0                                         noarch-ceph
    ceph-deploy.noarch                                     2.0.0-0                                          noarch-ceph
    ceph-deploy.noarch                                     2.0.1-0                                          noarch-ceph
    

    I installed 1.5.38 the first time, but it errors out when initializing the OSDs; 2.0.1 is what was finally used. Newer builds can be found on the Tsinghua or Aliyun mirrors: https://mirrors.tuna.tsinghua.edu.cn/ceph/rpm-15.2.9/el7/noarch/ and https://mirrors.aliyun.com/ceph

    2. Install ceph-deploy

    ## Python dependencies that Ceph needs; install them as well
    yum -y install ceph-common python-pkg-resources python-setuptools python2-subprocess32
    
    ## Install ceph-deploy
    yum -y install ceph-deploy-2.0.1
    
    ## After installation, check the help output
    ceph-deploy --help
    

    If you hit ImportError: No module named pkg_resources, installing the python-setuptools package fixes it.

    Ceph cluster initialization and deployment

    1. Initialize the mon server (initialize one first; the others are added later)

    ## Before initializing, it is best to install the mon package on every mon node ahead of time; the later steps would install it automatically, but installing it up front surfaces problems early
    yum -y install ceph-mon 
    

    (1) Initialize the configuration: specify the public and cluster networks and generate ceph.conf

    $ ceph-deploy new --cluster-network 10.0.0.0/8 --public-network 10.0.0.0/8 ceph-node1
    
    [ceph_deploy.conf][DEBUG ] found configuration file at: /home/cephadmin/.cephdeploy.conf
    [ceph_deploy.cli][INFO  ] Invoked (1.5.25): /bin/ceph-deploy new --cluster-network 10.0.0.0/8 --public-network 10.0.0.0/8 ceph-node1
    [ceph_deploy.new][DEBUG ] Creating new cluster named ceph
    [ceph_deploy.new][INFO  ] making sure passwordless SSH succeeds
    [ceph_deploy][ERROR ] Traceback (most recent call last):
    [ceph_deploy][ERROR ]   File "/usr/lib/python2.7/site-packages/ceph_deploy/util/decorators.py", line 69, in newfunc
    [ceph_deploy][ERROR ]     return f(*a, **kw)
    [ceph_deploy][ERROR ]   File "/usr/lib/python2.7/site-packages/ceph_deploy/cli.py", line 162, in _main
    [ceph_deploy][ERROR ]     return args.func(args)
    [ceph_deploy][ERROR ]   File "/usr/lib/python2.7/site-packages/ceph_deploy/new.py", line 141, in new
    [ceph_deploy][ERROR ]     ssh_copy_keys(host, args.username)
    [ceph_deploy][ERROR ]   File "/usr/lib/python2.7/site-packages/ceph_deploy/new.py", line 35, in ssh_copy_keys
    [ceph_deploy][ERROR ]     if ssh.can_connect_passwordless(hostname):
    [ceph_deploy][ERROR ]   File "/usr/lib/python2.7/site-packages/ceph_deploy/util/ssh.py", line 15, in can_connect_passwordless
    [ceph_deploy][ERROR ]     if not remoto.connection.needs_ssh(hostname):
    [ceph_deploy][ERROR ] AttributeError: 'module' object has no attribute 'needs_ssh'
    [ceph_deploy][ERROR ] 
    

    This problem is tied to the ceph-deploy version; adding the --no-ssh-copykey option to the command works around it:

    $ ceph-deploy new --cluster-network 10.0.0.0/8 --public-network 10.0.0.0/8 ceph-node1 --no-ssh-copykey
    
    [ceph_deploy.conf][DEBUG ] found configuration file at: /home/cephadmin/.cephdeploy.conf
    [ceph_deploy.cli][INFO  ] Invoked (1.5.38): /bin/ceph-deploy new --cluster-network 10.0.0.0/8 --public-network 10.0.0.0/8 ceph-node1 --no-ssh-copykey
    [ceph_deploy.new][DEBUG ] Creating new cluster named ceph
    [ceph-node1][DEBUG ] connection detected need for sudo
    [ceph-node1][DEBUG ] connected to host: ceph-node1 
    [ceph-node1][DEBUG ] detect platform information from remote host
    [ceph-node1][DEBUG ] detect machine type
    [ceph-node1][DEBUG ] find the location of an executable
    [ceph-node1][INFO  ] Running command: sudo /usr/sbin/ip link show
    [ceph-node1][INFO  ] Running command: sudo /usr/sbin/ip addr show
    [ceph-node1][DEBUG ] IP addresses found: [u'192.168.42.1', u'10.153.204.13', u'10.233.64.0', u'10.233.64.1', u'169.254.25.10']
    [ceph_deploy.new][DEBUG ] Resolving host ceph-node1
    [ceph_deploy.new][DEBUG ] Monitor ceph-node1 at 10.153.204.13
    [ceph_deploy.new][DEBUG ] Monitor initial members are ['ceph-node1']
    [ceph_deploy.new][DEBUG ] Monitor addrs are [u'10.153.204.13']
    [ceph_deploy.new][DEBUG ] Creating a random mon key...
    [ceph_deploy.new][DEBUG ] Writing monitor keyring to ceph.mon.keyring...
    [ceph_deploy.new][DEBUG ] Writing initial config to ceph.conf...
    

    If your ceph-deploy is around version 1.5.25, the best fix is to upgrade ceph-deploy to 2.0.1 and run the command again.

    (2) Initialize the mon node

    ceph-deploy mon create-initial
    

    This fails with an error:

    [ceph-node1][INFO  ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-node1.asok mon_status
    [ceph-node1][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
    

    This is most likely because Ceph had been deployed on this machine before and the old environment was not cleaned up completely. Delete the leftovers thoroughly and run it again:

    ## Clean up leftovers from the previous deployment
    rm -rf /etc/ceph/* /var/lib/ceph/* /var/log/ceph/* /var/run/ceph/*
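    
    For reference, ceph-deploy itself also ships cleanup subcommands that remove a previous installation more thoroughly; note that purge uninstalls the Ceph packages as well (a hedged sketch, reusing this environment's node name):
    
    ceph-deploy purge ceph-node1
    ceph-deploy purgedata ceph-node1
    ceph-deploy forgetkeys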
    

    Running it again succeeds:

    [ceph-node1][INFO  ] Running command: sudo /usr/bin/ceph --connect-timeout=25 --cluster=ceph --name mon. --keyring=/var/lib/ceph/mon/ceph-ceph-node1/keyring auth get client.bootstrap-rgw
    [ceph_deploy.gatherkeys][INFO  ] Storing ceph.client.admin.keyring
    [ceph_deploy.gatherkeys][INFO  ] Storing ceph.bootstrap-mds.keyring
    [ceph_deploy.gatherkeys][INFO  ] Storing ceph.bootstrap-mgr.keyring
    [ceph_deploy.gatherkeys][INFO  ] keyring 'ceph.mon.keyring' already exists
    [ceph_deploy.gatherkeys][INFO  ] Storing ceph.bootstrap-osd.keyring
    [ceph_deploy.gatherkeys][INFO  ] Storing ceph.bootstrap-rgw.keyring
    [ceph_deploy.gatherkeys][INFO  ] Destroy temp directory /tmp/tmps6CzLR
    

    Verify that the mon daemon started:

    # ps -ef | grep ceph-mon 
    ceph     23737     1  0 16:22 ?        00:00:00 /usr/bin/ceph-mon -f --cluster ceph --id ceph-node1 --setuser ceph --setgroup ceph
    

    mon initialization is complete.

    Once the mon is initialized you can query the cluster state; you can also set up more than one admin host.

    Push the cluster configuration file and the admin user's key to /etc/ceph/ on the target machines, and the cluster can then be managed from them:

    ceph-deploy admin ceph-node1 ceph-node2 ceph-node3
    
    $ ll -h /etc/ceph/
    total 8.0K
    -rw------- 1 root root 151 Feb 12 16:35 ceph.client.admin.keyring
    -rw-r--r-- 1 root root 265 Feb 12 16:35 ceph.conf
    -rw------- 1 root root   0 Feb 12 16:22 tmppE21x5
    
    ## Check the cluster status
    $ sudo ceph -s 
      cluster:
        id:     537175bb-51de-4cc4-9ee3-b5ba8842bff2
        health: HEALTH_OK
     
      services:
        mon: 1 daemons, quorum ceph-node1 (age 14m)
        mgr: no daemons active
        osd: 0 osds: 0 up, 0 in
     
      data:
        pools:   0 pools, 0 pgs
        objects: 0 objects, 0 B
        usage:   0 B used, 0 B / 0 B avail
        pgs:     
    

    There is only one mon for now.

    2. Add the mgr service

    (1) Install the mgr package on every mgr node

    yum -y install ceph-mgr 
    

    (2) Add the mgr to the cluster

    $ ceph-deploy mgr create ceph-node1
    [ceph-node1][INFO  ] Running command: sudo ceph --cluster ceph --name client.bootstrap-mgr --keyring /var/lib/ceph/bootstrap-mgr/ceph.keyring auth get-or-create mgr.ceph-node1 mon allow profile mgr osd allow * mds allow * -o /var/lib/ceph/mgr/ceph-ceph-node1/keyring
    [ceph-node1][INFO  ] Running command: sudo systemctl enable ceph-mgr@ceph-node1
    [ceph-node1][WARNIN] Created symlink from /etc/systemd/system/ceph-mgr.target.wants/ceph-mgr@ceph-node1.service to /usr/lib/systemd/system/ceph-mgr@.service.
    [ceph-node1][INFO  ] Running command: sudo systemctl start ceph-mgr@ceph-node1
    [ceph-node1][INFO  ] Running command: sudo systemctl enable ceph.target
    
    ## Check the cluster status again
    # ceph -s 
      cluster:
        id:     537175bb-51de-4cc4-9ee3-b5ba8842bff2
        health: HEALTH_WARN
                Module 'restful' has failed dependency: No module named 'pecan'
                OSD count 0 < osd_pool_default_size 3
     
      services:
        mon: 1 daemons, quorum ceph-node1 (age 46m)
        mgr: ceph-node1(active, since 100s)
        osd: 0 osds: 0 up, 0 in
     
      data:
        pools:   0 pools, 0 pgs
        objects: 0 objects, 0 B
        usage:   0 B used, 0 B / 0 B avail
        pgs:     
    

    The following warnings came up here:

    • Module 'restful' has failed dependency: No module named 'pecan'

    • Module 'restful' has failed dependency: No module named 'werkzeug'
      The mgr host is missing the pecan and werkzeug Python modules. Download the offline packages (and their dependencies) with pip3 on an internet-connected machine, copy them over, and install them; usable indexes: https://pypi.tuna.tsinghua.edu.cn/simple/ ; https://pypi.org/simple. A hedged sketch follows this list.

    • OSD count 0 < osd_pool_default_size 3:
      The default number of replicas per object is 3; this warning simply says there are fewer than three OSDs and can be ignored for now.
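
    A hedged sketch of that offline pip workflow (package set and paths are illustrative; run the download on a machine whose Python version matches the mgr nodes):

    ## On the internet-connected machine
    pip3 download pecan werkzeug -d ./mgr-deps -i https://pypi.tuna.tsinghua.edu.cn/simple/
    ## Copy ./mgr-deps to each mgr node, then install without touching any index
    pip3 install --no-index --find-links ./mgr-deps pecan werkzeug
    ## Restart the mgr so the restful module can import the new packages
    systemctl restart ceph-mgr@ceph-node1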

    3. Initialize the OSDs

    ## First list the disks available on the target host
    $ ceph-deploy disk list ceph-node1
    This errors out:
    [ceph_deploy][ERROR ] ExecutableNotFound: Could not locate executable 'ceph-disk' make sure it is installed and available on ceph-node1
    

    Checking the documentation at https://docs.ceph.com/en/pacific/ceph-volume/ shows that ceph-disk was deprecated in Ceph 13.0.0 and replaced by ceph-volume, and listing the installed binaries confirms there is only ceph-volume, no ceph-disk:

    # locate ceph- |grep bin
    /usr/bin/ceph-authtool
    /usr/bin/ceph-bluestore-tool
    /usr/bin/ceph-clsinfo
    /usr/bin/ceph-conf
    /usr/bin/ceph-crash
    /usr/bin/ceph-dencoder
    /usr/bin/ceph-deploy
    /usr/bin/ceph-kvstore-tool
    /usr/bin/ceph-mds
    /usr/bin/ceph-mgr
    /usr/bin/ceph-mon
    /usr/bin/ceph-monstore-tool
    /usr/bin/ceph-objectstore-tool
    /usr/bin/ceph-osd
    /usr/bin/ceph-osdomap-tool
    /usr/bin/ceph-post-file
    /usr/bin/ceph-rbdnamer
    /usr/bin/ceph-run
    /usr/bin/ceph-syn
    /usr/sbin/ceph-create-keys
    /usr/sbin/ceph-volume
    /usr/sbin/ceph-volume-systemd
    

    So the problem is that my ceph-deploy version did not match the Ceph version being deployed; switch ceph-deploy to 2.0.1:

    $ ceph-deploy --version 
    2.0.1
    
    $ ceph-deploy disk list ceph-node1
    [ceph_deploy.conf][DEBUG ] found configuration file at: /home/cephadmin/.cephdeploy.conf
    [ceph-node1][DEBUG ] connection detected need for sudo
    [ceph-node1][DEBUG ] connected to host: ceph-node1 
    [ceph-node1][DEBUG ] detect platform information from remote host
    [ceph-node1][DEBUG ] detect machine type
    [ceph-node1][DEBUG ] find the location of an executable
    [ceph-node1][INFO  ] Running command: sudo fdisk -l
    [ceph-node1][INFO  ] Disk /dev/nvme1n1: 1000.2 GB, 1000204886016 bytes, 1953525168 sectors
    [ceph-node1][INFO  ] Disk /dev/nvme0n1: 1000.2 GB, 1000204886016 bytes, 1953525168 sectors
    [ceph-node1][INFO  ] Disk /dev/mapper/data-ceph--data1: 107.4 GB, 107374182400 bytes, 209715200 sectors
    [ceph-node1][INFO  ] Disk /dev/mapper/data-ceph--data2: 107.4 GB, 107374182400 bytes, 209715200 sectors
    [ceph-node1][INFO  ] Disk /dev/mapper/data-ceph--data3: 107.4 GB, 107374182400 bytes, 209715200 sectors
    [ceph-node1][INFO  ] Disk /dev/mapper/data-ceph--data4: 107.4 GB, 107374182400 bytes, 209715200 sectors
    

    Initialize the nodes:

    ## Initializing the nodes installs ceph, ceph-radosgw and the related base components
    $ ceph-deploy install --no-adjust-repos --nogpgcheck ceph-node1 ceph-node2 ceph-node3
        - --no-adjust-repos: do not push this machine's repo files to the targets, since they were configured manually above
        - --nogpgcheck: skip the yum GPG key check
    

    Install the OSD packages:

    ## Run this on every machine that will host OSDs
    yum -y install ceph-osd ceph-common
    

    Wipe the data on every disk that will become an OSD, on all nodes:

    ## Example; the other disks and nodes need the same treatment
    $ ceph-deploy disk zap ceph-node1 /dev/nvme0n1p3 /dev/nvme0n1p4 /dev/nvme0n1p5 /dev/nvme0n1p6
    [ceph-node1][WARNIN] --> Zapping: /dev/nvme0n1p3
    [ceph-node1][WARNIN] Running command: /bin/dd if=/dev/zero of=/dev/nvme0n1p3 bs=1M count=10 conv=fsync
    [ceph-node1][WARNIN]  stderr: 10+0 records in
    [ceph-node1][WARNIN] 10+0 records out
    [ceph-node1][WARNIN] 10485760 bytes (10 MB) copied
    [ceph-node1][WARNIN]  stderr: , 0.0221962 s, 472 MB/s
    [ceph-node1][WARNIN] --> Zapping successful for: <Partition: /dev/nvme0n1p3>
    

    Create the OSDs:

    ## Example; repeat for the remaining partitions and nodes
    $ ceph-deploy osd create ceph-node1 --data /dev/nvme0n1p3
    $ ceph-deploy osd create ceph-node1 --data /dev/nvme0n1p4
    ...
    

    OSDs are numbered in the order they are created: the first one is 0, the next 1, and so on.
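
    A hedged convenience loop for one node, run from the deploy directory; the partition names are the ones used in this environment:

    for part in /dev/nvme0n1p3 /dev/nvme0n1p4 /dev/nvme0n1p5 /dev/nvme0n1p6; do
        ceph-deploy osd create ceph-node1 --data "$part"
    done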

    Check the OSD processes:

    ## Processes on ceph-node1
    # ps -ef | grep ceph-osd
    ceph       61629       1  0 17:16 ?        00:00:01 /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph
    ceph       62896       1  0 17:17 ?        00:00:01 /usr/bin/ceph-osd -f --cluster ceph --id 1 --setuser ceph --setgroup ceph
    ceph       63569       1  0 17:18 ?        00:00:01 /usr/bin/ceph-osd -f --cluster ceph --id 2 --setuser ceph --setgroup ceph
    ceph       64519       1  0 17:18 ?        00:00:01 /usr/bin/ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph
    
    ## Processes on ceph-node2
    # ps -ef | grep osd 
    ceph       64649       1  0 17:27 ?        00:00:00 /usr/bin/ceph-osd -f --cluster ceph --id 4 --setuser ceph --setgroup ceph
    ceph       65423       1  0 17:27 ?        00:00:00 /usr/bin/ceph-osd -f --cluster ceph --id 5 --setuser ceph --setgroup ceph
    ceph       66082       1  0 17:28 ?        00:00:00 /usr/bin/ceph-osd -f --cluster ceph --id 6 --setuser ceph --setgroup ceph
    ceph       66701       1  0 17:28 ?        00:00:00 /usr/bin/ceph-osd -f --cluster ceph --id 7 --setuser ceph --setgroup ceph
    
    ## Processes on ceph-node3
    # ps -ef | grep osd 
    ceph       30549       1  0 11:30 ?        00:00:00 /usr/bin/ceph-osd -f --cluster ceph --id 8 --setuser ceph --setgroup ceph
    ceph       31270       1  0 11:30 ?        00:00:00 /usr/bin/ceph-osd -f --cluster ceph --id 9 --setuser ceph --setgroup ceph
    ceph       32220       1  1 11:31 ?        00:00:00 /usr/bin/ceph-osd -f --cluster ceph --id 10 --setuser ceph --setgroup ceph
    ceph       32931       1  1 11:31 ?        00:00:00 /usr/bin/ceph-osd -f --cluster ceph --id 11 --setuser ceph --setgroup ceph
    

    The OSDs are numbered 0 to 11, 12 disks in total.

    With the OSD services up, check the cluster status again:

    $ ceph -s 
      cluster:
        id:     537175bb-51de-4cc4-9ee3-b5ba8842bff2
        health: HEALTH_OK
     
      services:
        mon: 1 daemons, quorum ceph-node1 (age 19h)
        mgr: ceph-node1(active, since 19h)
        osd: 12 osds: 12 up (since 99s), 12 in (since 99s)
     
      data:
        pools:   1 pools, 1 pgs
        objects: 2 objects, 0 B
        usage:   12 GiB used, 1.2 TiB / 1.2 TiB avail
        pgs:     1 active+clean
        
    ## One pool exists by default; it is created automatically when OSDs are added
    $ ceph osd lspools 
    1 device_health_metrics
    
    $ ceph df 
    --- RAW STORAGE ---
    CLASS  SIZE     AVAIL    USED     RAW USED  %RAW USED
    ssd    1.2 TiB  1.2 TiB  9.7 MiB    12 GiB       1.00
    TOTAL  1.2 TiB  1.2 TiB  9.7 MiB    12 GiB       1.00
     
    --- POOLS ---
    POOL                   ID  PGS  STORED  OBJECTS  USED  %USED  MAX AVAIL
    device_health_metrics   1    1     0 B        2   0 B      0    376 GiB
    
    At this point a basic Ceph cluster is up and the RBD (block device) functionality can already be used.
    To enable object storage and the file system as well, rgw, mds and CephFS still have to be deployed. The mon, mgr and other components are not yet highly available, so scale out these critical components first.

    4. Scale out the ceph-mon nodes

    (1) Install the ceph-mon package on the target machines

    # yum -y install ceph-mon ceph-common
    

    (2) Add the mon hosts

    $ ceph-deploy mon add ceph-node2
    $ ceph-deploy mon add ceph-node3
    

    (3) Check the cluster status

    $ ceph -s 
      cluster:
        id:     537175bb-51de-4cc4-9ee3-b5ba8842bff2
        health: HEALTH_OK
     
      services:
        mon: 3 daemons, quorum ceph-node1,ceph-node2,ceph-node3 (age 10m)
        mgr: ceph-node1(active, since 25h)
        osd: 12 osds: 12 up (since 6h), 12 in (since 6h)
     
      data:
        pools:   1 pools, 1 pgs
        objects: 2 objects, 0 B
        usage:   12 GiB used, 1.2 TiB / 1.2 TiB avail
        pgs:     1 active+clean
    
    ## This command shows detailed mon and quorum information
    $ ceph quorum_status --format json-pretty
    

    The cluster now has three mons.

    5. Scale out the ceph-mgr nodes

    (1) Install the ceph-mgr package on the target machine

    # yum -y install ceph-mgr ceph-common
    

    (2) Add the mgr host

    $ ceph-deploy mgr create ceph-node2
    

    (3) Verify the cluster status

    $ ceph -s 
      cluster:
        id:     537175bb-51de-4cc4-9ee3-b5ba8842bff2
        health: HEALTH_OK
     
      services:
        mon: 3 daemons, quorum ceph-node1,ceph-node2,ceph-node3 (age 24m)
        mgr: ceph-node1(active, since 25h), standbys: ceph-node2
        osd: 12 osds: 12 up (since 6h), 12 in (since 6h)
     
      data:
        pools:   1 pools, 1 pgs
        objects: 2 objects, 0 B
        usage:   12 GiB used, 1.2 TiB / 1.2 TiB avail
        pgs:     1 active+clean
    

    mgr high availability is active/standby, while mon high availability relies on quorum election among the mons.

    6. Add mds (the metadata service) and CephFS to provide a file system

    The mds is a separate storage service; for it to work, two dedicated pools must be created: one for CephFS metadata and one for data. The metadata pool stores metadata such as file and directory names and sizes, while the data pool stores the actual file contents.

    (1) Deploy the mds services

    $ ceph-deploy mds create ceph-node1
    $ ceph-deploy mds create ceph-node2
    

    Check the Ceph status:

    $ ceph -s 
      cluster:
        id:     537175bb-51de-4cc4-9ee3-b5ba8842bff2
        health: HEALTH_OK
     
      services:
        mon: 3 daemons, quorum ceph-node1,ceph-node2,ceph-node3 (age 18h)
        mgr: ceph-node1(active, since 43h), standbys: ceph-node2
        mds:  2 up:standby
        osd: 12 osds: 12 up (since 24h), 12 in (since 24h)
     
      task status:
     
      data:
        pools:   1 pools, 1 pgs
        objects: 3 objects, 0 B
        usage:   12 GiB used, 1.2 TiB / 1.2 TiB avail
        pgs:     1 active+clean
    

    The mds daemons have joined the cluster, but both sit in standby, because an mds only becomes active once the metadata and data pools (and a file system) have been created:

    ## First create the metadata pool and the data pool; the trailing numbers are the pg and pgp counts
    $ ceph osd pool create cephfs-metedata 32 32 
    pool 'cephfs-metedata' created
    $ ceph osd pool create cephfs-data 64 64 
    pool 'cephfs-data' created
    
    $ ceph osd lspools 
    1 device_health_metrics
    2 cephfs-metedata
    3 cephfs-data
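    
    As a rough, hedged sanity check on those pg counts: a common rule of thumb is about (number of OSDs x 100) / replica size PGs across all pools, rounded to a power of two. Here that is (12 x 100) / 3 = 400, so 32 + 64 for these two pools leaves plenty of headroom for the pools that rgw creates later.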
    

    (2) Create the CephFS file system

    $ ceph fs new mycephfs cephfs-metedata cephfs-data
    new fs with metadata pool 2 and data pool 3
    
    ## Creation syntax
    ceph fs new <fs_name> <metadata> <data> [--force] [--allow-dangerous-metadata-overlay]
    

    Check the cluster status again:

    $ ceph -s 
      cluster:
        id:     537175bb-51de-4cc4-9ee3-b5ba8842bff2
        health: HEALTH_OK
     
      services:
        mon: 3 daemons, quorum ceph-node1,ceph-node2,ceph-node3 (age 18h)
        mgr: ceph-node1(active, since 43h), standbys: ceph-node2
        mds: mycephfs:1 {0=ceph-node2=up:active} 1 up:standby
        osd: 12 osds: 12 up (since 24h), 12 in (since 24h)
     
      task status:
     
      data:
        pools:   3 pools, 97 pgs
        objects: 25 objects, 2.2 KiB
        usage:   12 GiB used, 1.2 TiB / 1.2 TiB avail
        pgs:     97 active+clean
    
    $ ceph mds stat
    mycephfs:1 {0=ceph-node2=up:active} 1 up:standby
    

    CephFS is now set up.

    7. Add the rgw component to provide object storage

    rgw exposes a REST interface; clients interact with it over HTTP to create, read, update and delete data. Several rgw instances are usually deployed for high availability, with a load balancer in front of them to distribute requests.

    (1) Install the rgw package

    # yum -y install ceph-radosgw
    

    (2) Deploy rgw

    $ ceph-deploy --overwrite-conf rgw create ceph-node1
    $ ceph-deploy --overwrite-conf rgw create ceph-node2
    [ceph-node1][INFO  ] Running command: sudo systemctl start ceph-radosgw@rgw.ceph-node1
    [ceph-node1][INFO  ] Running command: sudo systemctl enable ceph.target
    [ceph_deploy.rgw][INFO  ] The Ceph Object Gateway (RGW) is now running on host ceph-node1 and default port 7480
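    
    A quick, hedged way to confirm that each gateway answers on its default port; an anonymous request should return a small ListAllMyBucketsResult XML document:
    
    curl http://ceph-node1:7480
    curl http://ceph-node2:7480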
    

    Check the Ceph cluster status:

    $ ceph -s 
      cluster:
        id:     537175bb-51de-4cc4-9ee3-b5ba8842bff2
        health: HEALTH_OK
     
      services:
        mon: 3 daemons, quorum ceph-node1,ceph-node2,ceph-node3 (age 24h)
        mgr: ceph-node1(active, since 2d), standbys: ceph-node2
        mds: mycephfs:1 {0=ceph-node2=up:active} 1 up:standby
        osd: 12 osds: 12 up (since 30h), 12 in (since 30h)
        rgw: 2 daemons active (ceph-node1, ceph-node2)
     
      task status:
     
      data:
        pools:   7 pools, 201 pgs
        objects: 212 objects, 6.9 KiB
        usage:   12 GiB used, 1.2 TiB / 1.2 TiB avail
        pgs:     201 active+clean
     
      io:
        client:   35 KiB/s rd, 0 B/s wr, 34 op/s rd, 23 op/s wr
    

    Follow-up

    1. Because only one mon was initialized at the start, ceph.conf still lists a single mon_host, which prevents clients from failing over. Regenerate the cluster information and push an updated ceph.conf to every node:
    $ ceph-deploy --overwrite-conf config push ceph-node1 ceph-node2 ceph-node3
    $ cat /etc/ceph/ceph.conf 
    [global]
    fsid = 537175bb-51de-4cc4-9ee3-b5ba8842bff2
    public_network = 10.0.0.0/8
    cluster_network = 10.0.0.0/8
    mon_initial_members = ceph-node1
    mon_host = 10.153.204.13,10.130.22.45,10.153.204.28
    auth_cluster_required = cephx
    auth_service_required = cephx
    auth_client_required = cephx
    
    2. Ceph cluster stop/start order

    Before restarting, tell the cluster not to mark OSDs out, so that nodes are not kicked out of the cluster when their services stop; once an OSD is kicked out, Ceph automatically starts rebalancing data:

    ## Set noout
    $ ceph osd set noout
    noout is set
    
    ## Unset noout
    $ ceph osd unset noout
    noout is unset
    

    Stop order (a hedged systemctl sketch follows the two lists):

    1. Set noout before stopping any services;
    2. Stop the storage clients so no data is being read or written;
    3. Stop the RGW services, if RGW is in use;
    4. Stop the CephFS metadata services;
    5. Stop the Ceph OSD services;
    6. Stop the Ceph Manager services;
    7. Stop the Ceph Monitor services.

    Start order:

    1. Start the Ceph Monitor services;
    2. Start the Ceph Manager services;
    3. Start the Ceph OSD services;
    4. Start the CephFS metadata services;
    5. Start the RGW services;
    6. Start the storage clients;
    7. Finally, unset noout.
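
    A hedged per-node sketch of the shutdown half of this sequence, using the systemd targets that the Ceph packages install (run each stop on the nodes that actually host that daemon type):

    ceph osd set noout
    systemctl stop ceph-radosgw.target
    systemctl stop ceph-mds.target
    systemctl stop ceph-osd.target
    systemctl stop ceph-mgr.target
    systemctl stop ceph-mon.target
    ## Start up again in the reverse order, then finish with: ceph osd unset noout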

    Summary

    That completes a full, highly available Ceph cluster. This article only covers the deployment; the next one will explain in detail how to use RBD, CephFS and object storage.

    FAQ for common Ceph operations issues

    1. OSD decommissioning procedure

    (1) If the OSD host is still running normally (this is a planned removal, not a failure), first set the OSD's CRUSH weight to 0 and wait for all of its data to migrate away; it will then stop receiving new data.

    $ ceph osd crush reweight osd.8 0
    reweighted item id 8 name 'osd.8' to 0 in crush map
    

    If there is a lot of data, it is best to lower the weight gradually (0.7 → 0.4 → 0.1 → 0) to keep the cluster as stable as possible; a hedged sketch follows.
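
    A hedged sketch of that gradual reweight, reusing osd.8 from this example; wait for ceph -s to settle between steps:

    ceph osd crush reweight osd.8 0.7
    ## wait until recovery finishes, then continue
    ceph osd crush reweight osd.8 0.4
    ceph osd crush reweight osd.8 0.1
    ceph osd crush reweight osd.8 0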

    (2) Stop the OSD process

    # systemctl stop ceph-osd@8.service
    

    Stopping the OSD process tells the cluster that this OSD daemon is gone and no longer serving. Because its weight is already 0, this does not change the overall data distribution and no data migration happens.

    (3) Mark the OSD out

    $ ceph osd out osd.8
    

    This tells the mons that the OSD can no longer serve and that its data should be recovered onto other OSDs; since the reweight was already done, no data actually moves.

    (4) Remove the OSD from the CRUSH map

    $ ceph osd crush remove osd.8
    removed item id 8 name 'osd.8' from crush map
    

    Removing it from CRUSH tells the cluster that this OSD is being taken out entirely and triggers a CRUSH recalculation; because of the earlier reweight, its CRUSH weight is already 0.

    (5) Delete the OSD

    $ ceph osd rm osd.8
    removed osd.8
    

    This removes the OSD's record from the cluster.

    (6) Delete the OSD's authentication entry

    $ ceph auth del osd.8
    updated
    

    If this auth entry is not deleted, the OSD id stays occupied and will not be released for reuse.
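
    For reference, Luminous and later releases also offer a single command that combines the crush remove, rm and auth del steps above (hedged; the gradual reweight and the service stop still come first):

    ceph osd purge osd.8 --yes-i-really-mean-it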

    (7) Finally, check the cluster status

    $ ceph -s 
      cluster:
        id:     537175bb-51de-4cc4-9ee3-b5ba8842bff2
        health: HEALTH_WARN
                Degraded data redundancy: 152/813 objects degraded (18.696%), 43 pgs degraded, 141 pgs undersized
     
      services:
        mon: 2 daemons, quorum ceph-node1,ceph-node2 (age 111s)
        mgr: ceph-node1(active, since 11d), standbys: ceph-node2
        mds: mycephfs:1 {0=ceph-node2=up:active} 1 up:standby
        osd: 8 osds: 8 up (since 3d), 8 in (since 3d); 124 remapped pgs
        rgw: 2 daemons active (ceph-node1, ceph-node2)
     
      task status:
     
      data:
        pools:   8 pools, 265 pgs
        objects: 271 objects, 14 MiB
        usage:   8.1 GiB used, 792 GiB / 800 GiB avail
        pgs:     152/813 objects degraded (18.696%)
                 114/813 objects misplaced (14.022%)
                 111 active+clean+remapped
                 98  active+undersized
                 43  active+undersized+degraded
                 13  active+clean
    

    Because all the OSDs on ceph-node3 were removed, 8 OSDs remain. Since those 8 OSDs sit on only two hosts, many PGs are no longer active+clean; as soon as OSDs on a new machine come online, the PGs will be redistributed automatically.

    2. mon decommissioning procedure

    (1) Check the mon status

    $ ceph mon stat  
    e3: 3 mons at {ceph-node2=[v2:10.130.22.45:3300/0,v1:10.130.22.45:6789/0],ceph-node1=[v2:10.153.204.13:3300/0,v1:10.153.204.13:6789/0],ceph-node3=[v2:10.153.204.28:3300/0,v1:10.153.204.28:6789/0]}, election epoch 48, leader 0 ceph-node1, quorum 0,1 ceph-node1,ceph-node2
    

    (2) Stop the mon

    systemctl stop ceph-mon@ceph-node3
    

    (3) Remove the mon from the cluster

    $ ceph mon remove ceph-node3
    removing mon.ceph-node3 at [v2:10.153.204.28:3300/0,v1:10.153.204.28:6789/0], there will be 2 monitors
    

    (4) Remove the node from the mon_host entry in the ceph.conf configuration file

    $ cat ceph.conf
    [global]
    fsid = 537175bb-51de-4cc4-9ee3-b5ba8842bff2
    public_network = 10.0.0.0/8
    cluster_network = 10.0.0.0/8
    mon_initial_members = ceph-node1
    mon_host = 10.153.204.13,10.130.22.45
    auth_cluster_required = cephx
    auth_service_required = cephx
    auth_client_required = cephx
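    
    After editing, push the updated ceph.conf out to the remaining nodes the same way as earlier in this article (a hedged one-liner):
    
    ceph-deploy --overwrite-conf config push ceph-node1 ceph-node2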
    

    (5) Check the cluster status again

    $ ceph -s 
      cluster:
        id:     537175bb-51de-4cc4-9ee3-b5ba8842bff2
        health: HEALTH_WARN
                Degraded data redundancy: 152/813 objects degraded (18.696%), 43 pgs degraded, 141 pgs undersized
     
      services:
        mon: 2 daemons, quorum ceph-node1,ceph-node2 (age 111s)
        mgr: ceph-node1(active, since 11d), standbys: ceph-node2
        mds: mycephfs:1 {0=ceph-node2=up:active} 1 up:standby
        osd: 8 osds: 8 up (since 3d), 8 in (since 3d); 124 remapped pgs
        rgw: 2 daemons active (ceph-node1, ceph-node2)
     
      task status:
     
      data:
        pools:   8 pools, 265 pgs
        objects: 271 objects, 14 MiB
        usage:   8.1 GiB used, 792 GiB / 800 GiB avail
        pgs:     152/813 objects degraded (18.696%)
                 114/813 objects misplaced (14.022%)
                 111 active+clean+remapped
                 98  active+undersized
                 43  active+undersized+degraded
                 13  active+clean
    