
Upgrading Rook Ceph from Version 1.7 to 1.9

This section provides detailed upgrade instructions for keeping existing services running without interruption during the upgrade, and highlights version-specific changes, such as Rook v1.10 dropping support for Ceph Octopus (15.2.x).

Prerequisites

The following requirements must be met to proceed with the upgrade process:

  • Kubernetes: 1.26
  • OS: Rocky Linux 8 / CentOS 8 / Red Hat 8
  • kubectl: 1.26
  • Other: Upgrade one minor version at a time (1.7 → 1.8 → 1.9); do not skip releases
  • Time required: more than 30 minutes

Run the following commands to verify the system configuration:

kubectl version  
uname -a; cat /etc/redhat-release  

Verifying upgrade health status for version 1.7 to 1.8

Note

Verify the health status of the cluster and its versions to ensure that they are stable and error-free before proceeding with the update.

  1. Set the namespaces for the Rook cluster and operator.

    ROOK_CLUSTER_NAMESPACE=mdsp-bk-ceph
    ROOK_OPERATOR_NAMESPACE=mdsp-bk-ceph
    
  2. Verify that all pods in the Rook cluster namespace are running.

    kubectl -n $ROOK_CLUSTER_NAMESPACE get pods
    
  3. Verify that the Ceph status is healthy and ensure it is not in a HEALTH_ERR state.

    TOOLS_POD=$(kubectl -n $ROOK_CLUSTER_NAMESPACE get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[*].metadata.name}')
    kubectl -n $ROOK_CLUSTER_NAMESPACE exec -it $TOOLS_POD -- ceph status
    
  4. Check Rook-Ceph component versions.

    POD_NAME=$(kubectl -n $ROOK_CLUSTER_NAMESPACE get pod -o custom-columns=name:.metadata.name --no-headers | grep rook-ceph-mon-b)
    kubectl -n $ROOK_CLUSTER_NAMESPACE get pod ${POD_NAME} -o jsonpath='{.spec.containers[0].image}'
    kubectl -n $ROOK_OPERATOR_NAMESPACE get pod -o jsonpath='{range .items[*]}{.metadata.name}{"\n\t"}{.status.phase}{"\t\t"}{.spec.containers[0].image}{"\t"}{.spec.initContainers[0].image}{"\n"}{end}' && \
    kubectl -n $ROOK_CLUSTER_NAMESPACE get pod -o jsonpath='{range .items[*]}{.metadata.name}{"\n\t"}{.status.phase}{"\t\t"}{.spec.containers[0].image}{"\t"}{.spec.initContainers[0].image}{"\n"}{end}'
    kubectl -n $ROOK_CLUSTER_NAMESPACE get deployments -o jsonpath='{range .items[*]}{.metadata.name}{"  \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.updatedReplicas}{"/"}{.status.readyReplicas}{"  \trook-version="}{.metadata.labels.rook-version}{"\n"}{end}'
    kubectl -n $ROOK_CLUSTER_NAMESPACE get jobs -o jsonpath='{range .items[*]}{.metadata.name}{"  \tsucceeded: "}{.status.succeeded}{"      \trook-version="}{.metadata.labels.rook-version}{"\n"}{end}'
    

Verifying Ceph cluster health

After the upgrade, verify the following aspects of the Ceph cluster:

Note

  • Check that all Ceph monitors are in quorum.
  • Verify that the Ceph manager is active.
  • Confirm that all Object Storage Daemons (OSDs) are up and running.
  • Verify that the Rados Gateway (RGW) is active.
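The checks above can be run from the toolbox pod; a minimal sketch using standard Ceph status commands (the pod label follows the earlier steps):

```shell
# Resolve the toolbox pod, as in the steps above.
TOOLS_POD=$(kubectl -n $ROOK_CLUSTER_NAMESPACE get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[*].metadata.name}')

# Monitors: every mon should appear in the quorum list.
kubectl -n $ROOK_CLUSTER_NAMESPACE exec $TOOLS_POD -- ceph quorum_status -f json-pretty

# Manager: one mgr should be reported as active.
kubectl -n $ROOK_CLUSTER_NAMESPACE exec $TOOLS_POD -- ceph mgr stat

# OSDs: the "up" and "in" counts should equal the total.
kubectl -n $ROOK_CLUSTER_NAMESPACE exec $TOOLS_POD -- ceph osd stat

# Overall health must not be HEALTH_ERR; grep -v fails if that is the only line.
kubectl -n $ROOK_CLUSTER_NAMESPACE exec $TOOLS_POD -- ceph health | grep -v HEALTH_ERR
```

The RGW state is visible in the `services` section of `ceph status`, as shown in the earlier step.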

Upgrading the version from 1.7 to 1.8

Upgrading Rook

To upgrade Rook, follow the steps:

  1. Clone the specific version of Rook repository using the following command.

    git clone --single-branch --depth=1 --branch v1.8.10 https://github.com/rook/rook.git
    cd rook/deploy/examples/
    
  2. Set the namespaces for the Rook cluster and operator.

    ROOK_CLUSTER_NAMESPACE=mdsp-bk-ceph
    ROOK_OPERATOR_NAMESPACE=mdsp-bk-ceph
    
  3. Update the common.yaml file with the correct namespaces.

    sed -i.bak  -e "s/\(.*\):.*# namespace:operator/\1: $ROOK_OPERATOR_NAMESPACE # namespace:operator/g"  -e "s/\(.*\):.*# namespace:cluster/\1: $ROOK_CLUSTER_NAMESPACE # namespace:cluster/g" common.yaml
    
  4. If your Kubernetes version no longer supports PodSecurityPolicy (the API was removed in Kubernetes 1.25), manually remove the PodSecurityPolicy resources from common.yaml:

    vi common.yaml
    # Manually remove the PodSecurityPolicy-related paragraph
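Before editing by hand, you can probe whether the cluster still serves the PodSecurityPolicy API (a quick check; the reminder message is illustrative):

```shell
# Prints the PSP resource line if the API still exists; otherwise reminds you to strip it.
kubectl api-resources --api-group=policy 2>/dev/null | grep -i podsecuritypolicies \
  || echo "PodSecurityPolicy API absent: remove PSP resources from common.yaml"
```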
    
  5. Create the Custom Resource Definitions (CRDs) and the updated common resources:

    kubectl apply -f crds.yaml
    kubectl apply -f common.yaml
    
  6. Modify the namespace to match the correct operator namespace and create the necessary resources for monitoring.

    sed -i "s/namespace: rook-ceph/namespace: $ROOK_OPERATOR_NAMESPACE/g" monitoring/rbac.yaml
    kubectl apply -f monitoring/rbac.yaml
    
  7. Edit the operator's configuration and update the CSI images.

    kubectl -n $ROOK_OPERATOR_NAMESPACE edit configmap rook-ceph-operator-config
    
  8. Modify the rook-ceph-operator-config with the following values.

    ROOK_CSI_CEPH_IMAGE: "quay.io/cephcsi/cephcsi:v3.5.1"
    ROOK_CSI_REGISTRAR_IMAGE: "objectscale/csi-node-driver-registrar:v2.5.0"
    ROOK_CSI_PROVISIONER_IMAGE: "objectscale/csi-provisioner:v3.1.0"
    ROOK_CSI_ATTACHER_IMAGE: "longhornio/csi-attacher:v3.4.0"
    ROOK_CSI_RESIZER_IMAGE: "objectscale/csi-resizer:v1.4.0"
    ROOK_CSI_SNAPSHOTTER_IMAGE: "longhornio/csi-snapshotter:v5.0.1"
    CSI_VOLUME_REPLICATION_IMAGE: "quay.io/csiaddons/volumereplication-operator:v0.3.0"
    ROOK_CSIADDONS_IMAGE: "quay.io/csiaddons/k8s-sidecar:v0.2.1"
    
  9. Upgrade the Rook operator deployment to the new version.

    kubectl -n $ROOK_OPERATOR_NAMESPACE set image deploy/rook-ceph-operator rook-ceph-operator=rook/ceph:v1.8.10
    

    The Rook operator deployment is now upgraded to v1.8.10.
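To confirm the operator picked up the new image, read it back from the deployment (a quick sanity check; the deployment name comes from the steps above):

```shell
# Print the operator image currently set on the deployment; expect rook/ceph:v1.8.10.
IMAGE=$(kubectl -n $ROOK_OPERATOR_NAMESPACE get deploy/rook-ceph-operator -o jsonpath='{.spec.template.spec.containers[0].image}')
echo "$IMAGE" | grep -q "v1.8.10" && echo "operator image updated: $IMAGE"
```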

Checking result after upgrade

After an upgrade, verify the system state before proceeding; skipping this check can lead to unnecessary rework or irreversible damage.

Follow the steps to ensure that the Rook operator is updated successfully.

  1. Wait until all deployments are updated to the specified version.

    watch --exec kubectl -n $ROOK_CLUSTER_NAMESPACE get deployments -l rook_cluster=$ROOK_CLUSTER_NAMESPACE -o jsonpath='{range .items[*]}{.metadata.name}{"  \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.updatedReplicas}{"/"}{.status.readyReplicas}{"  \trook-version="}{.metadata.labels.rook-version}{"\n"}{end}'
    
  2. Verify that only one Rook version is being used across all deployments.

    kubectl -n $ROOK_CLUSTER_NAMESPACE get deployment -l rook_cluster=$ROOK_CLUSTER_NAMESPACE -o jsonpath='{range .items[*]}{"rook-version="}{.metadata.labels.rook-version}{"\n"}{end}' | sort | uniq
    
  3. After upgrading the Rook operator, Ceph will display a recovery status. Monitor the recovery process by checking the output.

    recovery: 412 KiB/s, 7 objects/s
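Rather than watching the output manually, one option is to poll `ceph health` from the toolbox until the cluster settles (the polling interval is an arbitrary choice):

```shell
# Resolve the toolbox pod as before.
TOOLS_POD=$(kubectl -n $ROOK_CLUSTER_NAMESPACE get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[*].metadata.name}')
# Loop until Ceph reports HEALTH_OK; interrupt with Ctrl-C if recovery stalls.
until kubectl -n $ROOK_CLUSTER_NAMESPACE exec $TOOLS_POD -- ceph health | grep -q HEALTH_OK; do
  sleep 30
done
```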
    

Upgrading Ceph

Note

No Ceph upgrade is required at this stage, as Ceph v16.2.0 is still supported by Rook v1.8.

Re-checking Ceph health status

It is required to verify the system's state after an upgrade to ensure that it is ready for the next steps. Proceeding without this check could lead to unnecessary work or irreparable harm.

Note

Ensure the Ceph status is not in HEALTH_ERR.

Check the Ceph status using the following commands:

TOOLS_POD=$(kubectl -n $ROOK_CLUSTER_NAMESPACE get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[*].metadata.name}')
kubectl -n $ROOK_CLUSTER_NAMESPACE exec -it $TOOLS_POD -- ceph status

Upgrading the version from 1.8 to 1.9

Upgrading Rook

To upgrade Rook, follow the steps:

  1. Clone the Rook v1.9.13 repository.

    git clone --single-branch --depth=1 --branch v1.9.13 https://github.com/rook/rook.git
    cd rook/deploy/examples/
    
  2. Set the namespaces for the Rook cluster and operator.

    export ROOK_OPERATOR_NAMESPACE=mdsp-bk-ceph
    export ROOK_CLUSTER_NAMESPACE=mdsp-bk-ceph
    
  3. Update the common.yaml file with the correct namespaces.

    sed -i.bak -e "s/\(.*\):.*# namespace:operator/\1: $ROOK_OPERATOR_NAMESPACE # namespace:operator/g" -e "s/\(.*\):.*# namespace:cluster/\1: $ROOK_CLUSTER_NAMESPACE # namespace:cluster/g" common.yaml
    
  4. Create the Custom Resource Definitions (CRDs) and the updated common resources.

    kubectl apply -f crds.yaml
    kubectl apply -f common.yaml
    
  5. Modify the namespace to match the correct operator namespace and create the necessary resources for monitoring.

    sed -i "s/namespace: rook-ceph/namespace: $ROOK_OPERATOR_NAMESPACE/g" monitoring/rbac.yaml
    kubectl apply -f monitoring/rbac.yaml
    
  6. Update the ROOK_CSI_CEPH_IMAGE in the rook-ceph-operator-config ConfigMap with the desired version.

    kubectl -n $ROOK_OPERATOR_NAMESPACE edit configmap rook-ceph-operator-config
    ROOK_CSI_CEPH_IMAGE: "quay.io/cephcsi/cephcsi:v3.6.1"
    ROOK_CSI_REGISTRAR_IMAGE: "objectscale/csi-node-driver-registrar:v2.5.0"
    ROOK_CSI_PROVISIONER_IMAGE: "objectscale/csi-provisioner:v3.1.0"
    ROOK_CSI_ATTACHER_IMAGE: "longhornio/csi-attacher:v3.4.0"
    ROOK_CSI_RESIZER_IMAGE: "objectscale/csi-resizer:v1.4.0"
    ROOK_CSI_SNAPSHOTTER_IMAGE: "longhornio/csi-snapshotter:v5.0.1"
    CSI_VOLUME_REPLICATION_IMAGE: "quay.io/csiaddons/volumereplication-operator:v0.3.0"
    ROOK_CSIADDONS_IMAGE: "quay.io/csiaddons/k8s-sidecar:v0.2.1"
    
  7. Upgrade the Rook operator to the latest version by updating the deployment image.

    kubectl -n $ROOK_OPERATOR_NAMESPACE set image deploy/rook-ceph-operator rook-ceph-operator=rook/ceph:v1.9.13
    

    The Rook operator deployment is now upgraded to v1.9.13.

Checking result after upgrade

After this upgrade, verify the system state before proceeding to avoid unnecessary rework or irreversible damage.

Follow these steps to ensure that the Rook operator is updated successfully.

  1. Wait until all deployments are updated to the specified version. The output lists the requested, updated, and available replica counts and the Rook version for each deployment.

    watch --exec kubectl -n $ROOK_CLUSTER_NAMESPACE get deployments -l rook_cluster=$ROOK_CLUSTER_NAMESPACE -o jsonpath='{range .items[*]}{.metadata.name}{"  \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.updatedReplicas}{"/"}{.status.readyReplicas}{"  \trook-version="}{.metadata.labels.rook-version}{"\n"}{end}'
    
  2. Verify that only one Rook version is being used across all deployments.

    kubectl -n $ROOK_CLUSTER_NAMESPACE get deployment -l rook_cluster=$ROOK_CLUSTER_NAMESPACE -o jsonpath='{range .items[*]}{"rook-version="}{.metadata.labels.rook-version}{"\n"}{end}' | sort | uniq
    
  3. Verify that Ceph is in a healthy state: retrieve the Rook Ceph tools pod name and check the Ceph status.

    TOOLS_POD=$(kubectl -n $ROOK_CLUSTER_NAMESPACE get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[*].metadata.name}')
    kubectl -n $ROOK_CLUSTER_NAMESPACE exec -it $TOOLS_POD -- ceph status
    

Upgrading Ceph

  1. Set the namespace and the Rook cluster name. The cluster name is the name of the CephCluster resource.

    ROOK_CLUSTER_NAMESPACE=mdsp-bk-ceph
    ROOK_CLUSTER=rook-ceph
    
  2. Set the new Ceph image version and upgrade the Ceph cluster.

    NEW_CEPH_IMAGE='quay.io/ceph/ceph:v16.2.10'
    kubectl -n $ROOK_CLUSTER_NAMESPACE patch CephCluster $ROOK_CLUSTER --type=merge -p "{\"spec\": {\"cephVersion\": {\"image\": \"$NEW_CEPH_IMAGE\"}}}"
    
  3. Wait for the Ceph version to upgrade to the desired version.

    watch --exec kubectl -n $ROOK_CLUSTER_NAMESPACE get deployments -l rook_cluster=$ROOK_CLUSTER_NAMESPACE -o jsonpath='{range .items[*]}{.metadata.name}{"  \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.updatedReplicas}{"/"}{.status.readyReplicas}{"  \tceph-version="}{.metadata.labels.ceph-version}{"\n"}{end}'
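Mirroring the Rook version check earlier, you can confirm that every deployment converged on a single Ceph version:

```shell
# A single line of output means all daemons run the same Ceph version.
kubectl -n $ROOK_CLUSTER_NAMESPACE get deployment -l rook_cluster=$ROOK_CLUSTER_NAMESPACE -o jsonpath='{range .items[*]}{"ceph-version="}{.metadata.labels.ceph-version}{"\n"}{end}' | sort | uniq
```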
    

Rechecking Ceph health status

It is required to verify the system's state after upgrading to the latest version to ensure it is ready for the next steps. Proceeding without this check could lead to unnecessary work or irreparable harm.

  1. Set the Rook cluster namespace.

    ROOK_CLUSTER_NAMESPACE=mdsp-bk-ceph
    
  2. Check that all pods are running and ready in the specified namespace.

    kubectl -n $ROOK_CLUSTER_NAMESPACE get pods
    
  3. Verify that Ceph is in a healthy state and not in a HEALTH_ERR state.

    TOOLS_POD=$(kubectl -n $ROOK_CLUSTER_NAMESPACE get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[*].metadata.name}')
    kubectl -n $ROOK_CLUSTER_NAMESPACE exec -it $TOOLS_POD -- ceph status
    

Keeping Helm scripts consistent

This section ensures consistency in Helm scripts, including standard naming conventions, values and templates.

Modifying common.yaml

  1. Set the namespaces for the Rook cluster and operator if required.

    export ROOK_OPERATOR_NAMESPACE=mdsp-bk-ceph
    export ROOK_CLUSTER_NAMESPACE=mdsp-bk-ceph
    
  2. Update the common.yaml file, replacing the operator and cluster namespace placeholders with the correct values.

    sed -i.bak -e "s/\(.*\):.*# namespace:operator/\1: $ROOK_OPERATOR_NAMESPACE # namespace:operator/g" -e "s/\(.*\):.*# namespace:cluster/\1: $ROOK_CLUSTER_NAMESPACE # namespace:cluster/g" common.yaml
    

Modifying operator.yaml

  1. Set the Rook CSI parameters: modify or add the following parameters to disable or configure the Rook CSI features.

    ROOK_CSI_ENABLE_RBD: "false"
    CSI_ENABLE_RBD_SNAPSHOTTER: "false"
    ROOK_CSI_ENABLE_CEPHFS: "false"
    CSI_ENABLE_CEPHFS_SNAPSHOTTER: "false"
    CSI_PROVISIONER_NODE_AFFINITY: "ceph=true"
    CSI_PLUGIN_NODE_AFFINITY: "ceph=true"
    
    CSI_PROVISIONER_TOLERATIONS: |
      - effect: "NoExecute"
        key: "domain"
        operator: "Equal"
        value: "ceph"
    CSI_PLUGIN_TOLERATIONS: |
      - effect: NoExecute
        key: domain
        operator: Exists
    
  2. Add node affinity and tolerations to the rook-ceph-operator deployment under the template.spec field.

    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: "ceph"
              operator: "In"
              values:
              - "true"
    tolerations:
    - effect: "NoExecute"
      key: "domain"
      operator: "Equal"
      value: "ceph"
    
  3. Add variables to the rook-ceph-operator deployment that control which nodes the plugins created by the operator are scheduled on.

    - name: DISCOVER_TOLERATIONS
      value: |
        - effect: NoExecute
          key: domain
          operator: Exists
    - name: DISCOVER_AGENT_NODE_AFFINITY
      value: "ceph=true"
    

Modifying cluster.yaml

  1. Update the namespace field in cluster.yaml to the desired namespace for your Rook cluster.

  2. Modify the cluster.yaml to expose the necessary port for the dashboard to make the Rook Ceph dashboard accessible.

  3. Schedule all Rook services (such as OSDs, MONs, and OSD prepare) on Kubernetes nodes marked with ceph=true.

    placement:
      all:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: "ceph"
                operator: In
                values:
                - "true"
        podAffinity:
        podAntiAffinity:
        topologySpreadConstraints:
        tolerations:
        - key: "domain"
          operator: "Equal"
          value: "ceph"
          effect: "NoExecute"
    
  4. To set specific disks for use as Bluestore OSDs, set deviceFilter to match the desired devices; it accepts a plain device name or a regular expression.
    Ensure that the selected disks are unpartitioned and have no filesystem.

    useAllDevices: false
    deviceFilter: sdb
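If several disks should match, deviceFilter also accepts an anchored regular expression; an illustrative (hypothetical) pattern:

```yaml
useAllDevices: false
# Hypothetical pattern: matches sdb, sdc, and sdd, but not sda.
deviceFilter: "^sd[b-d]"
```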
    
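Step 2 above (exposing the dashboard) corresponds to the dashboard stanza of the CephCluster spec; a minimal sketch, where the port value is an assumption to adjust for your environment:

```yaml
spec:
  dashboard:
    enabled: true
    # Serve the dashboard on this port instead of the module default.
    port: 8443
    ssl: true
```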

Modifying Toolbox YAML

  1. Update the namespace in the Toolbox YAML file to match your desired configuration.

  2. Modify the Toolbox deployment to include node affinity and tolerations.

    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: "ceph"
              operator: "In"
              values:
              - "true"
    tolerations:
    - effect: "NoExecute"
      key: "domain"
      operator: "Equal"
      value: "ceph"
    

Modifying Object YAML

  1. Update the namespace as per your requirements.
  2. Modify the Object placement settings to ensure that it is scheduled on nodes with the ceph=true label and includes the necessary tolerations.

    placement:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: "ceph"
              operator: In
              values:
              - "true"
    tolerations:
    - effect: "NoExecute"
      key: "domain"
      operator: "Equal"
      value: "ceph"
    

Modifying rgw-external YAML

  1. Change the namespace in the rgw-external YAML file as per your requirements.
  2. Modify the service type to NodePort in the rgw-external YAML.
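After the change, the Service resembles the following sketch (the name, ports, and selector follow the upstream example manifest and may differ in your deployment):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: rook-ceph-rgw-my-store-external
  namespace: mdsp-bk-ceph
spec:
  # NodePort (instead of ClusterIP) makes the RGW reachable on every node.
  type: NodePort
  ports:
  - name: rgw
    port: 80
    protocol: TCP
  selector:
    app: rook-ceph-rgw
```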

Last update: January 27, 2025

Except where otherwise noted, content on this site is licensed under the Development License Agreement.