Upgrading Rook Ceph from Version 1.7 to 1.9

This section provides detailed instructions for upgrading Rook Ceph from version 1.7 to 1.9 without interrupting existing services. It also highlights version-specific changes, such as Rook v1.10 dropping support for Ceph Octopus (15.2.x).
Prerequisites

The following requirements must be met before starting the upgrade:

- Kubernetes: 1.26
- Operating system: RockyLinux 8 / CentOS 8 / RedHat 8
- kubectl: 1.26
- Upgrade path: one minor release at a time (1.7 → 1.8 → 1.9)
- Estimated time: more than 30 minutes
Run the following commands to verify the system configuration:

```sh
kubectl version
uname -a; cat /etc/redhat-release
```
Verifying health status before the 1.7 to 1.8 upgrade

Note

Verify that the cluster and its component versions are healthy and error-free before proceeding with the upgrade.
- Set the namespaces for the Rook cluster and operator.

  ```sh
  ROOK_CLUSTER_NAMESPACE=mdsp-bk-ceph
  ROOK_OPERATOR_NAMESPACE=mdsp-bk-ceph
  ```
- Verify that all pods in the Rook cluster namespace are running.

  ```sh
  kubectl -n $ROOK_CLUSTER_NAMESPACE get pods
  ```
- Verify that the Ceph status is healthy and ensure it is not in a `HEALTH_ERR` state.

  ```sh
  TOOLS_POD=$(kubectl -n $ROOK_CLUSTER_NAMESPACE get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[*].metadata.name}')
  kubectl -n $ROOK_CLUSTER_NAMESPACE exec -it $TOOLS_POD -- ceph status
  ```
- Check the Rook-Ceph component versions.

  ```sh
  POD_NAME=$(kubectl -n $ROOK_CLUSTER_NAMESPACE get pod -o custom-columns=name:.metadata.name --no-headers | grep rook-ceph-mon-b)
  kubectl -n $ROOK_CLUSTER_NAMESPACE get pod ${POD_NAME} -o jsonpath='{.spec.containers[0].image}'
  kubectl -n $ROOK_OPERATOR_NAMESPACE get pod -o jsonpath='{range .items[*]}{.metadata.name}{"\n\t"}{.status.phase}{"\t\t"}{.spec.containers[0].image}{"\t"}{.spec.initContainers[0]}{"\n"}{end}' && \
  kubectl -n $ROOK_CLUSTER_NAMESPACE get pod -o jsonpath='{range .items[*]}{.metadata.name}{"\n\t"}{.status.phase}{"\t\t"}{.spec.containers[0].image}{"\t"}{.spec.initContainers[0].image}{"\n"}{end}'
  kubectl -n $ROOK_CLUSTER_NAMESPACE get deployments -o jsonpath='{range .items[*]}{.metadata.name}{" \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.updatedReplicas}{"/"}{.status.readyReplicas}{" \trook-version="}{.metadata.labels.rook-version}{"\n"}{end}'
  kubectl -n $ROOK_CLUSTER_NAMESPACE get jobs -o jsonpath='{range .items[*]}{.metadata.name}{" \tsucceeded: "}{.status.succeeded}{" \trook-version="}{.metadata.labels.rook-version}{"\n"}{end}'
  ```
Verifying Ceph cluster health

After the upgrade, verify the following aspects of the Ceph cluster:

Note

- Check that all Ceph monitors are in quorum.
- Verify that the Ceph manager is active.
- Confirm that all Object Storage Daemons (OSDs) are up and running.
- Verify that the RADOS Gateway (RGW) is active.
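Each of these checks can also be run individually from the toolbox pod. This is a minimal sketch, assuming `$ROOK_CLUSTER_NAMESPACE` and `$TOOLS_POD` are set as in the previous steps; all of the invoked subcommands are standard Ceph CLI commands:

```sh
# Monitor quorum, active manager, OSD state, and overall status (including RGW services).
kubectl -n $ROOK_CLUSTER_NAMESPACE exec -it $TOOLS_POD -- ceph quorum_status -f json-pretty
kubectl -n $ROOK_CLUSTER_NAMESPACE exec -it $TOOLS_POD -- ceph mgr stat
kubectl -n $ROOK_CLUSTER_NAMESPACE exec -it $TOOLS_POD -- ceph osd stat
kubectl -n $ROOK_CLUSTER_NAMESPACE exec -it $TOOLS_POD -- ceph status
```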
Upgrading from version 1.7 to 1.8

Upgrading Rook

To upgrade Rook, follow these steps:
- Clone the specific version (v1.8.10) of the Rook repository and change into the examples directory.

  ```sh
  git clone --single-branch --depth=1 --branch v1.8.10 https://github.com/rook/rook.git
  cd rook/deploy/examples/
  ```
- Set the namespaces for the Rook cluster and operator.

  ```sh
  ROOK_CLUSTER_NAMESPACE=mdsp-bk-ceph
  ROOK_OPERATOR_NAMESPACE=mdsp-bk-ceph
  ```
- Update the `common.yaml` file with the correct namespaces.

  ```sh
  sed -i.bak -e "s/\(.*\):.*# namespace:operator/\1: $ROOK_OPERATOR_NAMESPACE # namespace:operator/g" -e "s/\(.*\):.*# namespace:cluster/\1: $ROOK_CLUSTER_NAMESPACE # namespace:cluster/g" common.yaml
  ```
- If your Kubernetes version no longer supports `PodSecurityPolicy` (the API was removed in Kubernetes 1.25), manually remove the `PodSecurityPolicy` resources from `common.yaml`:

  ```sh
  vi common.yaml # Manually remove the PodSecurityPolicy-related sections
  ```
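  To confirm whether your cluster still serves the `PodSecurityPolicy` API before editing, a quick check (a sketch; the API-group lookup is standard kubectl):

  ```sh
  # Prints the fallback message on clusters where the PSP API has been removed.
  kubectl api-resources --api-group=policy | grep -i podsecuritypolicies || echo "PodSecurityPolicy API not served"
  ```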
- Create the Custom Resource Definitions (CRDs) and the updated common resources:

  ```sh
  kubectl apply -f crds.yaml
  kubectl apply -f common.yaml
  ```
- Update the namespace to match the correct operator namespace and create the necessary resources for monitoring.

  ```sh
  sed -i "s/namespace: rook-ceph/namespace: $ROOK_OPERATOR_NAMESPACE/g" monitoring/rbac.yaml
  kubectl apply -f monitoring/rbac.yaml
  ```
- Edit the operator's configuration to update the CSI images.

  ```sh
  kubectl -n $ROOK_OPERATOR_NAMESPACE edit configmap rook-ceph-operator-config
  ```
- Modify the `rook-ceph-operator-config` with the following values.

  ```yaml
  ROOK_CSI_CEPH_IMAGE: "quay.io/cephcsi/cephcsi:v3.5.1"
  ROOK_CSI_REGISTRAR_IMAGE: "objectscale/csi-node-driver-registrar:v2.5.0"
  ROOK_CSI_PROVISIONER_IMAGE: "objectscale/csi-provisioner:v3.1.0"
  ROOK_CSI_ATTACHER_IMAGE: "longhornio/csi-attacher:v3.4.0"
  ROOK_CSI_RESIZER_IMAGE: "objectscale/csi-resizer:v1.4.0"
  ROOK_CSI_SNAPSHOTTER_IMAGE: "longhornio/csi-snapshotter:v5.0.1"
  CSI_VOLUME_REPLICATION_IMAGE: "quay.io/csiaddons/volumereplication-operator:v0.3.0"
  ROOK_CSIADDONS_IMAGE: "quay.io/csiaddons/k8s-sidecar:v0.2.1"
  ```
- Upgrade the Rook operator deployment to the new version.

  ```sh
  kubectl -n $ROOK_OPERATOR_NAMESPACE set image deploy/rook-ceph-operator rook-ceph-operator=rook/ceph:v1.8.10
  ```

The Rook operator deployment is now upgraded to v1.8.10.
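Before moving on, you can wait for the operator rollout to complete and spot-check that the ConfigMap carries the new CSI images. A minimal sketch, assuming the deployment and ConfigMap names used in the steps above:

```sh
# Block until the operator Deployment finishes rolling out (or time out after 5 minutes).
kubectl -n $ROOK_OPERATOR_NAMESPACE rollout status deploy/rook-ceph-operator --timeout=300s

# Spot-check one of the CSI image values set earlier.
kubectl -n $ROOK_OPERATOR_NAMESPACE get configmap rook-ceph-operator-config -o jsonpath='{.data.ROOK_CSI_CEPH_IMAGE}{"\n"}'
```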
Checking the result after the upgrade

After performing an upgrade, verify the system state before proceeding, to avoid unnecessary work or irreversible harm.

Follow these steps to ensure that the Rook operator is updated successfully.
- Wait until all deployments are updated to the specified version.

  ```sh
  watch --exec kubectl -n $ROOK_CLUSTER_NAMESPACE get deployments -l rook_cluster=$ROOK_CLUSTER_NAMESPACE -o jsonpath='{range .items[*]}{.metadata.name}{" \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.updatedReplicas}{"/"}{.status.readyReplicas}{" \trook-version="}{.metadata.labels.rook-version}{"\n"}{end}'
  ```
- Verify that only one Rook version is in use across all deployments.

  ```sh
  kubectl -n $ROOK_CLUSTER_NAMESPACE get deployment -l rook_cluster=$ROOK_CLUSTER_NAMESPACE -o jsonpath='{range .items[*]}{"rook-version="}{.metadata.labels.rook-version}{"\n"}{end}' | sort | uniq
  ```
- After upgrading the Rook operator, Ceph reports a recovery status. Monitor the recovery progress in the `ceph status` output, for example:

  ```
  recovery: 412 KiB/s, 7 objects/s
  ```
Upgrading Ceph

Note

No Ceph upgrade is required at this step, as version 16.2.0 is still supported.
Re-checking Ceph health status

Verify the system's state after the upgrade to ensure that it is ready for the next steps. Proceeding without this check could lead to unnecessary work or irreparable harm.

Note

Ensure the Ceph status is not `HEALTH_ERR`.

Check the Ceph status using the following commands:

```sh
TOOLS_POD=$(kubectl -n $ROOK_CLUSTER_NAMESPACE get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[*].metadata.name}')
kubectl -n $ROOK_CLUSTER_NAMESPACE exec -it $TOOLS_POD -- ceph status
```
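If you script the upgrade, this check can be turned into a simple gate. A minimal sketch, assuming the variables set above; `ceph health` prints a status string starting with HEALTH_OK, HEALTH_WARN, or HEALTH_ERR:

```sh
# Abort the upgrade procedure if the cluster is in HEALTH_ERR.
HEALTH=$(kubectl -n $ROOK_CLUSTER_NAMESPACE exec $TOOLS_POD -- ceph health)
case "$HEALTH" in
  HEALTH_ERR*) echo "Cluster is in HEALTH_ERR; stop and investigate." >&2; exit 1 ;;
  *)           echo "Ceph health: $HEALTH" ;;
esac
```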
Upgrading from version 1.8 to 1.9

Upgrading Rook

To upgrade Rook, follow these steps:
- Clone the Rook v1.9.13 repository and change into the examples directory.

  ```sh
  git clone --single-branch --depth=1 --branch v1.9.13 https://github.com/rook/rook.git
  cd rook/deploy/examples/
  ```
- Set the namespaces for the Rook cluster and operator.

  ```sh
  export ROOK_OPERATOR_NAMESPACE=mdsp-bk-ceph
  export ROOK_CLUSTER_NAMESPACE=mdsp-bk-ceph
  ```
- Update the `common.yaml` file with the correct namespaces.

  ```sh
  sed -i.bak -e "s/\(.*\):.*# namespace:operator/\1: $ROOK_OPERATOR_NAMESPACE # namespace:operator/g" -e "s/\(.*\):.*# namespace:cluster/\1: $ROOK_CLUSTER_NAMESPACE # namespace:cluster/g" common.yaml
  ```
- Create the Custom Resource Definitions (CRDs) and the updated common resources.

  ```sh
  kubectl apply -f crds.yaml
  kubectl apply -f common.yaml
  ```
- Update the namespace to match the correct operator namespace and create the necessary resources for monitoring.

  ```sh
  sed -i "s/namespace: rook-ceph/namespace: $ROOK_OPERATOR_NAMESPACE/g" monitoring/rbac.yaml
  kubectl apply -f monitoring/rbac.yaml
  ```
- Update the `ROOK_CSI_CEPH_IMAGE` in the `rook-ceph-operator-config` ConfigMap with the desired version.

  ```sh
  kubectl -n $ROOK_OPERATOR_NAMESPACE edit configmap rook-ceph-operator-config
  ```

  ```yaml
  ROOK_CSI_CEPH_IMAGE: "quay.io/cephcsi/cephcsi:v3.6.1"
  ROOK_CSI_REGISTRAR_IMAGE: "objectscale/csi-node-driver-registrar:v2.5.0"
  ROOK_CSI_PROVISIONER_IMAGE: "objectscale/csi-provisioner:v3.1.0"
  ROOK_CSI_ATTACHER_IMAGE: "longhornio/csi-attacher:v3.4.0"
  ROOK_CSI_RESIZER_IMAGE: "objectscale/csi-resizer:v1.4.0"
  ROOK_CSI_SNAPSHOTTER_IMAGE: "longhornio/csi-snapshotter:v5.0.1"
  CSI_VOLUME_REPLICATION_IMAGE: "quay.io/csiaddons/volumereplication-operator:v0.3.0"
  ROOK_CSIADDONS_IMAGE: "quay.io/csiaddons/k8s-sidecar:v0.2.1"
  ```
- Upgrade the Rook operator by updating the deployment image.

  ```sh
  kubectl -n $ROOK_OPERATOR_NAMESPACE set image deploy/rook-ceph-operator rook-ceph-operator=rook/ceph:v1.9.13
  ```

The Rook operator deployment is now upgraded to v1.9.13.
Checking the result after the upgrade

After the latest upgrade, verify the system state before proceeding to avoid unnecessary work or irreversible harm.

Follow these steps to ensure that the Rook operator is updated successfully.
- Wait until all deployments are updated to the specified version. The output shows the requested, updated, and available replicas, as well as the Rook version, for each deployment.

  ```sh
  watch --exec kubectl -n $ROOK_CLUSTER_NAMESPACE get deployments -l rook_cluster=$ROOK_CLUSTER_NAMESPACE -o jsonpath='{range .items[*]}{.metadata.name}{" \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.updatedReplicas}{"/"}{.status.readyReplicas}{" \trook-version="}{.metadata.labels.rook-version}{"\n"}{end}'
  ```
- Verify that only one Rook version is in use across all deployments.

  ```sh
  kubectl -n $ROOK_CLUSTER_NAMESPACE get deployment -l rook_cluster=$ROOK_CLUSTER_NAMESPACE -o jsonpath='{range .items[*]}{"rook-version="}{.metadata.labels.rook-version}{"\n"}{end}' | sort | uniq
  ```
- Verify that Ceph is in a healthy state. Retrieve the Rook Ceph tools pod name and check the Ceph status.

  ```sh
  TOOLS_POD=$(kubectl -n $ROOK_CLUSTER_NAMESPACE get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[*].metadata.name}')
  kubectl -n $ROOK_CLUSTER_NAMESPACE exec -it $TOOLS_POD -- ceph status
  ```
Upgrading Ceph

- Set the namespace and Rook cluster name. Use the `CephCluster` resource to obtain the Ceph cluster name.

  ```sh
  ROOK_CLUSTER_NAMESPACE=mdsp-bk-ceph
  ROOK_CLUSTER=rook-ceph
  ```
- Set the new Ceph image version and upgrade the Ceph cluster.

  ```sh
  NEW_CEPH_IMAGE='quay.io/ceph/ceph:v16.2.10'
  kubectl -n $ROOK_CLUSTER_NAMESPACE patch CephCluster $ROOK_CLUSTER --type=merge -p "{\"spec\": {\"cephVersion\": {\"image\": \"$NEW_CEPH_IMAGE\"}}}"
  ```
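  To confirm that the patch landed, a minimal sketch reading the image back from the `CephCluster` spec:

  ```sh
  # Should print quay.io/ceph/ceph:v16.2.10 once the patch is applied.
  kubectl -n $ROOK_CLUSTER_NAMESPACE get cephcluster $ROOK_CLUSTER -o jsonpath='{.spec.cephVersion.image}{"\n"}'
  ```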
- Wait for the Ceph version to upgrade to the desired version.

  ```sh
  watch --exec kubectl -n $ROOK_CLUSTER_NAMESPACE get deployments -l rook_cluster=$ROOK_CLUSTER_NAMESPACE -o jsonpath='{range .items[*]}{.metadata.name}{" \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.updatedReplicas}{"/"}{.status.readyReplicas}{" \tceph-version="}{.metadata.labels.ceph-version}{"\n"}{end}'
  ```
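Once the rollout settles, all daemons should report the same Ceph release. A sketch, assuming `$TOOLS_POD` is still set from the earlier health checks; `ceph versions` is a standard Ceph command that breaks the running versions down per daemon type:

```sh
# Every daemon (mon, mgr, osd, rgw) should show v16.2.10 after the upgrade.
kubectl -n $ROOK_CLUSTER_NAMESPACE exec -it $TOOLS_POD -- ceph versions
```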
Rechecking Ceph health status

Verify the system's state after upgrading to the latest version to ensure it is ready for the next steps. Proceeding without this check could lead to unnecessary work or irreparable harm.
- Set the Rook cluster namespace.

  ```sh
  ROOK_CLUSTER_NAMESPACE=mdsp-bk-ceph
  ```
- Check that all pods are running and ready in the specified namespace.

  ```sh
  kubectl -n $ROOK_CLUSTER_NAMESPACE get pods
  ```
- Verify that Ceph is in a healthy state and not in a `HEALTH_ERR` state.

  ```sh
  TOOLS_POD=$(kubectl -n $ROOK_CLUSTER_NAMESPACE get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[*].metadata.name}')
  kubectl -n $ROOK_CLUSTER_NAMESPACE exec -it $TOOLS_POD -- ceph status
  ```
Keeping Helm scripts consistent

This section keeps the Helm scripts consistent with the upgraded deployment, including standard naming conventions, values, and templates.
Modifying common.yaml
- Set the namespaces for the Rook cluster and operator, if required.

  ```sh
  export ROOK_OPERATOR_NAMESPACE=mdsp-bk-ceph
  export ROOK_CLUSTER_NAMESPACE=mdsp-bk-ceph
  ```
- Update the `common.yaml` file, replacing the operator and cluster namespace placeholders with the correct values.

  ```sh
  sed -i.bak -e "s/\(.*\):.*# namespace:operator/\1: $ROOK_OPERATOR_NAMESPACE # namespace:operator/g" -e "s/\(.*\):.*# namespace:cluster/\1: $ROOK_CLUSTER_NAMESPACE # namespace:cluster/g" common.yaml
  ```
Modifying operator.yaml
- Set the Rook CSI parameters: modify or add the following parameters to disable or configure the Rook CSI features.

  ```yaml
  ROOK_CSI_ENABLE_RBD: "false"
  CSI_ENABLE_RBD_SNAPSHOTTER: "false"
  ROOK_CSI_ENABLE_CEPHFS: "false"
  CSI_ENABLE_CEPHFS_SNAPSHOTTER: "false"
  CSI_PROVISIONER_NODE_AFFINITY: "ceph=true"
  CSI_PLUGIN_NODE_AFFINITY: "ceph=true"
  CSI_PROVISIONER_TOLERATIONS: |
    - effect: "NoExecute"
      key: "domain"
      operator: "Equal"
      value: "ceph"
  CSI_PLUGIN_TOLERATIONS: |
    - effect: NoExecute
      key: domain
      operator: Exists
  ```
- Add node affinity and tolerations to the `rook-ceph-operator` deployment under the `template.spec` field.

  ```yaml
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: "ceph"
            operator: "In"
            values:
            - "true"
  tolerations:
  - effect: "NoExecute"
    key: "domain"
    operator: "Equal"
    value: "ceph"
  ```
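  After applying the change, you can check that the operator pod was scheduled onto a labeled node. A sketch, assuming the standard `app=rook-ceph-operator` label from the Rook manifests:

  ```sh
  # The NODE column should show a node labeled ceph=true.
  kubectl -n $ROOK_OPERATOR_NAMESPACE get pod -l app=rook-ceph-operator -o wide
  ```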
- Add environment variables to the `rook-ceph-operator` deployment that control where auxiliary pods created by the operator (such as the discover agents) are scheduled.

  ```yaml
  - name: DISCOVER_TOLERATIONS
    value: |
      - effect: NoExecute
        key: domain
        operator: Exists
  - name: DISCOVER_AGENT_NODE_AFFINITY
    value: "ceph=true"
  ```
Modifying cluster.yaml
- Update the namespace field in `cluster.yaml` to the desired namespace for your Rook cluster.
- Modify `cluster.yaml` to expose the necessary port for the dashboard, making the Rook Ceph dashboard accessible.
- Schedule all Rook services (such as OSDs, MONs, and OSD prepare jobs) on Kubernetes nodes labeled `ceph=true`.

  ```yaml
  placement:
    all:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: "ceph"
              operator: In
              values:
              - "true"
      podAffinity:
      podAntiAffinity:
      topologySpreadConstraints:
      tolerations:
      - key: "domain"
        operator: "Equal"
        value: "ceph"
        effect: "NoExecute"
  ```
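  The placement only takes effect if the target nodes actually carry the matching label and taint. A minimal sketch, with `<ceph-node-name>` as a placeholder for your node:

  ```sh
  # Label the node so the nodeAffinity rule matches, and taint it so that
  # only pods carrying the corresponding toleration are scheduled there.
  kubectl label node <ceph-node-name> ceph=true
  kubectl taint node <ceph-node-name> domain=ceph:NoExecute
  ```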
- To use specific disks as BlueStore OSDs, set the `deviceFilter` to match the desired disks. Regular expressions are supported (for example, `^sd.` matches all devices whose names start with `sd`). Ensure that the selected disk is not in use and has no filesystem.

  ```yaml
  useAllDevices: false
  deviceFilter: sdb
  ```
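A quick way to confirm that the disk is empty before handing it to Rook (a sketch; run on the node that owns the disk):

```sh
# An empty FSTYPE column means no filesystem is present on the device.
lsblk -f /dev/sdb
```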
Modifying the Toolbox YAML
- Update the namespace in the Toolbox YAML file to match your desired configuration.
- Modify the Toolbox deployment to include node affinity and tolerations.

  ```yaml
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: "ceph"
            operator: "In"
            values:
            - "true"
  tolerations:
  - effect: "NoExecute"
    key: "domain"
    operator: "Equal"
    value: "ceph"
  ```
Modifying the Object YAML

- Update the namespace as per your requirements.
- Modify the Object placement settings to ensure that the object store is scheduled on nodes with the `ceph=true` label and includes the necessary tolerations.

  ```yaml
  placement:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: "ceph"
            operator: In
            values:
            - "true"
    tolerations:
    - effect: "NoExecute"
      key: "domain"
      operator: "Equal"
      value: "ceph"
  ```
Modifying the rgw-external YAML

- Change the namespace in the `rgw-external` YAML file as per your requirements.
- Change the service type to `NodePort` in the `rgw-external` YAML.
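After applying the change, you can confirm that the external RGW service is exposed as a NodePort. A sketch; the `app=rook-ceph-rgw` label selector is an assumption based on Rook's example manifests:

```sh
# The external service's TYPE column should read NodePort.
kubectl -n $ROOK_CLUSTER_NAMESPACE get svc -l app=rook-ceph-rgw
```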
Except where otherwise noted, content on this site is licensed under the Development License Agreement.