Upgrading Bitnami Kafka from Version 20.0.0 to 28.2.5¶
This section provides instructions for upgrading Bitnami Kafka from version 20.0.0 to 28.2.5 in a Kubernetes environment. System administrators and DevOps engineers can use this guide as a technical reference for managing and maintaining Kafka clusters.
Resource specifications¶
This table provides key configuration parameters and environment details required for upgrading Bitnami Kafka from version 20.0.0 to 28.2.5 in a Kubernetes cluster using Helm:
Parameter | Value |
---|---|
Kubernetes Version | v1.28 |
Helm Version | v3 |
Release Name | kafka |
Namespace | mdsp-bk-kafka |
Chart | bitnami/kafka |
From Version | 20.0.0 |
To Version | 28.2.5 |
Kafka Replicas | 3 (0, 1, 2) |
Test Topic | test |
Downtime Window | 20 minutes |
Prerequisites¶
The following requirements must be met to proceed with the Bitnami Kafka upgrade.
Pull required Docker images¶
Note
Ensure all required Docker images are pulled in advance.
bitnami/jmx-exporter:0.20.0-debian-12-r17
bitnami/kafka:3.7.0-debian-12-r6
bitnami/kafka-exporter:1.7.0-debian-12-r27
bitnami/zookeeper:3.9.2-debian-12-r6
docker.io/bitnami/jmx-exporter:0.20.0-debian-12-r17
docker.io/bitnami/kafka:3.7.0-debian-12-r6
docker.io/bitnami/kafka-exporter:1.7.0-debian-12-r27
docker.io/bitnami/zookeeper:3.9.2-debian-12-r6
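As a convenience, the images can be pre-pulled with a small loop; a minimal sketch assuming Docker is available on the host (adapt to crictl or ctr if the nodes run containerd):
# Pre-pull the images required by the 28.2.5 chart
for IMAGE in \
  docker.io/bitnami/jmx-exporter:0.20.0-debian-12-r17 \
  docker.io/bitnami/kafka:3.7.0-debian-12-r6 \
  docker.io/bitnami/kafka-exporter:1.7.0-debian-12-r27 \
  docker.io/bitnami/zookeeper:3.9.2-debian-12-r6
do
  docker pull "$IMAGE"
done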
Info
Expect approximately 20 minutes of business downtime during the upgrade.
Verify the Insights Hub GUI before the upgrade¶
Ensure the Insights Hub GUI is functioning properly before starting the upgrade. Verify that there are no pre-existing Kafka-related issues.
Get Kafka release revision¶
Record the current release revision so that the changes can be rolled back if the Kafka upgrade fails.
To get the Kafka release revision details, use one of the following:
- If ArgoCD is used, the release revision can be checked in the ArgoCD application history.
- If Helm is used, run the following command to get the release revision:
helm -n mdsp-bk-kafka history kafka
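If a rollback is needed later, a minimal sketch with Helm (substitute the revision number recorded above for <REVISION>):
# Roll back the kafka release to a previously recorded revision
helm -n mdsp-bk-kafka rollback kafka <REVISION>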
1. Stop Kafka access¶
To prevent any impact on the upgrade process, ensure Kafka does not receive data from clients during the upgrade.
NAMESPACE=mdsp-bk-kafka
kubectl -n $NAMESPACE delete svc kafka kafka-zookeeper
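To confirm that client access is now blocked, a quick sanity check:
# The kafka and kafka-zookeeper services should no longer be listed
kubectl -n $NAMESPACE get svc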
2. Retain existing Persistent Volumes (PVs)¶
Preserve the PersistentVolumes (PVs) by setting their reclaim policy to "Retain".
NAMESPACE=mdsp-bk-kafka
rm -f /tmp/pv_list.txt
for REPLICA in 0 1 2
do
OLD_PVC="data-kafka-${REPLICA}"
PV_NAME=$(kubectl -n $NAMESPACE get pvc $OLD_PVC -o jsonpath="{.spec.volumeName}")
# Record the old PV name in /tmp/pv_list.txt (read again in step 5)
echo $PV_NAME >> /tmp/pv_list.txt
# Change the PV reclaim policy to Retain
kubectl -n $NAMESPACE patch pv $PV_NAME -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
done
# Check that the PVs are now retained
kubectl get pv | grep $NAMESPACE
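To double-check each PV individually, the reclaim policy can be printed directly; a small sketch using the list written above:
# Each PV should report "Retain"
for PV_NAME in $(cat /tmp/pv_list.txt); do
  kubectl get pv "$PV_NAME" -o jsonpath='{.spec.persistentVolumeReclaimPolicy}{"\n"}'
done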
3. Generate new PersistentVolumeClaim (PVC) manifests¶
Create YAML manifests for the new PVCs.
NAMESPACE=mdsp-bk-kafka
for REPLICA in 0 1 2
do
OLD_PVC="data-kafka-${REPLICA}"
NEW_PVC="data-kafka-broker-${REPLICA}"
NEW_PVC_MANIFEST_FILE="$NEW_PVC.yaml"
# Create new PVC manifest
kubectl -n $NAMESPACE get pvc $OLD_PVC -o json | jq ".metadata.name = \"$NEW_PVC\"|with_entries(select([.key] |inside([\"metadata\", \"spec\", \"apiVersion\", \"kind\"]))) | del(.metadata.annotations, .metadata.creationTimestamp,.metadata.finalizers, .metadata.resourceVersion,.metadata.selfLink, .metadata.uid)"> $NEW_PVC_MANIFEST_FILE
done
# Check that the manifest files exist and their content is correct
ls -l data-kafka-broker-*.yaml; cat data-kafka-broker-*.yaml
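Optionally, the generated manifests can be validated client-side before they are applied in step 6; a sketch:
# Validate the new PVC manifests without creating anything yet (NAMESPACE is still set from the loop above)
for REPLICA in 0 1 2; do
  kubectl -n $NAMESPACE apply --dry-run=client -f data-kafka-broker-$REPLICA.yaml
done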
4. Delete StatefulSet and old PVCs¶
Delete the old Kafka StatefulSet and PVCs.
NAMESPACE=mdsp-bk-kafka
kubectl -n $NAMESPACE delete sts "kafka"
# Check that the Kafka pods are deleted; they should be gone from now on
kubectl -n $NAMESPACE get pod
for REPLICA in 0 1 2
do
kubectl -n $NAMESPACE delete pvc data-kafka-${REPLICA}
done
# Check that the old PVCs are deleted; they should have disappeared from now on
kubectl -n $NAMESPACE get pvc
# Check that the PVs still exist; they remain retained from now on
kubectl get pv | grep $NAMESPACE
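Before continuing, it can help to wait until the old broker pods have fully terminated; a minimal sketch, assuming the default Bitnami chart labels:
# Wait for the old Kafka pods to be deleted
# (the label selector is an assumption based on the default chart labels; if the pods are already gone the command returns quickly)
kubectl -n $NAMESPACE wait --for=delete pod -l app.kubernetes.io/name=kafka --timeout=300s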
5. Re-enable Persistent Volumes (PVs) and create a new PVC¶
Verify detachment and prepare PVs for reuse.
NAMESPACE=mdsp-bk-kafka
for PV_NAME in `cat /tmp/pv_list.txt`;do
echo $PV_NAME
kubectl -n $NAMESPACE patch pv $PV_NAME -p '{"spec":{"claimRef": null}}'
done
# After the patch, the PVs no longer appear for this namespace in the get output because their claimRef is cleared
kubectl get pv|grep $NAMESPACE
# but volumes still exist
for PV_NAME in `cat /tmp/pv_list.txt`;do
kubectl get volumes.longhorn.io -A|grep $PV_NAME
done
# If a volume is not yet detached, wait for it to detach, or detach it manually from the Longhorn UI or the command line, e.g.:
# kubectl -n longhorn-system get volumeattachments.storage.k8s.io | egrep 'pvc-b876d645-9438-4c65-ad35-a061fe5e4830|pvc-b38cf148-aaf5-4f59-b801-84e15cbb43fa|pvc-0daa9f83-0c65-4398-9d40-3b4621f4aecf'
# (replace the PV names with those recorded in /tmp/pv_list.txt); the VolumeAttachment objects for the affected volumes may also need to be deleted.
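A simple way to watch the detach state directly on the Longhorn volume objects (assuming the Longhorn volume names match the PV names, which is the default for dynamically provisioned volumes):
# Each volume should eventually report "detached"
for PV_NAME in $(cat /tmp/pv_list.txt); do
  kubectl -n longhorn-system get volumes.longhorn.io "$PV_NAME" -o jsonpath='{.status.state}{"\n"}'
done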
6. Create new PVCs using existing PVs¶
Create the new PVCs so that they bind to the existing PVs.
NAMESPACE=mdsp-bk-kafka
for REPLICA in 0 1 2
do
kubectl -n $NAMESPACE apply -f data-kafka-broker-$REPLICA.yaml
done
# List the new PVCs
kubectl -n $NAMESPACE get pvc
# List the retained PVs
kubectl get pv | grep $NAMESPACE
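To confirm that each new claim binds to its retained PV, a sketch (kubectl 1.23 or later is required for --for=jsonpath):
# Wait until every new PVC reports phase Bound
for REPLICA in 0 1 2; do
  kubectl -n $NAMESPACE wait --for=jsonpath='{.status.phase}'=Bound pvc/data-kafka-broker-$REPLICA --timeout=120s
done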
7. Upgrade Kafka¶
Update the Kafka values in the values.yaml file, including additional parameters that need to be modified, such as size, extraConfig, authentication protocol, user, password, etc., and then proceed with the upgrade process.
Name | Value |
---|---|
global.storageClass | "longhorn-ssd" |
extraConfig | log.dirs=/bitnami/kafka/data delete.topic.enable=false auto.create.topics.enable=true num.recovery.threads.per.data.dir=1 allow.everyone.if.no.acl.found=true super.users=User:admin |
heapOpts | -Xmx2048m -Xms2048m |
listeners.client.protocol | PLAINTEXT |
listeners.controller.protocol | PLAINTEXT |
listeners.interbroker.protocol | PLAINTEXT |
listeners.external.protocol | PLAINTEXT |
sasl.interbroker.user | admin |
sasl.client.users | ["user"] |
controller.replicaCount | 0 |
broker.replicaCount | 3 |
broker.minId | 0 |
broker.resources.limits | {"cpu": "1","memory": "8.5Gi"} |
broker.resources.requests | {"cpu": "250m","memory": "2.5Gi"} |
broker.affinity | nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: "iaas" operator: "In" values: - "true" |
broker.tolerations | - effect: "NoSchedule" key: "domain" operator: "Equal" value: "iaas" |
broker.persistence.size | 2Ti |
metrics.kafka.enabled | true |
metrics.kafka.resources | limits: cpu: 200m memory: 256Mi requests: cpu: 100m memory: 128Mi |
metrics.kafka.affinity | nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: "iaas" operator: "In" values: - "true" |
metrics.kafka.tolerations | - effect: "NoSchedule" key: "domain" operator: "Equal" value: "iaas" |
metrics.jmx.enabled | true |
metrics.jmx.resources | limits: cpu: 200m memory: 256Mi requests: cpu: 100m memory: 128Mi |
metrics.serviceMonitor.enabled | true |
metrics.serviceMonitor.namespace | monitoring |
kraft.enabled | false |
zookeeper.enabled | true |
zookeeper.replicaCount | 3 |
zookeeper.affinity | nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: "iaas" operator: "In" values: - "true" |
zookeeper.tolerations | - effect: "NoSchedule" key: "domain" operator: "Equal" value: "iaas" |
zookeeper.resources | requests: memory: 256Mi cpu: 250m |
zookeeper.persistence.size | 5Gi |
networkPolicy.enabled | false |
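For orientation, here is a partial sketch of how some of these parameters map onto the values.yaml structure of the 28.2.5 chart; it is an illustrative fragment (the file name is only an example), not a complete file, and should be merged into the values.yaml used below:
# Write an illustrative values fragment for reference
cat > values-snippet.yaml <<'EOF'
global:
  storageClass: "longhorn-ssd"
heapOpts: "-Xmx2048m -Xms2048m"
listeners:
  client:
    protocol: PLAINTEXT
  interbroker:
    protocol: PLAINTEXT
controller:
  replicaCount: 0
broker:
  replicaCount: 3
  minId: 0
  persistence:
    size: 2Ti
kraft:
  enabled: false
zookeeper:
  enabled: true
  replicaCount: 3
EOF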
Info
When upgrading with ArgoCD, you need to rename the Kafka repository and select the "PRUNE" option to remove old versions of resources, such as services (svc).
NAMESPACE=mdsp-bk-kafka
cd kafka-28.2.5/
helm -n $NAMESPACE upgrade kafka ./ -f values.yaml
# If an error occurs, scale the Kafka and ZooKeeper StatefulSets step by step and delete the ZooKeeper StatefulSet.
# To check whether the volumes are already attached:
# kubectl -n longhorn-system get volumeattachments.storage.k8s.io | egrep 'pvc-b876d645-9438-4c65-ad35-a061fe5e4830|pvc-b38cf148-aaf5-4f59-b801-84e15cbb43fa|pvc-0daa9f83-0c65-4398-9d40-3b4621f4aecf'
# Watch the new pods come up
kubectl -n $NAMESPACE get pod
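Once the upgrade command returns, the broker rollout can also be watched explicitly; a sketch, assuming the chart's default StatefulSet naming for the broker pool:
# The StatefulSet name kafka-broker is an assumption based on the default naming of the 28.x chart
kubectl -n $NAMESPACE rollout status statefulset/kafka-broker --timeout=600s
kubectl -n $NAMESPACE get sts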
8. Validate Kafka data¶
Consume messages from the Kafka topic test to verify that no data was lost. This assumes that the test topic already contains messages.
NAMESPACE=mdsp-bk-kafka
kubectl -n $NAMESPACE exec -it kafka-client -- bash
>> NAMESPACE=mdsp-bk-kafka;cd /opt/bitnami/kafka/bin/
>> kafka-console-consumer.sh --bootstrap-server kafka.$NAMESPACE.svc.cluster.local:9092 --topic test --from-beginning
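Optionally, the topic metadata can also be inspected from the same client pod as an extra sanity check:
>> kafka-topics.sh --bootstrap-server kafka.$NAMESPACE.svc.cluster.local:9092 --describe --topic test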
9. Restart all services¶
- Restart all services that use Kafka. The following script identifies which deployments consume Kafka; it reads a cluster-wide deployment dump named alldeploy.json, which can be generated as shown below.
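A minimal way to produce the dump the script expects:
kubectl get deployments -A -o json > alldeploy.json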
cat alldeploy.json | jq -c '.items[]' | while IFS= read -r deployment; do
namespace=$(echo "$deployment" | jq -r '.metadata.namespace')
name=$(echo "$deployment" | jq -r '.metadata.name')
# Check whether the namespace starts with "domain" or is exactly "kafka-connector"
if [[ "$namespace" == domain* || "$namespace" == "kafka-connector" ]]; then
# Check whether any container env var references a secretKeyRef key such as kafka_host, kafka_host_port, kafka_host_port_ha, etc.
if echo "$deployment" | jq -e '.spec.template.spec.containers[].env[]? | select(.valueFrom.secretKeyRef.key == "kafka_host" or .valueFrom.secretKeyRef.key == "kafka_host_port" or .valueFrom.secretKeyRef.key == "kafka_host_port_ha" or .valueFrom.secretKeyRef.key == "kafka_port" or .valueFrom.secretKeyRef.key == "kafka_zookeeper_host" or .valueFrom.secretKeyRef.key == "kafka_zookeeper_host_ha")' > /dev/null; then
# Print a restart command for each eligible deployment
#echo "Namespace: $namespace, Deployment Name: $name"
echo "kubectl -n $namespace rollout restart deployment $name"
fi
fi
done
- Manually restart core services:
# kafka connector
kubectl -n kafka-connector rollout restart deploy kafka-connector-job-cp-kafka-connect || kubectl -n kafka-connector rollout restart deployment confluent-cp-kafka-connect
# core
kubectl -n mindsphere-core rollout restart deploy hypergate
kubectl -n mindsphere-core rollout restart deploy hypergate-proxy
kubectl -n mindsphere-core rollout restart deploy mindgate
kubectl -n mindsphere-core rollout restart deploy mindgate-oscloud
# iot
kubectl -n mindsphere-iots rollout restart deploy idl-access-token-svc
kubectl -n mindsphere-iots rollout restart deploy idl-metadata-svc
kubectl -n mindsphere-iots rollout restart deploy idl-notification-listener
kubectl -n mindsphere-iots rollout restart deploy iot-cts-aggregate-svc
kubectl -n mindsphere-iots rollout restart deploy iot-cts-coldstore-jobs
kubectl -n mindsphere-iots rollout restart deploy iot-cts-data-ingest
kubectl -n mindsphere-iots rollout restart deploy iot-cts-data-svc
kubectl -n mindsphere-iots rollout restart deploy iot-cts-iav-writer
kubectl -n mindsphere-iots rollout restart deploy iot-cts-throttling-consumer
kubectl -n mindsphere-iots rollout restart deploy iot-cts-writer
kubectl -n mindsphere-iots rollout restart deploy iot-ts-billing-ingestion-size-extractor
kubectl -n mindsphere-iots rollout restart deploy iot-ts-streaming-svc
kubectl -n mindsphere-iots rollout restart deploy iot-ts-subscription-writer
kubectl -n mindsphere-strt rollout restart deploy energy-prediction-ts-aggregator
kubectl -n mindsphere-strt rollout restart deploy ep-agg-worker
# advs
kubectl -n mindsphere-advs rollout restart deploy assetmanagement
kubectl -n mindsphere-advs rollout restart deploy assettenantservice
kubectl -n mindsphere-advs rollout restart deploy assettypemanagement
kubectl -n mindsphere-advs rollout restart deploy eventmanagement
kubectl -n mindsphere-advs rollout restart deploy eventmanagement-entity-manager
# conn
kubectl -n mindsphere-conn rollout restart deploy agentonlinedetector
kubectl -n mindsphere-conn rollout restart deploy customparserproxy
kubectl -n mindsphere-conn rollout restart deploy datasourceconfigurationparser
kubectl -n mindsphere-conn rollout restart deploy eventparser
kubectl -n mindsphere-conn rollout restart deploy exchange
kubectl -n mindsphere-conn rollout restart deploy fileparser
kubectl -n mindsphere-conn rollout restart deploy messagerouter
kubectl -n mindsphere-conn rollout restart deploy recordrecoveryservice
kubectl -n mindsphere-conn rollout restart deploy timeseriesparser
# uts
kubectl -n mindsphere-core rollout restart deploy coremasterscheduler-athenards
kubectl -n mindsphere-core rollout restart deploy coremasterscheduler-athenardsnb
kubectl -n mindsphere-core rollout restart deploy coremasterscheduler-general
kubectl -n mindsphere-core rollout restart deploy coremasterscheduler-kpi
kubectl -n mindsphere-core rollout restart deploy coremasterscheduler-metering
kubectl -n mindsphere-core rollout restart deploy coremasterscheduler-partition
kubectl -n mindsphere-core rollout restart deploy corereportservice
kubectl -n mindsphere-core rollout restart deploy utsreportservice
After the upgrade is complete, check the Insights Hub GUI for any Kafka-related issues and verify that the Kafka services are running correctly.