Validating OSS Cluster
This section describes the tests and checks that validate the components of the OSS and the cluster. It also explains how to deploy the validation tool and configure it for one-time or scheduled execution.
Access the validation report at: http://node_ip:31081/report.html?sort=result
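As a minimal sketch, the report address can be assembled from the node IP. The helper name `report_url` and the sample address are hypothetical; the port, path, and `sort=result` query parameter are taken from the URL above.

```python
from urllib.parse import urlencode


def report_url(node_ip: str, sort: str = "result") -> str:
    """Build the validation report URL for a node IP (hypothetical helper)."""
    # Port 31081 and the report.html path come from the documented URL.
    query = urlencode({"sort": sort})
    return f"http://{node_ip}:31081/report.html?{query}"


if __name__ == "__main__":
    # 10.0.0.5 is a placeholder address, not a real node.
    print(report_url("10.0.0.5"))
```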
Prerequisites
The following requirements must be fulfilled to validate the service:
- Secrets configuration: the following secrets are deployed under the appropriate namespace:
    - mdsp-secret-infra-credentials
    - mdsp-registry-secret
- Configure kubeconfig in ArgoCD: set up the kubeconfig in ArgoCD to interact with the Kubernetes cluster.
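One way to confirm the secrets prerequisite before deploying is to ask `kubectl` for each secret. The sketch below assumes `kubectl` is installed and already pointed at the target cluster; the helper names are hypothetical, and the namespace is a parameter because this section does not name it.

```python
import subprocess

# Secret names listed in the prerequisites above.
REQUIRED_SECRETS = ["mdsp-secret-infra-credentials", "mdsp-registry-secret"]


def secret_check_command(secret: str, namespace: str) -> list[str]:
    """Return the kubectl command that looks up one secret in a namespace."""
    return ["kubectl", "get", "secret", secret, "--namespace", namespace]


def missing_secrets(namespace: str) -> list[str]:
    """Return the required secrets that kubectl cannot find in the namespace."""
    missing = []
    for secret in REQUIRED_SECRETS:
        result = subprocess.run(
            secret_check_command(secret, namespace),
            capture_output=True,
            text=True,
        )
        # kubectl exits non-zero when the secret is absent or not readable.
        if result.returncode != 0:
            missing.append(secret)
    return missing
```

Calling `missing_secrets("<namespace>")` with the namespace used by your deployment returns an empty list when both secrets are present. If `kubectl` is not on the PATH, `subprocess.run` raises `FileNotFoundError`.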
Descriptions
Each test case checks specific components (e.g., pods, nodes, services) for health, status, or connectivity. The tests cover required services such as Ceph, Kubernetes, Spark, RabbitMQ, and Redis.
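To illustrate what a connectivity check of this kind can look like, the following sketch attempts a plain TCP connection to a host and port. It is not the validation tool's implementation; the function name `is_reachable` and the default timeout are assumptions made for illustration.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

A TCP connect only shows that something is listening on the port. It does not prove that the service behind it is healthy, which is why a separate health check per service is still useful.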
The following test cases validate the components of the OSS and the cluster:
Test Name | Description |
---|---|
test_pod_check | Check if all pods are healthy |
test_nodes_check | Check if all nodes are healthy |
test_pgo_all_status_check | Check access to mdsp-psql-all PGO cluster |
test_pgo_iot_status_check | Check access to mdsp-psql-iot PGO cluster |
test_pgo_all_disk_check | Check PVC usage in all PGO cluster pods, fails if usage > 80% |
test_pgo_iot_disk_check | Check PVC usage in IoT PGO cluster pods, fails if > 80% |
test_opensearch_check | Check if OpenSearch is reachable |
test_spark_check | Check if Spark job can be submitted and executed successfully |
test_kubernetes_check | Check if Kubernetes is in a healthy state |
test_ceph_check | Check if Ceph server is healthy |
test_ceph_osd_usage_check | Check if Ceph OSD disk usage exceeds 80% |
test_ceph_svc_internal_endpoint_check | Check if all Ceph service internal cluster domains & ports are reachable |
test_harbor_svc_internal_endpoint_check | Check if all Harbor service internal cluster domains & ports are reachable |
test_kafka_svc_internal_endpoint_check | Check if all Kafka service internal cluster domains & ports are reachable |
test_mqtt_svc_internal_endpoint_check | Check if all MQTT service internal cluster domains & ports are reachable |
test_opensearch_svc_internal_endpoint_check | Check if all OpenSearch service internal cluster domains & ports are reachable |
test_rabbitmq_svc_internal_endpoint_check | Check if RabbitMQ service internal cluster domains & ports are reachable |
test_redis_svc_internal_endpoint_check | Check if Redis service internal cluster domains & ports are reachable |
test_spark_svc_internal_endpoint_check | Check if Spark service internal cluster domains & ports are reachable |
test_pgo_svc_internal_endpoint_check | Check if PGO service internal cluster domains & ports are reachable |
test_rabbitmq_check | Check if RabbitMQ service is healthy |
test_mqtt_check | Check if MQTT service is healthy |
test_external_domains_in_dns_check | Check if all external domains are resolvable and reachable |
test_arango_status_check | Check if all ArangoDB components are "GOOD" |
test_redis_status_check | Check if Redis can connect with the password |
test_tableau_status_check | Check if Tableau server can be signed in successfully |
test_harbor_status_check | Check if Harbor server can be signed in successfully |
test_adfs_status_check | Check if ADFS server is active |
test_mdsp_external_certs_check | Check if external domain certificates for Insights Hub are valid (not expiring in 30 days) |
test_harbor_external_certs_check | Check if Harbor certificates are valid (not expiring in 30 days) |
test_mqtt_external_certs_check | Check if RabbitMQ-MQTT certificates are valid (not expiring in 30 days) |
test_adfs_communicate_certs_check | Check if ADFS certificates are valid (not expiring in 30 days) |
test_kong_license_check | Check if Kong license is valid (not expiring in 30 days) |
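Two thresholds recur in the table above: 80% for disk or PVC usage and 30 days for certificate and license expiry. The sketch below shows how such pass/fail rules can be expressed. The function names are hypothetical, and the boundary handling is an assumption: usage fails only when strictly above 80%, and an expiry exactly 30 days away counts as expiring.

```python
from datetime import datetime, timedelta, timezone


def usage_exceeds(used_bytes: int, total_bytes: int, limit: float = 0.80) -> bool:
    """Return True when used/total is strictly above the limit (80% by default)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes > limit


def expires_within(not_after: datetime, now: datetime, days: int = 30) -> bool:
    """Return True when the expiry time is at most `days` after `now`.

    An expiry time that is already in the past also returns True.
    """
    return not_after - now <= timedelta(days=days)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Sample values only; real checks would read usage and expiry from the cluster.
    print(usage_exceeds(used_bytes=850, total_bytes=1000))
    print(expires_within(now + timedelta(days=90), now))
```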
Deploying the Service
To deploy the tool according to your requirements, follow these steps:
- To run the job on a schedule, enable the cron job and configure the schedule. The schedule shown here runs the job daily at midnight.

      cronjob:
        enabled: true
        schedule: "0 0 * * *"

- To deploy the tool as a one-time deployment resource, disable the cron job.

      cronjob:
        enabled: false

- Select the test cases to execute.

      envs:
        test_case_option: "test_"

    - To run the specific test cases named test_spark_check and test_pod_check, use "test_spark_check" or "test_pod_check".
    - To run all test cases except test_pod_check, use "not test_pod_check".
    - To run all test cases, use "test_", which is the default value.

- Enable email notifications for the validation results.

      global:
        notifyEnable: true
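The selection strings for `test_case_option` resemble pytest's `-k` keyword expressions, where each word is matched as a substring of the test name and `and`, `or`, and `not` combine the matches. This section does not state how the option is evaluated, so treat that as an assumption. Under it, the following simplified sketch shows which tests a given expression would select; the function names are hypothetical and matching here is case-sensitive.

```python
import re

_OPERATORS = {"and", "or", "not", "(", ")"}


def matches(expression: str, test_name: str) -> bool:
    """Evaluate a keyword expression against one test name.

    Every token that is not an operator is treated as a substring to look
    for in the test name; and, or, not, and parentheses combine the results.
    """
    tokens = re.findall(r"[()]|[^\s()]+", expression)
    parts = [
        tok if tok in _OPERATORS else str(tok in test_name)
        for tok in tokens
    ]
    # After substitution the string holds only True/False, operators, and
    # parentheses, so eval never sees text taken from the input.
    return bool(eval(" ".join(parts), {"__builtins__": {}}, {}))


def select(expression: str, test_names: list[str]) -> list[str]:
    """Return the test names matched by the expression, in their original order."""
    return [name for name in test_names if matches(expression, name)]


if __name__ == "__main__":
    names = ["test_pod_check", "test_spark_check", "test_ceph_check"]
    for expr in ("test_", "test_spark_check or test_pod_check", "not test_pod_check"):
        print(expr, "->", select(expr, names))
```

Under this reading, the default value "test_" selects every test case because all of the names in the table start with that prefix.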