Validating OSS Cluster

This section describes the tests and checks that validate the components of the OSS and the cluster. It also explains how to deploy the validation tool and configure it for one-time or scheduled execution.

Access the validation report at: http://node_ip:31081/report.html?sort=result (replace node_ip with the IP address of a cluster node).

Prerequisites

Before validating the service, make sure the following requirements are met:

  1. Secrets configuration: Deploy the following secrets in the appropriate namespace:

    mdsp-secret-infra-credentials
    mdsp-registry-secret

  2. Configure kubeconfig in ArgoCD: Set up the kubeconfig in ArgoCD to interact with the Kubernetes cluster.
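
The secrets prerequisite can be verified before deployment. The following is a minimal sketch that looks up each secret with `kubectl get secret`. The helper names are illustrative, and the namespace is a parameter because this guide does not name it.

```python
import subprocess

# Secret names taken from the prerequisites above.
REQUIRED_SECRETS = ["mdsp-secret-infra-credentials", "mdsp-registry-secret"]


def secret_check_command(name: str, namespace: str) -> list[str]:
    """Build the kubectl command that looks up one secret (hypothetical helper)."""
    return ["kubectl", "get", "secret", name, "--namespace", namespace]


def missing_secrets(namespace: str, run=subprocess.run) -> list[str]:
    """Return the required secrets that kubectl cannot find in the namespace."""
    missing = []
    for name in REQUIRED_SECRETS:
        result = run(secret_check_command(name, namespace), capture_output=True)
        if result.returncode != 0:  # kubectl exits non-zero when the lookup fails
            missing.append(name)
    return missing
```

For example, `missing_secrets("my-namespace")` returns an empty list when both secrets are present; `my-namespace` is a placeholder, not a value from this guide.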

Descriptions

Each test case checks specific components (e.g., pods, nodes, services) for health, status, or connectivity. The tests cover required services such as Ceph, Kubernetes, Spark, RabbitMQ, and Redis.

The following test cases validate the components of the OSS and the cluster:
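
To illustrate what a connectivity check involves, the following sketch tests whether a TCP port accepts connections. It is not the tool's actual implementation; the function name and the default timeout are assumptions made for this example.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False
```

A check for a service would call this with the service's internal cluster domain and port, and fail when the result is False.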

| Test Name | Description |
| --- | --- |
| test_pod_check | Check if all pods are healthy |
| test_nodes_check | Check access to mdsp-psql-all PGO cluster |
| test_pgo_all_status_check | Check if PVC usage on each PGO node exceeds 80% |
| test_pgo_iot_status_check | Check access to mdsp-psql-iot PGO cluster |
| test_pgo_all_disk_check | Check PVC usage in all PGO cluster pods, fails if usage > 80% |
| test_pgo_iot_disk_check | Check PVC usage in IoT PGO cluster pods, fails if usage > 80% |
| test_opensearch_check | Check if OpenSearch is reachable |
| test_spark_check | Check if Spark job can be submitted and executed successfully |
| test_kubernetes_check | Check if Kubernetes is in a healthy state |
| test_ceph_check | Check if Ceph server is healthy |
| test_ceph_osd_usage_check | Check if Ceph OSD disk usage exceeds 80% |
| test_ceph_svc_internal_endpoint_check | Check if all Ceph service internal cluster domains & ports are reachable |
| test_harbor_svc_internal_endpoint_check | Check if all Harbor service internal cluster domains & ports are reachable |
| test_kafka_svc_internal_endpoint_check | Check if all Kafka service internal cluster domains & ports are reachable |
| test_mqtt_svc_internal_endpoint_check | Check if all MQTT service internal cluster domains & ports are reachable |
| test_opensearch_svc_internal_endpoint_check | Check if all OpenSearch service internal cluster domains & ports are reachable |
| test_rabbitmq_svc_internal_endpoint_check | Check if RabbitMQ service internal cluster domains & ports are reachable |
| test_redis_svc_internal_endpoint_check | Check if Redis service internal cluster domains & ports are reachable |
| test_spark_svc_internal_endpoint_check | Check if Spark service internal cluster domains & ports are reachable |
| test_pgo_svc_internal_endpoint_check | Check if PGO service internal cluster domains & ports are reachable |
| test_rabbitmq_check | Check if RabbitMQ service is healthy |
| test_mqtt_check | Check if MQTT service is healthy |
| test_external_domains_in_dns_check | Check if all external domains are resolvable and reachable |
| test_arango_status_check | Check if all ArangoDB components are "GOOD" |
| test_redis_status_check | Check if Redis can connect with the password |
| test_tableau_status_check | Check if Tableau server can be signed in successfully |
| test_harbor_status_check | Check if Harbor server can be signed in successfully |
| test_adfs_status_check | Check if ADFS server is active |
| test_mdsp_external_certs_check | Check if external domain certificates for Insights Hub are valid (not expiring in 30 days) |
| test_harbor_external_certs_check | Check if Harbor certificates are valid (not expiring in 30 days) |
| test_mqtt_external_certs_check | Check if RabbitMQ-MQTT certificates are valid (not expiring in 30 days) |
| test_adfs_communicate_certs_check | Check if ADFS certificates are valid (not expiring in 30 days) |
| test_kong_license_check | Check if Kong license is valid (not expiring in 30 days) |
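
Several of these checks compare a measurement against a fixed limit: disk usage above 80%, or a certificate expiring within 30 days. The following sketch shows how such comparisons might look. The function names are hypothetical and the tool's real logic may differ; the date format is the one Python's `ssl` module returns for a certificate's `notAfter` field.

```python
import ssl
from datetime import datetime, timedelta, timezone


def usage_exceeds(used: float, capacity: float, threshold: float = 0.80) -> bool:
    """Return True if used/capacity is strictly above the threshold (80% by default)."""
    return used / capacity > threshold


def expires_within(not_after: str, days: int, now: datetime) -> bool:
    """Return True if a certificate's notAfter time falls within `days` of `now`.

    `not_after` uses the format found in ssl.SSLSocket.getpeercert(),
    for example "Jan  5 09:34:43 2018 GMT". Already expired certificates
    also return True.
    """
    expiry = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return expiry - now < timedelta(days=days)
```

A certificate check would fail when `expires_within(not_after, 30, datetime.now(timezone.utc))` is True, and a disk check would fail when `usage_exceeds(used, capacity)` is True.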

Deploying the Service

To deploy the tool according to your requirements, follow these steps:

  1. To run the job on a schedule, enable the cron job and configure the schedule. The example schedule "0 0 * * *" runs the job once a day at 00:00.

    cronjob:
      enabled: true
      schedule: "0 0 * * *"
    
  2. To run the tool only once, disable the cron job. The tool is then deployed as a one-time deployment resource.

    cronjob:
      enabled: false
    
  3. Select the test cases to execute with the test_case_option setting.

    envs:
      test_case_option: "test_"
    
  4. To run only the test cases test_spark_check and test_pod_check, set test_case_option to "test_spark_check or test_pod_check".

  5. To run all test cases except test_pod_check, set test_case_option to "not test_pod_check".

  6. To run all test cases, set test_case_option to "test_", which is the default value.

  7. Enable email notifications for the validation results.

    global:
      notifyEnable: true
    

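The test_case_option values resemble the keyword expressions accepted by pytest's `-k` option. This guide does not state which test runner the tool uses, so that is an assumption. Under it, the following simplified sketch shows which tests each value would select. It handles only substring terms combined with `not`, `and`, and `or`, without parentheses, and it is case-sensitive.

```python
# A small sample of test names from the table above.
TESTS = ["test_pod_check", "test_spark_check", "test_ceph_check"]


def matches(expression: str, test_name: str) -> bool:
    """Evaluate a simplified keyword expression against one test name."""

    def term(text: str) -> bool:
        text = text.strip()
        if text.startswith("not "):
            return text[4:].strip() not in test_name
        return text in test_name

    # "or" binds loosest, then "and", then "not".
    return any(
        all(term(part) for part in clause.split(" and "))
        for clause in expression.split(" or ")
    )


def select(expression: str, tests=TESTS) -> list[str]:
    """Return the test names that the expression selects."""
    return [name for name in tests if matches(expression, name)]


print(select("test_spark_check or test_pod_check"))
print(select("not test_pod_check"))
print(select("test_"))
```

The first call selects the two named tests, the second selects every test except test_pod_check, and the third selects all tests because every name contains "test_".
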
Last update: January 31, 2025

Except where otherwise noted, content on this site is licensed under the Development License Agreement.