Evaluate the model training results
Once model training is completed, the training results become available after a brief period. The status of the model changes to "Training Completed". To evaluate the results, click "View training results". The Training board window is displayed. Navigate through the tabs to evaluate the training results.
The Model accuracy metrics are a set of characteristics that quantify the model accuracy for the training and testing data sets. The application uses 75% of the overall training data to train the model and the remaining 25% to test the model accuracy.
Evaluate the training results with the Model accuracy metrics: Regression score (R2), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE) for the training and testing data sets.
The R2 accuracy score is a metric that quantifies the correlation between predicted and actual results. The value for the test data set should be as high as possible, approaching 100% in the ideal case. The root mean square error and mean absolute error, which describe the deviations between predictions and true results, should be as low as possible.
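The three metrics can be illustrated with a minimal Python sketch. It assumes the common textbook definitions (R2 as the coefficient of determination, RMSE, MAE) and a simple ordered 75/25 split. The documentation does not state the exact formulas or the splitting strategy the application uses, so the function names and the sample values below are hypothetical.

```python
import math

def split_train_test(rows, train_fraction=0.75):
    """Ordered split (an assumption): first 75% for training, the rest for testing."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

def r2_score(actual, predicted):
    """Coefficient of determination: 1 minus residual sum of squares over total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def rmse(actual, predicted):
    """Root mean square error of the predictions."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error of the predictions."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical quality values and model predictions for a test data set.
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 16.0]

train_rows, test_rows = split_train_test(list(range(8)))
print(len(train_rows), len(test_rows))
print(r2_score(actual, predicted), rmse(actual, predicted), mae(actual, predicted))
```

An R2 close to 1 (100%) together with small RMSE and MAE values on the test data set corresponds to the ideal case described above.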
Review the comparison of predicted and true results for the training data set using the following charts:
- Predicted vs. actual (Time Series) chart: Displays the predicted and actual quality results over time.
- Predicted vs. actual (Correlation) chart: Visualizes the correlation between predicted and actual results, with each data point ideally aligning closely with the green line representing a perfect fit.
- Delta Prediction vs. actual (Deviations Histogram): Displays the distribution of deviations between the predicted and actual results. A well-trained model keeps the deviations between predicted and actual results at an acceptable level, without critical outliers.
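The idea behind the deviations histogram can be sketched in a few lines of Python. This is an illustration only: the sign convention (predicted minus actual), the bin width, and the outlier threshold are assumptions, not values documented for the application.

```python
import math
from collections import Counter

def deviations(actual, predicted):
    """Deviation per data point, here defined as predicted minus actual (an assumption)."""
    return [p - a for a, p in zip(actual, predicted)]

def histogram(values, bin_width):
    """Count values per bin; each key is the lower edge of its bin."""
    counts = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

def outlier_indices(values, threshold):
    """Indices of data points whose absolute deviation exceeds the threshold."""
    return [i for i, v in enumerate(values) if abs(v) > threshold]

# Hypothetical actual and predicted quality results.
actual = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [10.5, 11.5, 14.0, 16.5, 23.0]

deltas = deviations(actual, predicted)
bins = histogram(deltas, bin_width=1.0)
outliers = outlier_indices(deltas, threshold=2.0)
print(bins, outliers)
```

A narrow distribution centered near zero with no isolated bins far from the center matches the "no critical outliers" criterion.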
If the accuracy metrics are not sufficient or the model generates too many outliers, click "Edit" to adjust the model settings. Once the model accuracy and the visual results are acceptable, click "Save as new version" to save the model. You can edit and save as a new version only in the Draft status.
Analyze the feature importance to identify the process parameters that significantly impact the predicted quality result and review the feature importance charts as shown in the image.
- Importance chart: Displays the feature importance obtained by permutation of the input features.
- Feature importance (SHAP-values) chart: Displays the SHAP values for each input feature, indicating how a single feature affects the prediction result. Process parameters with the highest importance and SHAP values have the most significant impact on the quality result. For more information, refer to Evaluation of feature importance.
- Feature effects (SHAP-values) chart: Displays the distribution of SHAP values for each data point in the training dataset. The color of each data point represents the feature value, and its position on the axis indicates its positive or negative impact on the predicted result.
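Permutation-based importance, as referenced for the Importance chart, can be sketched as follows. This is a generic illustration of the technique, not the application's implementation: the error measure (mean squared error), the number of repeats, and the toy model are assumptions. SHAP values are a different method and are not computed here.

```python
import random

def mean_squared_error(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def permutation_importance(model, rows, targets, n_repeats=5, seed=0):
    """Average increase in error when one feature column is shuffled at a time."""
    rng = random.Random(seed)
    baseline = mean_squared_error(targets, [model(r) for r in rows])
    importances = []
    for col in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[col] for r in rows]
            rng.shuffle(column)
            # Rebuild the rows with only this one column permuted.
            shuffled = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, column)]
            error = mean_squared_error(targets, [model(r) for r in shuffled])
            increases.append(error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

def toy_model(row):
    """Hypothetical model that depends only on the first feature."""
    return 2.0 * row[0]

rows = [[float(i), float(10 - i)] for i in range(8)]
targets = [toy_model(r) for r in rows]
importances = permutation_importance(toy_model, rows, targets)
print(importances)
```

Shuffling a feature the model relies on increases the error, while shuffling an unused feature leaves it unchanged, which is why the second importance in this toy example is zero.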
Explainability analyzes the impact of the input features on the prediction results. In this tab, the Parallel Coordinate Plot and the 3D Process Chart are displayed. These interactive graphs enable users to analyze the dependencies between input features and prediction results, which supports determining the optimal quality result.
- Parallel coordinate plot: The chart allows immediate assessment of the model's prediction accuracy and compares the actual quality parameter values with the model-generated predictions for the training dataset.
  - The two right-side axes display the actual quality parameter values and the model's predictions.
  - The left-side axes display the values of the five input features with the highest importance as determined by the model.
  - Line color represents the quality parameter value, with yellow indicating high values and blue indicating low values.
  - Line segments connecting the predicted and true values (the two axes on the right side) allow immediate assessment of prediction accuracy: horizontal segments indicate close alignment between predictions and true results, while disparities reveal inaccuracies.
  - The plot also helps identify the input feature ranges associated with the highest and lowest quality parameter values.
- 3D Process chart
  - Displays training dataset points along three axes representing the three most important input features.
  - Data point color indicates quality result values.
  - Interactive features such as magnification and rotation aid navigation, helping the user focus on specific points of interest and identify the locations of the highest or lowest values.
The interactive features of the parallel coordinate plot enable users to select data points whose quality parameters fall within a defined range. By selecting a range on the right axis, which represents the target value, users can explore the machine settings and input parameters necessary to achieve the target quality within a specified tolerance. This functionality aids in identifying optimal machine configurations for manufacturing products that meet high, low, or midpoint tolerance levels of quality. Similarly, selecting a range on a process feature axis allows the user to visualize the quality results achievable with specific machine settings.
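The range selection described above is conceptually a filter over the training records. The following Python sketch shows the idea on hypothetical data; the field names (temperature, pressure, quality) and the values are invented for illustration and do not come from the application.

```python
def select_by_range(records, key, low, high):
    """Keep the records whose value for `key` lies within [low, high]."""
    return [r for r in records if low <= r[key] <= high]

def feature_ranges(records, feature_names):
    """Minimum and maximum of each feature across the selected records."""
    return {
        name: (min(r[name] for r in records), max(r[name] for r in records))
        for name in feature_names
    }

# Hypothetical training records: two process features and one quality result.
records = [
    {"temperature": 180.0, "pressure": 5.0, "quality": 0.92},
    {"temperature": 185.0, "pressure": 5.5, "quality": 0.95},
    {"temperature": 200.0, "pressure": 7.0, "quality": 0.70},
    {"temperature": 182.0, "pressure": 5.2, "quality": 0.93},
]

# Select a target quality range, then read off the matching machine settings.
selected = select_by_range(records, "quality", 0.90, 1.00)
ranges = feature_ranges(selected, ["temperature", "pressure"])
print(ranges)
```

Calling select_by_range with a process feature as the key instead gives the reverse view: the quality results observed for a given range of machine settings.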
Additionally, by right-clicking on the relevant axis and using the interactive menu, you can analyze the data through a dependence plot and a distribution histogram, as demonstrated in the example below.
Review the process features, filtered features, aggregated features and the quality data charts in the Input Data tab.
Proceed with the analysis of the process parameters and quality results used for model training by analyzing the time series charts. These charts visualize the process and quality data used to train the model. The charts are interactive, allowing magnification and navigation for detailed analysis of the data points.
Analyze the distributions and correlation between the process features and quality result in the data analysis tab.
Review the distribution histograms of the process features, training inputs (represented by aggregated and additional features) and the correlation heatmap showing the correlation factors for each pair of parameters.
If substantial outliers or unexpected parameter distributions are detected in the distribution charts, it is recommended to edit the model draft or create a new version to remove outliers by adjusting the corresponding settings in the model setup.
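The values behind a correlation heatmap can be illustrated with the Pearson correlation coefficient computed for each pair of parameters. This sketch assumes Pearson correlation; the documentation does not state which correlation measure the application uses, and the column names and values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Correlation for every pair of columns, as a nested dictionary."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical process features and quality result.
columns = {
    "temperature": [1.0, 2.0, 3.0, 4.0],
    "pressure": [2.0, 4.0, 6.0, 8.0],
    "quality": [4.0, 3.0, 2.0, 1.0],
}

matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, row)
```

Values near 1 or -1 indicate strongly correlated parameter pairs, while values near 0 indicate little linear relationship.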
In the Configuration tab, view the configuration parameters of the ML model.
The complete set of model parameters can be downloaded as a file using the export function.