Principal Component Analysis (PCA) - Developer Documentation

Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. Equivalently, it finds a rotation of the coordinate system in which the first coordinate has the largest possible variance and each succeeding coordinate, in turn, has the largest possible variance subject to being orthogonal to all preceding coordinates. The columns of the rotation matrix define the principal components: each column gives the direction of one new coordinate.
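
The rotation described above can be sketched with NumPy. This is a minimal illustration, not the implementation behind the panel: the synthetic data, the variable names, and the choice of an eigendecomposition of the sample covariance matrix are all assumptions made for the example.

```python
import numpy as np

# Synthetic data (an assumption for illustration): 500 observations of
# 3 variables, the first two of which are strongly correlated.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 1))
X = np.hstack([
    latent + 0.1 * rng.normal(size=(500, 1)),
    2.0 * latent + 0.1 * rng.normal(size=(500, 1)),
    rng.normal(size=(500, 1)),
])

# Center each column, then form the sample covariance matrix.
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)

# eigh returns eigenvalues of a symmetric matrix in ascending order,
# so reorder to put the largest-variance direction first.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
variances = eigenvalues[order]
rotation = eigenvectors[:, order]  # columns define the principal components

# Rotate the observations into the new coordinate system.
scores = X_centered @ rotation

# The covariance of the rotated data is diagonal up to rounding error:
# the new coordinates are uncorrelated, with variances in descending order.
scores_cov = np.cov(scores, rowvar=False)
print(np.round(scores_cov, 6))
```

The off-diagonal entries of the printed matrix are zero up to rounding, which is what "linearly uncorrelated" means here, and the diagonal entries are the variances along each new coordinate.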

PCA is widely used for dimensionality reduction. After finding the new coordinates, you can keep only the first few dimensions of the new coordinate system (those along which the data varies most) and project the data onto that reduced coordinate system, while preserving most of the variation in the data.
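
As a rough sketch of this reduction, the following NumPy example keeps the first k principal components and reports the fraction of total variance retained. The function name `reduce_dimensions`, the synthetic data, and the use of a singular value decomposition are assumptions for illustration and are not taken from the product.

```python
import numpy as np

def reduce_dimensions(X, k):
    """Project the rows of X onto the first k principal components."""
    mean = X.mean(axis=0)
    X_centered = X - mean
    # Rows of Vt are the principal directions, ordered by decreasing
    # singular value and therefore by decreasing variance.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:k].T                  # shape (n_features, k)
    reduced = X_centered @ components      # shape (n_samples, k)
    variances = singular_values**2 / (X.shape[0] - 1)
    retained = variances[:k].sum() / variances.sum()
    return reduced, components, mean, retained

# Hypothetical data: 4 columns, two of which are nearly linear
# combinations of the other two.
rng = np.random.default_rng(1)
base = rng.normal(size=(200, 2))
mix = np.array([[1.0, 0.5], [0.5, 1.0]])
X = np.hstack([base, base @ mix + 0.01 * rng.normal(size=(200, 2))])

reduced, components, mean, retained = reduce_dimensions(X, k=2)
print(reduced.shape)  # (200, 2)
print(f"fraction of variance retained: {retained:.4f}")

# Map the reduced data back to the original 4 columns to inspect the loss.
X_approx = reduced @ components.T + mean
print(f"largest reconstruction error: {np.abs(X - X_approx).max():.4f}")
```

Because two of the four columns carry almost no independent variation, reducing to two dimensions loses very little; on data without such redundancy, the retained fraction would be lower.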

Principal Component Analysis Panel Illustration

The following illustrates the Principal Component Analysis panel on the Predictive Workspace.

How to Use the Principal Component Analysis Transformation

Follow these steps to use the Principal Component Analysis transformation:

1. Go to the Manage Analytics Workspaces page.
2. Select a cluster configuration and start a cluster. The cluster starts and the cluster status message changes to Running.
3. Click Create a New Workspace or open an existing workspace. The Workspace opens on a new page.
4. Click Add Transformation Panel. The Select Transformation dialog box opens.
5. Select Principal Component Analysis from the list and click the Select button. The Principal Component Analysis transformation panel displays.
6. Click the Select button next to the Select a Dataset field and choose a dataset from the list.
7. Select the columns you want to include from the Select Columns to Analyze list, or click Select All. The columns you select appear on the right, and the Total Number of Dimensions reflects your selections.
8. In the Reduce Number of Dimensions field, enter the number of dimensions you want to reduce the dataset to.
9. On the Columns to Carry Over tab, specify the columns you want to carry over to the new dataset.
10. Click Run on the panel title bar. The transformation runs and the new dataset displays.
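
For readers who think in code, steps 6 through 10 can be loosely mirrored by the sketch below. It is only an analogy under stated assumptions: the function `pca_transform`, its parameter names, the `pc_1`, `pc_2` output column names, and the centering of the data are all hypothetical and do not describe how the panel is implemented.

```python
import numpy as np

def pca_transform(dataset, columns_to_analyze, n_dimensions,
                  columns_to_carry_over=()):
    """Hypothetical stand-in for the panel, not the product's code."""
    if not 1 <= n_dimensions <= len(columns_to_analyze):
        raise ValueError(
            "n_dimensions must be between 1 and the number of analyzed columns"
        )
    # Steps 6-7: take the selected columns from the chosen dataset.
    X = np.column_stack(
        [np.asarray(dataset[c], dtype=float) for c in columns_to_analyze]
    )
    X_centered = X - X.mean(axis=0)
    # Step 8: keep only the first n_dimensions principal components.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    reduced = X_centered @ Vt[:n_dimensions].T
    # Step 9: copy the carry-over columns into the result unchanged.
    result = {c: list(dataset[c]) for c in columns_to_carry_over}
    for i in range(n_dimensions):
        result[f"pc_{i + 1}"] = reduced[:, i].tolist()
    # Step 10: the returned mapping plays the role of the new dataset.
    return result

# Made-up example dataset, stored as a mapping of column name to values.
dataset = {
    "id": [1, 2, 3, 4, 5],
    "height": [150.0, 160.0, 170.0, 180.0, 190.0],
    "weight": [50.0, 58.0, 69.0, 80.0, 92.0],
    "age": [30.0, 25.0, 41.0, 38.0, 29.0],
}
new_dataset = pca_transform(dataset, ["height", "weight", "age"], 2, ["id"])
print(list(new_dataset))  # ['id', 'pc_1', 'pc_2']
```

In this sketch the number of dimensions cannot exceed the number of analyzed columns, because a rotation of n columns has only n coordinates to choose from.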

Last update: June 15, 2023