Normalizing Data With StandardScaler

StandardScaler is a built-in transformation tool for datasets. Scaling, often loosely called "normalization" (more precisely, standardization), helps optimization converge faster and prevents features with very large variances from exerting excessive influence during model training.

We strongly recommend handling null values before you run a StandardScaler transformation. The tool ignores columns that contain null values, which can leave the resulting dataset with fewer rows than expected.
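
As an illustration of handling nulls before scaling, here is a minimal Python sketch that fills missing entries with the column mean. It is not part of Predictive Learning: the helper name fill_nulls_with_mean is hypothetical, and mean imputation is only one possible strategy (dropping rows or filling with a constant are others). Choose whichever suits your data.

```python
from statistics import mean

def fill_nulls_with_mean(values):
    """Replace None entries with the mean of the non-null entries.

    Hypothetical helper for illustration; mean imputation is an assumption,
    not a strategy prescribed by the StandardScaler documentation.
    """
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("column contains only null values")
    fill = mean(present)
    return [fill if v is None else v for v in values]

raw = [10.0, None, 30.0]
cleaned = fill_nulls_with_mean(raw)
print(cleaned)  # [10.0, 20.0, 30.0]
```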

StandardScaler Formula

The formula used by the Predictive Learning Scaler is:

X: Input Column to transform

X_STS: Transformed Column

X_STS = (X - mean(X)) / std_dev(X)
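
To make the formula concrete, the following Python sketch standardizes a single column: the mean is subtracted first, and the difference is then divided by the standard deviation. The function name standardize is hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption, since this page does not say which variant the tool uses.

```python
from statistics import mean, stdev

def standardize(values):
    """Return (x - mean) / std_dev for every x in values."""
    mu = mean(values)
    sigma = stdev(values)  # assumption: sample standard deviation (n - 1)
    if sigma == 0:
        raise ValueError("cannot scale a constant column (std_dev is 0)")
    # Subtract the mean first, then divide by the standard deviation.
    return [(x - mu) / sigma for x in values]

column = [2.0, 4.0, 6.0]
scaled = standardize(column)
print(scaled)  # [-1.0, 0.0, 1.0]
```

The scaled column has a mean of 0 and a standard deviation of 1, which is what puts features with very different ranges on a comparable footing.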

StandardScaler Transformation Panel Illustration

The following illustrates the Parameters tab in StandardScaler.

StandardScaler

Columns to Carry Over Tab Illustration

This illustrates the StandardScaler Transformation Panel Columns to Carry Over tab.

Carry Over Tab

Review Results Tab Illustration

This illustrates the Review Results tab.

Review Results Tab

How to Access the StandardScaler Transformation Panel

Follow these steps to access the StandardScaler Transformation panel:

  1. Click Manage Analytics Workspaces in the Predictive Learning menu. The Cluster Configuration page opens.
  2. Start a cluster if one is not already running. The cluster status changes to RUNNING once the cluster has started.
  3. Open an existing workspace or create a new one. The Workspace opens.
  4. Click the Add Transformation Panel button at the top right. The Select Transformation dialog box opens.
  5. Select StandardScaler. A StandardScaler transformation panel opens.

How to Use the StandardScaler to Transform Your Data

Follow these steps to normalize your data using the StandardScaler:

  1. Click the Select button and choose a dataset from the Select a Dataset field. A list of columns appears in the Columns to Analyze list.
  2. Select the Define Parameters tab and choose With Standard Deviation (default), With Mean, or both. Your selection sets the hyperparameters used by the StandardScaler algorithm.
  3. Select at least one column to transform, or select the Select All check box to transform all columns. Your selections appear in the Selected Columns list.
  4. Select the Columns to Carry Over tab and select at least one column to include in the preview, or select the Select All check box to preview all columns.
  5. Click Run to run the transformation now, or click Save as Dataset. A save dialog opens.
  6. Accept the default dataset name or enter a new one in the Dataset name field.
  7. Click the Preview tab. The transformed columns are appended to the end of the results grid with names of the form <OriginalColumnName>_STS.
  8. Click Save as Dataset. The transformed attributes are added to the original dataset and saved to the S3 location you specify.
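
The effect of the steps above can be approximated outside the tool with a short Python sketch. It rests on several assumptions not confirmed by this page: that With Mean controls centering (subtracting the mean), that With Standard Deviation controls scaling (dividing by the sample standard deviation), and that carried-over columns pass through unchanged. The function standard_scale and the sample column names are hypothetical. Only the _STS suffix comes from the steps above.

```python
from statistics import mean, stdev

def standard_scale(rows, columns, with_mean=False, with_std=True, suffix="_STS"):
    """Append a scaled copy of each selected column to every row.

    Assumed semantics: with_mean subtracts the column mean,
    with_std divides by the sample standard deviation.
    """
    out = [dict(row) for row in rows]  # copy so the input rows stay untouched
    for col in columns:
        values = [row[col] for row in rows]
        mu = mean(values) if with_mean else 0.0
        sigma = stdev(values) if with_std else 1.0
        if sigma == 0:
            raise ValueError(f"column {col!r} is constant and cannot be scaled")
        for new_row, value in zip(out, values):
            new_row[col + suffix] = (value - mu) / sigma
    return out

rows = [
    {"id": 1, "price": 2.0},
    {"id": 2, "price": 4.0},
    {"id": 3, "price": 6.0},
]

# Both options selected: center and scale.
both = standard_scale(rows, ["price"], with_mean=True, with_std=True)
print([r["price_STS"] for r in both])  # [-1.0, 0.0, 1.0]

# Only With Standard Deviation (the default selection): scale without centering.
std_only = standard_scale(rows, ["price"])
print([r["price_STS"] for r in std_only])  # [1.0, 2.0, 3.0]
```

Under these assumptions, selecting only With Standard Deviation rescales values without shifting them, so the output is centered on 0 only when With Mean is selected as well.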

Last update: January 22, 2024