Normalizing Data With StandardScaler¶
StandardScaler is an out-of-the-box transformation tool for datasets. Scaling, also referred to as "normalization" (or, more precisely, standardization), helps the optimization process converge faster and prevents features with very large variances from exerting excessive influence during model training.
We highly recommend transforming null values before you run a StandardScaler transformation. The tool ignores columns that contain null values, which can leave the resulting dataset with fewer rows than expected.
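One common way to transform null values beforehand is to replace them with the column mean. The following is a minimal sketch in plain Python, not part of the product: the function name `fill_nulls_with_mean` is hypothetical, nulls are represented as `None`, and mean imputation is only one of several reasonable strategies.

```python
from statistics import mean


def fill_nulls_with_mean(values):
    """Return a copy of `values` with every None replaced by the mean
    of the non-null entries (hypothetical helper, for illustration)."""
    present = [v for v in values if v is not None]
    if not present:
        # Nothing to compute a mean from; the caller must pick another strategy.
        raise ValueError("column has no non-null values")
    fill = mean(present)
    return [fill if v is None else v for v in values]


column = [1.0, None, 3.0]
cleaned = fill_nulls_with_mean(column)
```

After a step like this, the column no longer contains nulls and can be selected for scaling.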
StandardScaler Formula¶
The formula used by the Predictive Learning Scaler is:
X_STS = (X - mean(X)) / std_dev(X)
where:
X: Input column to transform
X_STS: Transformed column
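The formula can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the product's implementation: the function name `standard_scale` is hypothetical, and the sample standard deviation (`statistics.stdev`) is assumed because this page does not say whether the sample or population form is used.

```python
from statistics import mean, stdev


def standard_scale(values, with_mean=True, with_std=True):
    """Apply (x - mean) / std_dev to each value.

    Assumption: sample standard deviation. Needs at least two values.
    """
    center = mean(values) if with_mean else 0.0
    scale = stdev(values) if with_std else 1.0
    if scale == 0:
        # Constant column: skip the division rather than divide by zero.
        scale = 1.0
    return [(v - center) / scale for v in values]


x = [1.0, 2.0, 3.0]
x_sts = standard_scale(x)  # mean is 2.0, sample std_dev is 1.0
```

Note that the subtraction happens before the division, so each value is centered first and then rescaled.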
StandardScaler Transformation Panel Illustration¶
The following illustrates the Parameters tab in StandardScaler.
Columns to Carry Over Tab Illustration¶
This illustrates the StandardScaler Transformation Panel Columns to Carry Over tab.
Review Results Tab Illustration¶
This illustrates the Review Results tab.
How to Access the StandardScaler Transformation Panel¶
Follow these steps to access the StandardScaler Transformation panel:
- Click on Manage Analytics Workspaces on the Predictive Learning menu. The Cluster Configuration page opens.
- Start a cluster if one is not already running. The cluster status changes to RUNNING once the cluster has started.
- Open an existing workspace or create a new one. The Workspace opens.
- Click the Add Transformation Panel button at the top right. The Select Transformation dialog box opens.
- Select StandardScaler. A StandardScaler transformation panel opens.
How to Use the StandardScaler to Transform Your Data¶
Follow these steps to normalize your data using the StandardScaler:
- Click the Select button and choose a dataset from the Select a Dataset field. A list of columns appears in the Columns to Analyze list.
- Select the Define Parameters tab and select With Standard Deviation (default), With Mean, or both. Your selection determines the hyperparameter used in the StandardScaler algorithm.
- Select at least one column to perform the transformation on, or select the Select All check box for transforming all columns. Your selections appear in the Selected Columns list.
- Select the Columns to Carry Over tab and select at least one column to include in the preview, or select the Select All check box to preview all columns.
- Click Run to run the transformation now, or click Save as Dataset. A save dialog opens.
- Accept the default dataset name or enter a new one in the Dataset name field.
- Click the Preview tab. The transformed columns are appended to the end of the results grid with the name <OriginalColumnName>_STS.
- Click Save as Dataset. The transformed attributes are added to the original dataset and saved to the S3 location you specify.
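The outcome of the steps above can be approximated outside the panel with a short Python sketch. Everything here is an assumption made for illustration: the dataset is a plain dictionary of columns, `add_sts_columns` is a hypothetical helper, With Mean is taken to mean "subtract the column mean", With Standard Deviation is taken to mean "divide by the sample standard deviation", and only With Standard Deviation is on by default, as in the panel.

```python
from statistics import mean, stdev


def add_sts_columns(table, columns, with_mean=False, with_std=True):
    """Return a copy of `table` with one '<name>_STS' column appended
    for each selected column (hypothetical helper, for illustration)."""
    result = {name: list(vals) for name, vals in table.items()}
    for name in columns:
        vals = table[name]
        center = mean(vals) if with_mean else 0.0
        scale = stdev(vals) if with_std else 1.0
        if scale == 0:
            scale = 1.0  # constant column: avoid dividing by zero
        # Transformed columns are appended after the original columns.
        result[name + "_STS"] = [(v - center) / scale for v in vals]
    return result


table = {"id": [1, 2, 3], "height": [150.0, 160.0, 170.0]}
scaled = add_sts_columns(table, ["height"], with_mean=True, with_std=True)
```

In this sketch the original columns are carried over unchanged and `height_STS` is added at the end, which mirrors how the Preview tab lists the transformed columns.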
Except where otherwise noted, content on this site is licensed under the Development License Agreement.