Skip to content

Transforming Data Using Joins

When you join data together from different sources, you must specify which type of join to use. Predictive Learning uses four join types:

Inner Join: returns all rows in which the join condition is met, that is, at least one match occurs in both tables.

Left Join: returns all rows in which the join condition is met, that is, all rows from the left table, and row matches that occur from the right table.

Right Join: returns all rows in which the join condition is met, that is, all rows from the right table, and row matches that occur from the left table.

Full Outer Join: returns all rows in which the join condition is met, that is, when a match occurs in one of the tables.

Diagram of the four join types in Predictive Learning

Left joins and right joins are sometimes referred to as left full join and right full join, respectively. Here is a visual representation of the four join types:

four join types

How to Create a Join Transformation

Join transformations allow you to join two columns from two different datasets to form one new column. The two columns must be the same type, i.e., string-to-string or integer-to-integer. An even number of joins must be specified. A maximum of three joins per transformation panel are allowed.

Follow these steps to create a Join transformation:

  1. On the Join Datasets panel, click Select and choose a dataset from the first field.
  2. Click Select and choose a dataset from the second field.
  3. Select Full, Inner, Left, or Right from the Join Type drop-down list. The Columns to Join On and Columns to Carry Over tabs display.
  4. Click Select and choose the columns to join from each dataset from the **Choose a Column from...**fields. Both columns must be the same data type.
  5. Click the Add Another Join button to add an additional join to the transformation. You can define a maximum of three joins per panel.
  6. Click the Columns to Carry Over tab to specify the columns you want to include in the joined dataset.
  7. If your dataset has a large number of columns, use the Select All option with caution since performance issues may arise. Select at least one column from each dataset, and click the X to delete a column.
  8. Click the Save button at the top of the workspace page to save your transformations.
  9. Click the Run button at the top of the Join Datasets panel to run the transformation. The status message shows the current status of the job.

Last update: January 22, 2024