You are viewing the RapidMiner Studio documentation for version 10.0.

Join (Concurrency)

Synopsis

This Operator joins two ExampleSets using one or more Attributes of the input ExampleSets as key attributes.

Description

This Operator joins two ExampleSets using one or more Attributes of the input ExampleSets as key attributes.

Identical values of the key attributes indicate matching Examples. By default, the Attribute with the id role is used as the key, but any set of one or more Attributes can be chosen instead. Four types of joins are possible: inner, left, right and outer join. Each join type is explained in the Parameters section.
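The matching rule and the four join types can be illustrated with a small, self-contained Python sketch. It is not RapidMiner code and assumes a hypothetical model in which an ExampleSet is a list of dicts (Attribute name → value), with `None` standing for the missing value '?'. It further assumes that the key attributes have the same names on both sides, that key values are never missing, and that non-key Attribute names do not overlap. The names `join` and `attribute_names` are illustrative only.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Sketch model (an assumption, not RapidMiner's API): an ExampleSet is a list
# of dicts mapping attribute name -> value; None stands for the missing value '?'.
Example = Dict[str, Any]

JOIN_TYPES = ("inner", "left", "right", "outer")


def attribute_names(examples: List[Example]) -> List[str]:
    """Attribute names in order of first appearance."""
    names: List[str] = []
    for ex in examples:
        for name in ex:
            if name not in names:
                names.append(name)
    return names


def join(left: List[Example], right: List[Example],
         keys: Sequence[str], join_type: str) -> List[Example]:
    """Join two ExampleSets on identical values of the key attributes.

    Assumes same-named key attributes on both sides and disjoint non-key names.
    """
    if join_type not in JOIN_TYPES:
        raise ValueError(f"unknown join type: {join_type!r}")
    left_attrs = attribute_names(left)
    right_attrs = [a for a in attribute_names(right) if a not in keys]

    # Index the right Examples by the tuple of their key values.
    index: Dict[Tuple[Any, ...], List[int]] = {}
    for i, ex in enumerate(right):
        index.setdefault(tuple(ex.get(k) for k in keys), []).append(i)

    result: List[Example] = []
    matched_right = set()
    for lex in left:
        hits = index.get(tuple(lex.get(k) for k in keys), [])
        for i in hits:  # every matching pair contributes one joined Example
            matched_right.add(i)
            result.append({**lex, **{a: right[i].get(a) for a in right_attrs}})
        if not hits and join_type in ("left", "outer"):
            # Unmatched left Example: right-side Attributes become missing.
            result.append({**lex, **{a: None for a in right_attrs}})

    if join_type in ("right", "outer"):
        for i, rex in enumerate(right):
            if i not in matched_right:
                # Unmatched right Example: left-side Attributes become missing,
                # except the shared key columns, which this sketch fills from the right.
                row: Example = {a: None for a in left_attrs}
                row.update({k: rex.get(k) for k in keys})
                row.update({a: rex.get(a) for a in right_attrs})
                result.append(row)
    return result
```

In this sketch every matching pair of Examples produces one joined Example, so a key value that occurs several times on both sides multiplies the rows, as in a SQL join. For unmatched right Examples the sketch copies the key values from the right side, the way pandas `merge` treats a shared `on` column; RapidMiner's own handling of that case may differ.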

Differentiation

Append

The Append Operator appends the Examples of all input ExampleSets to one resulting ExampleSet. For this reason, all input ExampleSets need to have the same structure (number of Attributes, Attribute names and value types).
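The structural requirement can be sketched in Python under a hypothetical model in which an ExampleSet is a list of dicts (Attribute name → value). Because plain dicts carry no RapidMiner value types, this sketch only compares Attribute names and rejects inputs whose name sets differ; the function `append` is illustrative, not RapidMiner's API.

```python
from typing import Any, Dict, List

# Sketch model (an assumption, not RapidMiner's API): an ExampleSet is a list
# of dicts mapping attribute name -> value.
Example = Dict[str, Any]


def attribute_names(examples: List[Example]) -> List[str]:
    """Attribute names in order of first appearance."""
    names: List[str] = []
    for ex in examples:
        for name in ex:
            if name not in names:
                names.append(name)
    return names


def append(*example_sets: List[Example]) -> List[Example]:
    """Stack the Examples of inputs that share the same Attributes."""
    if not example_sets:
        raise ValueError("append needs at least one ExampleSet")
    expected = set(attribute_names(example_sets[0]))
    for es in example_sets[1:]:
        # Only names are compared; this model carries no value types.
        if set(attribute_names(es)) != expected:
            raise ValueError("all input ExampleSets must have the same Attributes")
    # Copy each Example so the inputs are left untouched.
    return [dict(ex) for es in example_sets for ex in es]
```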

Cartesian Product

The Cartesian Product Operator builds the Cartesian product of the input ExampleSets, i.e. every Example from the left ExampleSet is combined with each Example of the right ExampleSet.
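For contrast with a key-based join, here is a minimal Python sketch of a Cartesian product under the same hypothetical list-of-dicts model. It assumes the two inputs share no Attribute names and simply raises an error otherwise; how RapidMiner resolves such name clashes is not covered here, and `cartesian_product` is an illustrative name.

```python
from itertools import product
from typing import Any, Dict, List

# Sketch model (an assumption, not RapidMiner's API): an ExampleSet is a list
# of dicts mapping attribute name -> value.
Example = Dict[str, Any]


def cartesian_product(left: List[Example], right: List[Example]) -> List[Example]:
    """Pair every left Example with every right Example (no key needed)."""
    result: List[Example] = []
    for lex, rex in product(left, right):
        clash = set(lex) & set(rex)
        if clash:
            # Assumption of this sketch: Attribute names must be disjoint.
            raise ValueError(f"Attributes present on both sides: {sorted(clash)}")
        result.append({**lex, **rex})
    return result
```

The result always has `len(left) * len(right)` Examples, which is why a Cartesian product grows much faster than a join on a selective key.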

Union

The Union Operator combines both input ExampleSets in such a way that all Attributes and Examples are part of the resulting union ExampleSet.
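A minimal Python sketch of the described result, again using a hypothetical list-of-dicts model with `None` as the missing value: the output carries the union of both Attribute sets, and each Example receives missing values for the Attributes it did not have. This illustrates the outcome only and is not RapidMiner's implementation; `union` is an illustrative name.

```python
from typing import Any, Dict, List

# Sketch model (an assumption, not RapidMiner's API): an ExampleSet is a list
# of dicts mapping attribute name -> value; None stands for the missing value '?'.
Example = Dict[str, Any]


def attribute_names(examples: List[Example]) -> List[str]:
    """Attribute names in order of first appearance."""
    names: List[str] = []
    for ex in examples:
        for name in ex:
            if name not in names:
                names.append(name)
    return names


def union(first: List[Example], second: List[Example]) -> List[Example]:
    """All Attributes and all Examples of both inputs in one ExampleSet."""
    names = attribute_names(first + second)  # union of both Attribute sets
    # Every Example gets every Attribute; ones it did not have become missing.
    return [{name: ex.get(name) for name in names} for ex in first + second]
```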

Superset

The Superset Operator expects two ExampleSets as input and adds the Attributes of the first ExampleSet to the second ExampleSet and vice versa. Both resulting ExampleSets are delivered as output of the Superset Operator.
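Under the same hypothetical list-of-dicts model, the Superset behavior can be sketched as follows: each output keeps its own Examples and gains the other input's missing Attributes, filled with `None` (the missing value '?'). The function `superset` is an illustrative name, not RapidMiner's API.

```python
from typing import Any, Dict, List, Tuple

# Sketch model (an assumption, not RapidMiner's API): an ExampleSet is a list
# of dicts mapping attribute name -> value; None stands for the missing value '?'.
Example = Dict[str, Any]


def attribute_names(examples: List[Example]) -> List[str]:
    """Attribute names in order of first appearance."""
    names: List[str] = []
    for ex in examples:
        for name in ex:
            if name not in names:
                names.append(name)
    return names


def superset(first: List[Example],
             second: List[Example]) -> Tuple[List[Example], List[Example]]:
    """Give each input the Attributes of the other; Examples are not mixed."""
    first_names = attribute_names(first)
    second_names = attribute_names(second)
    add_to_first = [n for n in second_names if n not in first_names]
    add_to_second = [n for n in first_names if n not in second_names]
    out_first = [{**ex, **{n: None for n in add_to_first}} for ex in first]
    out_second = [{**ex, **{n: None for n in add_to_second}} for ex in second]
    return out_first, out_second
```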

Input

  • left (Data Table)

    The left input port expects an ExampleSet. This ExampleSet will be used as the left ExampleSet for the join.

  • right (Data Table)

    The right input port expects an ExampleSet. This ExampleSet will be used as the right ExampleSet for the join.

Output

  • join (Data Table)

    The output port delivers the joined ExampleSet.

Parameters

  • remove_double_attributes

    This parameter indicates whether double Attributes should be removed or renamed. Double Attributes are Attributes that are present in both ExampleSets. If this parameter is checked, only the copy from the left ExampleSet is kept and the copy from the right ExampleSet is discarded. If it is unchecked, the double Attributes from the right ExampleSet are renamed. The key attributes are always taken from the left ExampleSet. Note that this check for double Attributes is only applied to regular Attributes: special Attributes of the right ExampleSet that do not exist in the left ExampleSet are simply added, and those that already exist are skipped.

    Range: boolean
  • join_type

    This parameter specifies which join should be performed. The tutorial Process demonstrates each of these joins. Four types of joins are supported:

    • inner: The resulting ExampleSet will contain only those Examples for which the key attributes of both input ExampleSets match, i.e. have the same value.
    • left: This is also called left outer join. The resulting ExampleSet will contain all Examples from the left ExampleSet. If no matching Example is found in the right ExampleSet, the Attributes from the right ExampleSet are filled with missing values. Missing values or null values are shown as '?' in RapidMiner. The left join always contains the results of the inner join, plus any Examples from the left ExampleSet that have no match in the right ExampleSet.
    • right: This is also called right outer join. The resulting ExampleSet will contain all Examples from the right ExampleSet. If no matching Example is found in the left ExampleSet, the Attributes from the left ExampleSet are filled with missing values. Missing values or null values are shown as '?' in RapidMiner. The right join always contains the results of the inner join, plus any Examples from the right ExampleSet that have no match in the left ExampleSet.
    • outer: This is also called full outer join. It combines the results of the left and the right join: all Examples from both ExampleSets are part of the resulting ExampleSet, whether or not a matching key attribute value exists in the other ExampleSet. Where no match was found, the Attributes from the other ExampleSet are filled with missing values. Missing values or null values are shown as '?' in RapidMiner. The outer join always contains the results of the inner join, plus any Examples that have no match in the other ExampleSet.
    Range: selection
  • use_id_attribute_as_key

    This parameter indicates if the Attribute with the id role should be used as the key attribute. This option is checked by default. If unchecked, then you have to specify the key attributes for both left and right ExampleSets. Identical values of the key attributes indicate matching Examples.

    Range: boolean
  • key_attributes

    This parameter is available when the parameter use_id_attribute_as_key is unchecked. It specifies the Attribute(s) used as key attributes. Identical values of the key attributes indicate matching Examples. For each key attribute from the left ExampleSet, a corresponding key attribute from the right ExampleSet has to be chosen. Choosing appropriate key attributes is critical for obtaining the desired results.

    Range: list
  • keep_both_join_attributes

    If checked, both Attributes of a join pair are kept. Usually this is unnecessary, since the two Attributes of a matching pair have identical values. It may be useful to keep both columns if one side contains missing values.

    Range: boolean
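How remove_double_attributes and keep_both_join_attributes shape the columns of the result can be sketched as a Python function that only plans the output Attributes. This is an illustrative sketch under stated assumptions: the name `output_columns` and the rename suffix `_right` are hypothetical (RapidMiner's actual renaming scheme is not described here), key attributes are passed as (left name, right name) pairs, and special Attributes are not modeled.

```python
from typing import List, Sequence, Tuple

# Hypothetical suffix for renamed right-side Attributes; RapidMiner's actual
# naming scheme is not modeled here.
RENAME_SUFFIX = "_right"

Column = Tuple[str, str, str]  # (output name, source side, source name)


def output_columns(left_attrs: Sequence[str], right_attrs: Sequence[str],
                   key_pairs: Sequence[Tuple[str, str]],
                   remove_double_attributes: bool,
                   keep_both_join_attributes: bool) -> List[Column]:
    """Plan the regular Attributes of a joined ExampleSet."""
    right_keys = {right_key for _, right_key in key_pairs}
    # Every left Attribute is kept, so the key attributes come from the left side.
    columns: List[Column] = [(name, "left", name) for name in left_attrs]
    taken = set(left_attrs)
    for name in right_attrs:
        if name in right_keys:
            if not keep_both_join_attributes:
                continue  # drop the right half of the join pair
        elif name in taken and remove_double_attributes:
            continue  # double Attribute: only the left copy survives
        out = name
        while out in taken:  # rename on collision instead of dropping
            out += RENAME_SUFFIX
        columns.append((out, "right", name))
        taken.add(out)
    return columns
```

For example, with left Attributes `id, name, age`, right Attributes `id, name, score` and the key pair `("id", "id")`, checking remove_double_attributes and leaving keep_both_join_attributes unchecked yields the columns `id, name, age, score`.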

Tutorial Processes

Explore the different join types

This process creates two similar ExampleSets and connects them to the left and right ports of the Join Operator, so you can experiment with the available join types. The description inside the process explains the created ExampleSets as well as the results of each join type.