This is the RapidMiner Studio documentation for version 10.0.

Connect to your data

To be effective as a data science tool, RapidMiner Studio has to first connect to your data.

  • If the data is in a file on your computer, RapidMiner Studio has to read the file format.
  • If the data is in a database, RapidMiner Studio has to connect to that database, and know the language of that database (SQL / NoSQL).
  • If the data is in the cloud, RapidMiner Studio has to connect to the cloud service and know its API.
  • If the data is imported from or exported to another software tool, for example Python or Tableau, RapidMiner Studio has to know about that tool.
  • If the connection is via a proxy or a self-signed SSL certificate, RapidMiner Studio has to navigate that hurdle.

The good news is that RapidMiner Studio supports a wide range of file formats, databases, cloud services, and other software tools, either natively or via extensions.

Connection Objects

When the connection to your data occurs over a network, you must first create a connection object. A connection object enables the connection to a database, cloud service, or email service. All connection objects are stored in a repository, in the Connections subfolder.

From now on, we'll simply call them connections, remembering however that they have similarities to other objects in the repository. You can, for example, drag a database connection into the Process Panel to Retrieve it, before connecting the output to the Read Database Operator.

To create a connection, right-click on the Connections folder and select Create Connection. The Create connection dialog opens, and you can configure your connection. If you're connecting to an SQL database:

  1. Choose the Connection Type (Database), the Repository (where the connection will be stored), and the Connection Name.
  2. Press Create, and the Edit Connection dialog opens.
  3. Under the Setup tab, select the Database System and fill in User, Password, Host, Port, and (optionally) the Database name.
  4. Press Test connection. Once it's working, Save the connection. The connection will appear in the Connections subfolder of the repository you selected in step (1).

You can view the connection details at any time by double-clicking on the connection in the Repository Panel, or by right-clicking on the connection and choosing Open or Edit.

Injected parameters: sharing connections

Connection objects can be shared.

Suppose that a group of users has access to the same database, and they collaborate on RapidMiner AI Hub. Can they share the database connection, without sharing their usernames and passwords? The answer is yes!

The solution is to build the connection as a template, where all the common parameters are pre-filled, and all the parameters unique to each user are injected. The values of the injected parameters are not stored in the connection object, but retrieved from an external source every time the connection is used. Possible external sources include macros and secure storage on RapidMiner AI Hub.
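The template idea can be sketched in a few lines of Python. This is only an illustrative model, not RapidMiner code: the class ConnectionTemplate, its fields, the host name, and the per-user dictionaries are all hypothetical. It shows the one property that matters here: shared values live in the connection object, while injected values are fetched from an external source each time the connection is used.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class ConnectionTemplate:
    """Hypothetical model of a shared connection; not a RapidMiner API."""

    name: str
    # Values common to all users, stored in the connection object.
    stored: Dict[str, str]
    # Keys whose values are deliberately NOT stored in the connection object.
    injected_keys: Set[str] = field(default_factory=set)

    def resolve(self, source: Callable[[str], str]) -> Dict[str, str]:
        """Build the full parameter set at the moment the connection is used."""
        values = dict(self.stored)
        for key in self.injected_keys:
            values[key] = source(key)  # looked up again on every use
        return values


template = ConnectionTemplate(
    name="sales_db",
    stored={"host": "db.example.com", "port": "5432"},
    injected_keys={"user", "password"},
)

# Each user has a private source of values (stand-ins for secure storage).
alice_store = {"user": "alice", "password": "alice-secret"}
bob_store = {"user": "bob", "password": "bob-secret"}

alice_params = template.resolve(alice_store.__getitem__)
bob_params = template.resolve(bob_store.__getitem__)
print(alice_params["user"], bob_params["user"])
```

Both users work with the same template object, yet neither user's credentials ever appear in it.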

To create a connection in a RapidMiner AI Hub repository, or to copy a connection to a RapidMiner AI Hub repository, a user has to belong to the connection manager group. See Sharing and permissions.

In outline, assuming the database credentials will be securely stored on RapidMiner AI Hub, the whole process of using a connection template might proceed as follows. We'll call the user with the connection manager role the admin.

  1. Within RapidMiner Studio, the admin creates a connection in a RapidMiner AI Hub repository. While it's possible to create a connection in a local repository, that connection will only provide macros as an injection source.

  2. While editing the connection, the admin presses the Set injected parameters button and selects the parameters whose values will be left blank until later (e.g. User and Password). The admin must also choose RapidMiner AI Hub as the source of the injected values.

  3. To set the injected values, a user must connect to the web interface of RapidMiner AI Hub. Either click the link displayed in the Edit connection dialog, or connect directly to the web interface, navigate to Repository > Connections, and identify the connection by name. A warning says: This connection has missing values. The user clicks the link, fills in their own username and password, and presses the Save button in RapidMiner AI Hub, where the credentials are securely saved. Step (3) needs to be repeated by each individual user.

For more details, read the RapidMiner AI Hub documentation Create connections and Usage and injection.

Macros as a source of injected parameters

Within RapidMiner Studio, you can use values from process macros for your connection settings without any further setup. When editing a connection, press Set injected parameters and choose which parameters should get their values from macros. For a value to be injected, the macro name must match the parameter key. The parameter key can be found in the information next to the parameter.

Configuration for the macro source is optional. If no prefix is configured, the macro name has to match the parameter key. If a prefix is configured, the macro name has to consist of the prefix, followed by an underscore (_), followed by the parameter key. For example, with the prefix myprefix, the parameter key user requires the macro name:

myprefix_user

The required macro name is shown when you set the injection, as well as in the view and edit dialogs. Use that name for your macro so that its value is injected into the connection.
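The naming rule for macros can be written down as a small Python helper. This is a sketch of the rule as described on this page, not RapidMiner's implementation; the function names macro_name and inject_from_macros are hypothetical, and silently skipping a parameter with no matching macro is an assumption made for the example.

```python
from typing import Dict, Iterable, Optional


def macro_name(parameter_key: str, prefix: Optional[str] = None) -> str:
    """Return the macro name expected for a parameter key.

    Without a prefix the macro name equals the parameter key;
    with a prefix it is "<prefix>_<parameter key>".
    """
    if prefix:
        return f"{prefix}_{parameter_key}"
    return parameter_key


def inject_from_macros(
    parameter_keys: Iterable[str],
    macros: Dict[str, str],
    prefix: Optional[str] = None,
) -> Dict[str, str]:
    """Collect values for the injected parameters from a dict of macros.

    Assumption for this sketch: parameters without a matching macro are skipped.
    """
    injected = {}
    for key in parameter_keys:
        name = macro_name(key, prefix)
        if name in macros:
            injected[key] = macros[name]
    return injected


process_macros = {"myprefix_user": "alice", "myprefix_password": "secret"}
values = inject_from_macros(["user", "password"], process_macros, prefix="myprefix")
print(macro_name("user", "myprefix"))
print(values)
```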

Placeholders

Placeholders can be used inside any configuration parameter's value to reference other parameters. It is possible to concatenate placeholders and free text. Nesting of placeholders is not supported.

Since the syntax for placeholders is the same as for macros, it is important to make the context clear:

  • The context for macros is processes.
  • The context for placeholders is connections.

A placeholder can access parameter values from the current tab as well as from any other tab. To find the key of a field you want to reference from a different field, look at the information tooltip of the original field. The Full key is what you're looking for.

To use this placeholder in another field, reference the full key by prefixing it with a percent sign (%) and enclosing it in curly brackets ({}), like this:

%{db_config.database}

If a placeholder cannot be resolved, it is simply replaced with an empty string, but still counts as an injected value and will not fail the process execution.

JDBC-based database connections use this mechanism to create the URL from the parameters.

Without parameter information, the URL consists of several placeholders and a double colon. Once the parameters are set, the placeholders are replaced by their values.

You can use the placeholder system in the same way to configure dynamic parameter values.
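The placeholder rules above (the %{full.key} syntax, free text mixed with placeholders, no nesting, unresolved placeholders becoming empty strings) can be illustrated with a short Python sketch. It is not RapidMiner's implementation: the function resolve_placeholders is hypothetical, the keys db_config.host and db_config.port are assumed for the example (only db_config.database appears on this page), and the URL pattern is an assumed PostgreSQL-style JDBC URL.

```python
import re
from typing import Mapping

# Matches %{some.full.key}; the key itself may not contain braces.
PLACEHOLDER = re.compile(r"%\{([^{}]*)\}")


def resolve_placeholders(value: str, params: Mapping[str, str]) -> str:
    """Replace each %{key} with its parameter value in a single pass.

    Unresolved placeholders become empty strings. Substituted values are
    not scanned again, so placeholders inside values are never expanded.
    """
    return PLACEHOLDER.sub(lambda match: params.get(match.group(1), ""), value)


# Assumed URL pattern, mixing free text with placeholders.
url_template = "jdbc:postgresql://%{db_config.host}:%{db_config.port}/%{db_config.database}"

params = {
    "db_config.host": "localhost",
    "db_config.port": "5432",
    "db_config.database": "sales",
}

print(resolve_placeholders(url_template, params))
print(resolve_placeholders(url_template, {}))
```

With all three parameters set, the first call yields a complete URL; with no parameters set, only the free text between the placeholders remains.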