You are viewing the RapidMiner Studio documentation for version 10.0.

Using the Solr Connector

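The Solr connector allows you to read search results from a Solr server. Using the Search Solr operators, you can run different search queries. This document will walk you through how to install the Solr extension, connect to your Solr server, search your Solr server, and add to your Solr server.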

Install the Solr extension

First, you need to install the Solr extension.

Connect to your Solr server

Before you can use the Solr connector, you have to configure a new Solr connection. For this purpose, you will need the connection details of your Solr server. Typically, the Solr server URL ends with the string '/solr'. If your Solr server requires authentication, you will also need valid credentials.

  1. In RapidMiner Studio, right-click on the repository you want to store your Solr connection in and choose Create Connection.

    You can also click on Connections > Create Connection and select the repository from the dropdown of the following dialog.

  2. Enter a name for the new connection, and set Connection Type to Solr.

  3. Click Create and switch to the Setup tab in the Edit connection dialog.

  4. Fill in the connection details of your Solr server:

    The preconfigured URL is the default URL for a Solr server running on your local machine. Note that Solr does not require user authentication by default, but you can specify a username and password by selecting Uses authentication.

    While not required, we recommend testing your new Solr connection by clicking the Test connection button. If the test fails, please check whether the details are correct.

  5. Click Save to save your connection and close the Edit connection dialog.

You can now use the newly created connection with the Solr operators!
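
If you want to sanity-check the same connection details outside of Studio, the sketch below builds a request against Solr's ping handler. It is only an illustration and not what the Test connection button does internally. The helper name build_ping_request, the host and port, the collection name my_collection, the credentials, and the use of HTTP Basic authentication are all assumptions; substitute the values for your own server.

```python
import base64
import urllib.request


def build_ping_request(base_url, collection, username=None, password=None):
    """Build (but do not send) a request to a collection's ping handler.

    Hypothetical helper for illustration; not part of RapidMiner or Solr.
    """
    # Accept both ".../solr" and ".../solr/" as the server URL.
    url = f"{base_url.rstrip('/')}/{collection}/admin/ping?wt=json"
    request = urllib.request.Request(url)
    if username is not None:
        # Assumption: the server is secured with HTTP Basic authentication.
        raw = f"{username}:{password}".encode("utf-8")
        token = base64.b64encode(raw).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


# Assumed example values; replace them with your own connection details.
request = build_ping_request(
    "http://localhost:8983/solr/", "my_collection", "user", "secret"
)
print(request.full_url)

# To actually contact the server, you would send the request, for example:
# urllib.request.urlopen(request, timeout=5)
```

The request is only constructed here, so the snippet runs without a Solr server. Sending it requires a reachable server and an existing collection.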

Search your Solr server

There are two search operators for Solr: Search Solr (Data) and Search Solr (Documents). The Search Solr (Data) operator lets you query a Solr server and obtain the results as a data table. The Search Solr (Documents) operator works similarly but supplies the results as a collection of documents that can be processed further with the Text extension. We will demonstrate the configuration for the Search Solr (Data) operator; the same configuration also applies to Search Solr (Documents).

  1. Open a new process in RapidMiner Studio, drag the Search Solr (Data) operator into the Process view, and connect its output port to the result port of the process. Then select your Solr connection for the connection entry parameter: click the repository chooser button next to it and pick the connection from the Connections folder of the repository you stored it in.

    Alternatively, you can drag the Solr connection from the repository into the Process panel and connect the resulting operator with the Search Solr (Data) operator.

  2. Select a collection from the list of the collection parameter.

  3. Define the search query by clicking on the button next to the query parameter. You can add filters to refine your query. If the filter query parameter is not visible, click on Show advanced parameters to display it.

  4. Optionally, you can specify advanced parameters such as data facets for a faceted search. Note that you can change the default limit of 100 for the maximum number of results.

  5. Run the process! In the Result Perspective, you can see the table resulting from your query. The Solr collection fields are now the columns, and every row corresponds to a Solr entry.
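
For orientation, the parameters in the steps above correspond roughly to a request against Solr's standard select handler. The following is a minimal sketch of how such a request URL could be assembled, not the operator's actual implementation. The helper build_select_url, the server URL, the collection my_collection, and the field names are hypothetical; the row limit of 100 mirrors the operator default mentioned in step 4.

```python
from urllib.parse import urlencode


def build_select_url(base_url, collection, query,
                     filter_queries=(), rows=100, facet_fields=()):
    """Assemble a Solr select URL from a query, filters, and facets.

    Hypothetical helper for illustration; not part of RapidMiner or Solr.
    """
    params = [("q", query), ("rows", str(rows)), ("wt", "json")]
    # Each filter query narrows the result set and is sent as its own "fq".
    params += [("fq", fq) for fq in filter_queries]
    if facet_fields:
        # Faceting is switched on once and then requested per field.
        params.append(("facet", "true"))
        params += [("facet.field", field) for field in facet_fields]
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


# Assumed example values; the field names title, year, and author are made up.
url = build_select_url(
    "http://localhost:8983/solr",
    "my_collection",
    "title:solr",
    filter_queries=["year:2020"],
    facet_fields=["author"],
)
print(url)
```

Opening the printed URL in a browser or with an HTTP client returns the matching entries as JSON, provided the server and collection exist.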

Follow the same steps to use the Search Solr (Documents) operator. After specifying the collection and the query, you can select the document body field. This parameter specifies which Solr field will be stored in the RapidMiner document body. The other Solr fields become metadata records of the document.

Every Solr entry is now transformed into a Document, rather than into a row as with the Search Solr (Data) operator.
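
To make the difference between the two result shapes concrete, here is a small sketch that converts the same Solr entries once into a table and once into documents. It is a simplified model under stated assumptions, not the operators' code: the dictionary-based representation, the function names, the sample field names, and the handling of missing fields (None in the table, an empty body in a document) are all assumptions.

```python
def entries_to_table(entries):
    """Model of the (Data) result: fields become columns, entries become rows."""
    columns = []
    for entry in entries:
        for field in entry:
            if field not in columns:
                columns.append(field)
    # Assumption: a field that an entry lacks is represented as None.
    rows = [[entry.get(column) for column in columns] for entry in entries]
    return columns, rows


def entries_to_documents(entries, body_field):
    """Model of the (Documents) result: one field is the body, the rest metadata."""
    documents = []
    for entry in entries:
        metadata = {k: v for k, v in entry.items() if k != body_field}
        # Assumption: an entry without the body field gets an empty body.
        documents.append({"body": entry.get(body_field, ""), "metadata": metadata})
    return documents


# Made-up sample entries, as they might come back from a search.
entries = [
    {"id": "1", "title": "A", "text": "hello"},
    {"id": "2", "text": "world", "author": "B"},
]

columns, rows = entries_to_table(entries)
documents = entries_to_documents(entries, body_field="text")
print(columns)
print(rows)
print(documents)
```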

Add to your Solr server

As with searching, there are two operators for adding to Solr. The Add to Solr (Data) operator uploads the content of a data table to the Solr server. The Add to Solr (Documents) operator works similarly but expects its input as a collection of documents that come from the Text extension.

We will demonstrate the configuration for the Add to Solr (Data) operator; the same configuration also applies to Add to Solr (Documents).

  1. Open a new process in RapidMiner Studio, drag the Add to Solr (Data) operator into the Process view, and specify a connection as described above.

  2. Select a collection from the list of the collection parameter.

  3. Connect the input port of the operator with the data table that should be added. Every column will become a Solr field and every row a Solr entry for the respective fields.
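
The row-to-entry mapping in step 3 can be pictured as the JSON payload that Solr's update handler accepts. The sketch below builds such a request without sending it. It is an illustration of the idea rather than the operator's implementation; the helper build_update_request, the server URL, the collection my_collection, the sample columns, and the choice to commit immediately via commit=true are assumptions.

```python
import json
import urllib.request


def build_update_request(base_url, collection, columns, rows):
    """Build (but do not send) a JSON update request: one Solr entry per row.

    Hypothetical helper for illustration; not part of RapidMiner or Solr.
    """
    # Every column name becomes a Solr field, every row one Solr entry.
    entries = [dict(zip(columns, row)) for row in rows]
    # Assumption: commit right away so the entries become searchable.
    url = f"{base_url.rstrip('/')}/{collection}/update?commit=true"
    return urllib.request.Request(
        url,
        data=json.dumps(entries).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Made-up sample table with two columns and two rows.
request = build_update_request(
    "http://localhost:8983/solr",
    "my_collection",
    columns=["id", "title"],
    rows=[["1", "A"], ["2", "B"]],
)
print(request.full_url)
print(request.data.decode("utf-8"))

# To actually upload, you would send the request, for example:
# urllib.request.urlopen(request, timeout=10)
```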

The Add to Solr (Documents) operator works exactly the same way, just with a collection of Documents as input. The metadata records of a Document consist of keys and their related values. The keys become Solr fields, and each Document specifies one Solr entry with the related values. As Documents have an additional body, you can specify the Solr field for it via the document body field parameter.
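
As a rough picture of this mapping, the sketch below turns documents into Solr entries: metadata keys become fields, and the body is stored under the chosen body field. The dictionary-based document representation, the function name, and the sample values are assumptions made for illustration; this is not the operator's code.

```python
def documents_to_entries(documents, body_field):
    """Turn each document into one Solr entry.

    Hypothetical helper; a document is modeled as a dict with the keys
    "body" (text) and "metadata" (key/value records).
    """
    entries = []
    for document in documents:
        # Metadata keys become Solr fields with their related values.
        entry = dict(document["metadata"])
        # The body is stored under the field named by body_field.
        entry[body_field] = document["body"]
        entries.append(entry)
    return entries


# Made-up sample documents.
documents = [
    {"body": "hello", "metadata": {"id": "1", "title": "A"}},
    {"body": "world", "metadata": {"id": "2", "author": "B"}},
]

entries = documents_to_entries(documents, body_field="text")
print(entries)
```

Note that in this simplified model a metadata key with the same name as the body field would be overwritten by the body; how the operator handles such a clash is not covered here.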