What's new in RapidMiner Radoop 9.10
Radoop now supports RapidMiner Studio's connection framework, making collaboration and deployment of Radoop processes a breeze. We also added some quality-of-life improvements for connection testing and troubleshooting.
New Radoop connections
Radoop connections can now be exported as connection objects and saved in a project or repository.
This one-way export enables connection managers and administrators to create, validate, and then share a connection with all users of a project or an AI Hub repository.
Due to the complexity of creating and importing a Radoop connection using a Cluster Manager, we decided to keep the existing Manage Radoop Connections interface for administrators and connection managers. The expected workflow is that administrators use the Manage Radoop Connections interface to create, configure, and test Radoop connections, then export them to all the projects and repositories where users will need them. Users can take it from there, using the new Radoop connection objects to build Radoop processes and execute them on Hadoop clusters.
Radoop Connection Test operator
Run Radoop connection tests and collect logs straight from a RapidMiner process, making it much easier to troubleshoot connections in RapidMiner AI Hub executions.
Radoop has always been equipped with extensive connection testing functionality in RapidMiner Studio, because we know that getting a Radoop connection to work properly is the hardest part of the Radoop journey. We have now extended this convenience to AI Hub executions by adding the Radoop Connection Test operator. You can select the right set of tests, and you can conveniently write the connection test logs into your project or repository for easy retrieval, even in secure AI Hub environments.
Cluster-side custom Python environments in Spark script execution
Running Python code in a Spark environment is a very powerful tool, especially when combined with Radoop's code-free approach. It is very important that these Python scripts run with their own specific set of dependencies, at the correct versions, to ensure that they work and produce the expected output. This becomes especially tricky when multiple users work on the same Spark cluster and each of them wants to run Python with their own set of dependencies.
In this release, we made it easy to run Python code on Spark with the Spark Script operator using an isolated set of dependencies that can be controlled at the Radoop Nest or Spark Script operator level. The workflow involves packaging the whole Python environment (typically created on a cluster edge node using Anaconda), uploading it to HDFS, and then referencing it with the right advanced parameters for execution. Check out the Spark Script operator description for more details.
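To make the moving parts concrete, here is a minimal sketch of the workflow. The environment name, the HDFS path, and the `rm_main` entry-point signature are illustrative assumptions only; the exact script interface and the Radoop advanced parameter names are documented in the Spark Script operator description.

```python
# Packaging steps, run once on a cluster edge node (shown as comments since
# they are shell commands, not part of the script). Names and paths are
# placeholders:
#
#   conda create -y -n radoop-env python=3.7 pandas
#   conda pack -n radoop-env -o radoop-env.tar.gz
#   hdfs dfs -put radoop-env.tar.gz /user/shared/envs/
#
# In plain Spark-on-YARN terms, wiring up the environment corresponds to
# settings like:
#   spark.yarn.dist.archives = hdfs:///user/shared/envs/radoop-env.tar.gz#env
#   spark.pyspark.python     = ./env/bin/python
# Radoop exposes such settings through advanced parameters on the Radoop Nest
# or Spark Script operator; see the operator description for the exact keys.

import sys
import pandas  # resolved from the packaged environment, not the cluster default


def rm_main(spark, data):
    """Check that the driver and the executors run inside the packaged environment."""
    # On the driver, sys.executable should point into the unpacked archive
    # (e.g. ./env/bin/python) when the environment is set up correctly.
    print("driver python:", sys.executable)
    print("driver pandas:", pandas.__version__)

    # Probe each partition so the executors report their own interpreter
    # path and pandas version as well.
    def probe(rows):
        import sys as s
        import pandas as pd
        yield (s.executable, pd.__version__)

    print("executor environments:", set(data.rdd.mapPartitions(probe).collect()))
    return data
```

Because the archive is unpacked locally for each executor, every run of the script sees the same interpreter and library versions, regardless of what other users have installed on the cluster nodes.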