Enhancements and bug fixes
The following improvements are part of RapidMiner Radoop 7.6.
Enhancements
- Added support for both Standard and Premium Azure HDInsight 3.5. It is recommended to create the connection directly from Ambari using the Import from Cluster Manager option
- Container reuse is now supported on Hive-on-Tez in addition to Hive-on-Spark
- HiveServer2 High Availability (using ZooKeeper's service discovery) is now supported
- Dynamic Container Pool size now adapts to changing cluster size
- Hadoop client libraries are upgraded to 2.8.1
- SparkRM and Single Process Pushdown now also log RapidMiner initialization, so issues with, for example, extensions can be investigated
- After importing a connection with Import from Cluster Manager, the JDBC URL Postfix is now populated with the necessary value if the HiveServer2 transport mode is set to http
- Spark job test now reports if the remote Spark Assembly is incompatible with the chosen Apache Spark version (relevant for CDH 5.11, 5.12 and potentially other versions)
- Increased default timeout value for DataNode networking test from 30 to 60
- The sensitive property list for Extract Logs can now be customized (in addition to the built-in anonymization)
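
The HiveServer2 High Availability and http transport mode items above both come down to how the Hive JDBC URL is composed. The following is a minimal sketch of that composition in Python, using the standard Apache Hive JDBC session parameters (`serviceDiscoveryMode`, `zooKeeperNamespace`, `transportMode`, `httpPath`). The function name `hive_jdbc_url`, its arguments, and the host names are hypothetical and used only for illustration; the exact value that Radoop writes into the JDBC URL Postfix field is not stated in these notes.

```python
def hive_jdbc_url(hosts, database="default", zookeeper=False,
                  namespace="hiveserver2", http_path=None):
    """Compose a Hive JDBC URL (illustrative helper, not part of Radoop).

    hosts      -- list of "host:port" strings; with zookeeper=True these are
                  the ZooKeeper quorum members, otherwise HiveServer2 hosts
    namespace  -- ZooKeeper namespace under which HiveServer2 registers
                  (assumed here to be the common default "hiveserver2")
    http_path  -- HTTP endpoint path; when given, http transport mode is used
    """
    if not hosts:
        raise ValueError("at least one host:port entry is required")

    params = []
    if zookeeper:
        # Let the driver look up a live HiveServer2 instance in ZooKeeper.
        params.append("serviceDiscoveryMode=zooKeeper")
        params.append("zooKeeperNamespace=" + namespace)
    if http_path is not None:
        # Session parameters required when HiveServer2 runs in http mode.
        params.append("transportMode=http")
        params.append("httpPath=" + http_path)

    url = "jdbc:hive2://" + ",".join(hosts) + "/" + database
    if params:
        url += ";" + ";".join(params)
    return url


if __name__ == "__main__":
    # Hypothetical three-node ZooKeeper quorum.
    print(hive_jdbc_url(["zk1:2181", "zk2:2181", "zk3:2181"], zookeeper=True))
    # Hypothetical single HiveServer2 in http transport mode.
    print(hive_jdbc_url(["hs2:10001"], http_path="cliservice"))
```

In this sketch, everything after the database name (the `;key=value` pairs) corresponds to the kind of suffix that has to be present for an http-mode connection to succeed.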
Bug fixes
- BUGFIX: SparkRM and Single Process Pushdown no longer fail when the input data set in PARQUET format contains complex data types (array, struct, map, nested), or in TEXTFILE format contains array, struct or map data types
- BUGFIX: SparkRM no longer throws a StackOverflow error if bootstrapping is used and the number of bootstraps is larger than 250
- BUGFIX: Advanced Hive Parameters are no longer applied multiple times, which improves performance
- BUGFIX: Automatic temporary data cleaning service is no longer started multiple times concurrently
- BUGFIX: During Studio or Server shutdown, temporary data cleaning threads are no longer cancelled prematurely
- BUGFIX: The SparkConf description in the Spark Script templates is fixed for both Python and R
- BUGFIX: With Hive on Spark container reuse enabled, the container pool size can no longer decrease to 0 because of the resource settings; it is always at least 1
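
The lower bound described in the last item can be illustrated with a small sketch. The sizing rule below (the number of containers that fit into the cluster by memory and by vcores) is an assumption made purely for illustration and is not Radoop's actual formula; only the clamp to a minimum of 1 reflects the behavior stated above. The function and parameter names are hypothetical.

```python
def container_pool_size(cluster_memory_mb, cluster_vcores,
                        container_memory_mb, container_vcores):
    """Return an illustrative container pool size that is never below 1."""
    if container_memory_mb <= 0 or container_vcores <= 0:
        raise ValueError("container resources must be positive")

    # Assumed sizing rule: how many containers fit by each resource.
    fit_by_memory = cluster_memory_mb // container_memory_mb
    fit_by_vcores = cluster_vcores // container_vcores
    computed = min(fit_by_memory, fit_by_vcores)

    # Stated behavior: resource settings can no longer push the pool to 0.
    return max(1, computed)


if __name__ == "__main__":
    print(container_pool_size(65536, 16, 4096, 2))
    print(container_pool_size(2048, 1, 4096, 2))
```

Recomputing such a value whenever the cluster resources change is one way a pool could follow a growing or shrinking cluster, as the Dynamic Container Pool enhancement describes.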