Execute Python (Python Scripting)
Synopsis
Executes a Python script, allowing for custom data transformations and manipulations within your Altair AI Studio workflows.

Description
The Execute Python operator enables you to run custom Python scripts directly within your data workflows in Altair AI Studio. Before using this operator, you may need to specify the path to your Python installation under the Settings -> Preferences menu (on Mac OS, choose Altair AI Studio -> Preferences). In the settings panel, select the Python Scripting tab. Your Python installation must include the pandas module, since example sets are converted to pandas.DataFrame objects. For newer versions of the extension (11.0.0+), your Python environment must also include the pyarrow module.
By unchecking the use default python checkbox, you can configure an individual Python binary for this operator instead of using the global settings. The operator supports Conda/Mamba (Anaconda/Miniconda/Miniforge) virtual environments, Virtualenvwrapper environments, or a Python binary specified by its full file system path. For more information on how to select the required Python environment, see the Parameters section of this help page. Note that you may need to configure the extension: go to the Settings -> Preferences menu (on Mac OS, choose Altair AI Studio -> Preferences), select the Python Scripting tab in the settings panel, and edit the settings there if required.
This operator executes either the script provided through the script file port or parameter, or the script specified in the script parameter. The arguments of the script correspond to the input ports, where example sets are converted to pandas DataFrames. Analogously, the values returned by the script are delivered at the output ports of the operator, where DataFrames are converted back to example sets.
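For example, a minimal script with one connected input port and one connected output port might look like this sketch (the added column name is illustrative):

    # rm_main receives one pandas.DataFrame per connected input port.
    def rm_main(data):
        # add a hypothetical column; 'data' behaves like any DataFrame
        data['row_number'] = range(len(data))
        # the returned DataFrame is converted back to an example set
        return data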
Using conda: If you installed the Conda Python distribution to a non-default location, you may need to add the installation directory and some subdirectories in the global settings of the Python Scripting Extension. For this, go to the Settings -> Preferences menu (on Mac OS, choose Altair AI Studio -> Preferences). In the settings panel, select the Python Scripting tab. Add the installation directory of your Conda installation to the list of search paths. On Windows, you also need to add the conda_install_dir\Scripts subdirectory, and on Linux and Mac OS the conda_install_dir/bin subdirectory.
Accessing macros: You can access and modify the macros defined in Altair AI Studio from the Python code. You can call a macro by enclosing the name of the macro inside the %{} marks. Before the Python code is interpreted, these references are substituted with the actual macro values. For more fine-grained control over macros, set the use macros parameter. For more information, see the parameter description below.
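For instance, assuming a macro named 'number_of_clusters' is defined in the process (a hypothetical name for illustration), a line like the following receives the substituted value before the script runs:

    # '%{number_of_clusters}' is replaced by the macro's value before execution
    k = int("%{number_of_clusters}")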
The console output of Python is shown in the Log View (View -> Show View -> Log).
Handling Meta Data: If you pass an example set to your script through an input port, the meta data of the example set (types and roles) is available in the script. You can access it by reading the attribute rm_metadata of the associated Pandas DataFrame, for example data.rm_metadata.
data.rm_metadata is a dictionary where the keys are attribute names, and the values are dictionaries containing the attribute type and role. The attribute types and roles can only be chosen from a fixed set of predefined values.
For example, to access the type and role of each attribute:
    for name in data.columns:
        attribute_meta = data.rm_metadata.get(name, {})
        attribute_type = attribute_meta.get('rm_type', None)
        attribute_role = attribute_meta.get('rm_role', None)
        print(f"Attribute {name}: type={attribute_type}, role={attribute_role}")

You can influence the meta data of an example set that you return as a pandas DataFrame by setting the attribute rm_metadata. For instance, to set the attribute type and role for a column:

    data.rm_metadata['column_name'] = ("integer", "prediction")
The possible attribute types are: 'integer', 'real', 'nominal', 'date-time', 'time', 'text', 'text-set', 'text-list', 'real-list'.
The possible attribute roles are: 'BATCH', 'CLUSTER', 'ENCODING', 'ID', 'INTERPRETATION', 'LABEL', 'METADATA', 'OUTLIER', 'PREDICTION', 'SCORE', 'SOURCE', 'WEIGHT'.
If you don't specify attribute types, they are determined from the data types in the RapidMiner Table format (the default and preferred way).
For more information about meta data handling in a Python operator, check the tutorial process 'Meta Data Handling' below.
If a script file is provided either through the script file port or parameter (port takes precedence), that script will be used instead of the value of the script parameter.
Input
- script file (File)
A file containing a Python script to be executed. The file must comply with the script parameter rules. This port is optional; a file can also be provided through the script file parameter.
- input
The Execute Python operator can have multiple inputs. An input must be either an example set, a file object, a connection object, or a Python object which was generated by an 'Execute Python' operator.
Output
- output
The Execute Python operator can have multiple outputs. An output can be either an example set, a file object or a Python object generated by this operator.
Parameters
- script
The Python script to execute. Define a method named 'rm_main' with as many arguments as there are connected input ports, or alternatively a *args argument to accept a dynamic number of inputs. The return values of the method 'rm_main' are delivered to the connected output ports. If the method returns a tuple, the single entries of the tuple are delivered to the output ports. Entries of type pandas.DataFrame are converted to example sets; files are converted to File Objects; other Python objects are serialized and can be used by other 'Execute Python' operators or stored in the repository. Serialized Python objects must be smaller than 2 GB.
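As a sketch of how returned values map to output ports (the names are illustrative):

    def rm_main(data):
        summary = data.describe()                 # DataFrame -> example set at output port 1
        stats = {'columns': list(data.columns)}   # plain Python object -> serialized object at port 2
        # a tuple distributes its entries across the output ports in order
        return summary, stats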
- script file
A file containing a Python script to be executed. The file must comply with the script parameter rules. This parameter is optional.
- use default python
Use the Python binary or environment defined in the Altair AI Studio global settings. The global settings can be accessed from the Settings -> Preferences menu (on Mac OS choose Altair AI Studio -> Preferences). In the settings panel select the Python Scripting tab. Here you can define the defaults.
- package manager
This parameter is only available if use default python is set to false. It specifies the package manager used by the operator. Currently Conda/Anaconda/Miniconda/Miniforge and Virtualenvwrapper are supported, or you can define the full path to your preferred Python binary.
- conda environment
This parameter is only available if use default python is set to false and package manager is set to conda (anaconda). It specifies the Conda virtual environment used by this operator.
- venvw environment
This parameter is only available if use default python is set to false and package manager is set to virtualenvwrapper. It specifies the Virtualenvwrapper virtual environment used by this operator.
- python binary
This parameter is only available if use default python is set to false and package manager is set to specific python binaries. It specifies the path to the Python binary used by this operator.
- use macros
Use an additional named parameter macros for the rm_main method (note that you will need to modify the script and add the parameter manually). All macro values are then passed to rm_main in this additional parameter, and you can access them via the macros dictionary. Each dictionary value is a Python string. You can also modify values of the dictionary or add new elements; the changes are reflected in Altair AI Studio after the execution of the operator.
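A minimal sketch of a script with use macros enabled (the macro names are hypothetical):

    def rm_main(data, macros):
        # read an existing macro; all values arrive as strings
        threshold = float(macros.get('threshold', '0.5'))
        # add or modify a macro; the change is visible in Altair AI Studio afterwards
        macros['row_count'] = str(len(data))
        return data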
- compatibility level
To ensure backward compatibility, you can select a compatibility level that may differ in data types, functionality, or metadata usage. The possible levels are:
9.3: Basic version that uses CSV as an intermediary serialization type to communicate between Java and script code.
9.8: Operator versions above 9.8 include a fix for a bug in HDF5 date/time handling.
11.0.0: For operator versions 11.0.0 and above, Arrow-based serialization/deserialization is used for tabular data. Additionally, the underlying logic changed to use IOTable instead of the deprecated ExampleSet. This impacts data types and metadata handling.
Tutorial Processes
Clustering using Python
Random data is generated and then fed to the Python script. The script clusters the data in Python using as many clusters as are specified in the macro. The resulting example set contains the cluster assignment in the 'cluster' attribute.
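The script inside the operator is roughly along these lines (a sketch using scikit-learn's KMeans and a hypothetical macro name; the actual tutorial script may differ):

    from sklearn.cluster import KMeans

    def rm_main(data):
        # '%{number_of_clusters}' is substituted with the macro value before execution;
        # assumes all attributes are numeric
        kmeans = KMeans(n_clusters=int("%{number_of_clusters}"))
        data['cluster'] = kmeans.fit_predict(data)
        return data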
Building a model and applying it using Python
This tutorial process uses 'Execute Python' operators to first build a decision tree model using the 'Deals' data and then apply it to the 'Deals Testset' data. Before the data is used, the nominal values are converted to unique integers. The first Python scripting operator, 'build model', builds the model and delivers it to its output port. The second Python scripting operator, 'apply model', applies this model to the test set, adding a column called 'prediction'. After specifying the 'label' and 'prediction' columns with 'Set Role', the result can be viewed.
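Sketches of the two scripts (illustrative, using scikit-learn's decision tree; the column names are assumptions, and the tutorial's actual code may differ):

    # script of the 'build model' operator
    from sklearn.tree import DecisionTreeClassifier

    def rm_main(data):
        y = data['label']                      # assumes the label column is named 'label'
        X = data.drop(columns=['label'])
        model = DecisionTreeClassifier().fit(X, y)
        return model                           # serialized and passed to the next operator

    # script of the 'apply model' operator
    def rm_main(model, data):
        # assumes the test set has the same feature columns as the training data
        data['prediction'] = model.predict(data)
        return data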
Creating a plot using Python and storing it in your repository
This tutorial process uses the 'Execute Python' operator to first fetch example data, then create a plot and return both to the output ports. Please store the process in your repository. The data is shown as an example set and the plot is stored in the repository as an image.
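A sketch of such a script (using matplotlib; the file name and data are illustrative, and the returned open file is delivered as a File Object as described under the script parameter):

    import pandas as pd
    import matplotlib
    matplotlib.use('Agg')  # render without a display
    import matplotlib.pyplot as plt

    def rm_main():
        data = pd.DataFrame({'x': range(10), 'y': [v * v for v in range(10)]})
        plt.plot(data['x'], data['y'])
        plt.savefig('plot.png')
        # the DataFrame becomes an example set, the file a File Object
        return data, open('plot.png', 'rb')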
Reading an example set from a file using Python
This tutorial process uses an 'Execute Python' operator to save example data in a CSV file. A second 'Execute Python' operator receives this file, reads the data, and returns part of the data to the output port. The result is an example set.
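Sketches of the two scripts (file name and row selection are illustrative; this assumes the incoming file is delivered as a file-like object):

    # first operator: write the example set to a CSV file and return the file
    def rm_main(data):
        data.to_csv('data.csv', index=False)
        return open('data.csv', 'rb')

    # second operator: read the file back and return part of the data
    import pandas as pd

    def rm_main(file):
        data = pd.read_csv(file)
        return data.head(10)   # only the first rows are delivered as an example set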
Meta data handling
This tutorial process shows how to access the meta data of incoming example sets inside an 'Execute Python' operator. It also explains how to set the meta data for the outgoing example sets.