File input and output

Besides data tables, processes can handle different types of files as input and output.

The same is true for endpoints. They can be configured to have more than just JSON formatted data as input or output.

Default behavior

To better understand advanced input and output formats, let's briefly recap the default behavior.

Usually, you provide a JSON-formatted request body and receive JSON-formatted output, e.g., via curl or when testing an Endpoint.

curl http://localhost:8099/DEFAULT/api/v1/services/mydeployment/score \
    -X POST \
    -H 'Content-Type: application/json' \
    --data '{"data": [{}]}'

The underlying process knows how to handle the JSON input because the Content-Type header explicitly states that the body is in that format. Omitting the Content-Type header yields the same result, as JSON is the default input format. When JSON is the input format, the request body is automatically parsed and converted to what the process's input requires (an in-memory table representation leveraging the HDF5 data format). The process is then executed as usual, similar to when you execute it in AI Studio directly. The key difference is that the input comes from the global process input port (from the outside, not via a Retrieve operator or the like).

The same applies in reverse to global output ports. When connected, the process's result (first one only) is returned, and the endpoint automatically converts the in-memory table representation to JSON. This is what you receive when you call the endpoint.
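The default JSON round trip can also be sketched in Python with the standard library only. This is a minimal sketch, not an official client: the helper name `build_json_request` is hypothetical, and the URL is simply the one from the curl example above. The request is only built here; sending it requires a running endpoint.

```python
import json
import urllib.request

# URL taken from the curl example above; adjust it to your deployment.
SCORE_URL = "http://localhost:8099/DEFAULT/api/v1/services/mydeployment/score"


def build_json_request(url, rows):
    """Build a POST request whose body is {"data": rows} encoded as JSON.

    Hypothetical helper for illustration; not part of any product API.
    """
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_json_request(SCORE_URL, [{}])

# Sending needs a reachable endpoint, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Keeping request construction separate from sending makes it easy to inspect the method, headers, and body before anything goes over the network.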

Getting started with arbitrary input and output formats

For arbitrary input formats to work, the process being executed needs to receive the proper input. In addition, the underlying Scoring Agent or Web API Agent needs to convert any output format a process returns to a representation adhering to web standards.

Arbitrary input and output formats leverage the fil (file) input and output format, which means that non-JSON data is passed into the process as a binary file object.

The process itself is responsible for handling the received file object!

To control input and output formats of endpoints, use the HTTP headers Content-Type (for input) and Accept (for output). Behavior adheres to the following rules:

- Requests with no input data can also be sent with the GET HTTP method; all others require the POST HTTP method. GET requests can use the Accept header to specify the desired output format.
- By default, application/json is used for input and output, so any POST request requires a body payload in that format. JSON is also assumed when the corresponding HTTP header is not explicitly provided.
- When Content-Type is set to something other than application/json (the most generic one being application/octet-stream), the request payload is treated as a binary file and passed to the process for further handling.
- The Accept header can be omitted. If omitted, the type of the process's result determines what the endpoint returns:
  - If it's a file object, the response is returned as application/octet-stream.
  - If it's a table object, the result is converted to JSON and the response is returned as application/json.
- If Accept is explicitly set but the process's result does not match, a warning is printed in the Scoring Agent/Web API Agent and the response is returned as application/octet-stream.
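The rules above can be restated as a small decision model. This is only an illustrative sketch of the documented behavior, not the agent's implementation. The function names are hypothetical, and two points are assumptions: media type parameters such as `; charset=utf-8` are ignored, and a matching Accept value is echoed back as the response type. Because the text does not define exactly when a result "matches" the Accept header, the caller passes that in as `accept_matches`.

```python
JSON_TYPE = "application/json"
BINARY_TYPE = "application/octet-stream"


def input_kind(content_type=None):
    """Return how the request body reaches the process: 'table' or 'file'."""
    if content_type is None:
        return "table"  # JSON is the default input format
    # Assumption: parameters after ';' do not change the decision.
    media_type = content_type.split(";")[0].strip().lower()
    return "table" if media_type == JSON_TYPE else "file"


def response_type(result_kind, accept=None, accept_matches=True):
    """Return the media type of the response for a 'table' or 'file' result."""
    if result_kind not in ("table", "file"):
        raise ValueError("result_kind must be 'table' or 'file'")
    if accept is None:
        # No Accept header: the result type alone decides.
        return JSON_TYPE if result_kind == "table" else BINARY_TYPE
    if not accept_matches:
        # Mismatch: the agent prints a warning and falls back to binary.
        return BINARY_TYPE
    # Assumption: a matching Accept value is used as the response type.
    return accept
```

For example, `response_type("file")` gives `application/octet-stream`, while `response_type("table")` gives `application/json`.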

The following examples show how to use arbitrary data formats as input and receive a specified output format for endpoints.

All of them use the curl command-line utility to demonstrate the API calls, are deployed as Web API Endpoints (the URL includes the Web API Group), and assume that no authentication is needed.

Example: reading CSV and returning as Excel (binary file)

Here's an example of a process that takes a CSV file as input and returns an Excel file as output.

img/csv_input.png

The Read CSV operator is directly connected to the inp of the process. When the endpoint is called, data flows from the input port (the request's file body) to the connected operator. However, designing the process this way is not enough on its own: once the process is deployed, the endpoint must also be called with the proper Content-Type header.

In addition to connecting the Read CSV operator, the following image shows that a Write Excel operator also needs to be added. Furthermore, its output needs to be connected to the process's result port.

When a CSV is provided as file input in the request body, the endpoint passes the file object to the process. Read CSV parses the file object and forwards the data to Write Excel, which determines the process's result format (file).

The following example shows how to call the endpoint with a CSV file as input and receive an Excel file as output.

curl https://$DOMAIN/DEFAULT/api/v1/services/any-io/csv-in-xlsx-out \
    -X POST \
    -H 'Content-Type: application/octet-stream' \
    --data-binary @/path/to/input.csv \
    --output /path/to/output.xlsx
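A comparable upload can be sketched in Python with the standard library. The helper names `build_file_request` and `convert_csv_to_xlsx` are hypothetical, and the endpoint URL is passed in by the caller rather than assumed. The sketch reads the whole file into memory, which is only reasonable for small inputs.

```python
import urllib.request


def build_file_request(url, payload):
    """Build a POST request that sends raw bytes as a binary file body.

    Hypothetical helper for illustration; not part of any product API.
    """
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


def convert_csv_to_xlsx(url, csv_path, xlsx_path):
    """Upload the CSV at csv_path and save the binary response to xlsx_path."""
    with open(csv_path, "rb") as source:
        request = build_file_request(url, source.read())
    # The response body is written unchanged, like curl's --output option.
    with urllib.request.urlopen(request) as response:
        with open(xlsx_path, "wb") as target:
            target.write(response.read())
```

Calling `convert_csv_to_xlsx(url, "/path/to/input.csv", "/path/to/output.xlsx")` with the URL of your deployed endpoint mirrors the curl call above.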

Example: reading CSV and returning JSON

If you change the example above so that the Read CSV operator is connected directly to the process's result port, JSON is returned.

Remember: the process's result determines the output format!

When calling the endpoint, the location of the output file can be omitted. The result is returned as JSON-formatted data.

curl https://$DOMAIN/DEFAULT/api/v1/services/any-io/csv-in-xlsx-out \
    -X POST \
    -H 'Content-Type: application/octet-stream' \
    --data-binary @/path/to/input.csv

This call yields the following result (abbreviated):

{
  "data": [
    {
      "att1": -3.9360351555111546,
      "att2": 5.356000658667869,
      "att3": -5.388412850201785,
      "att4": 7.3295106089687385,
      "att5": -1.6675003630680458,
      "att6": "Todd Miller",
      "att7": "Aachen",
      "label": "cluster29"
    }
  ]
}
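On the client side, a response of this shape can be turned into a list of rows with a few lines of Python. This is a minimal sketch that assumes the rows always sit under a top-level "data" key, as in the result above. The helper name `rows_from_response` is hypothetical, and the sample body is a shortened copy of that result.

```python
import json

# Shortened copy of the response body shown above.
response_body = """
{
  "data": [
    {
      "att1": -3.9360351555111546,
      "att6": "Todd Miller",
      "att7": "Aachen",
      "label": "cluster29"
    }
  ]
}
"""


def rows_from_response(text):
    """Return the list of row dictionaries stored under the "data" key."""
    return json.loads(text)["data"]


rows = rows_from_response(response_body)

# Collect one column across all rows, here the label.
labels = [row["label"] for row in rows]
```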

Example: return an image

To return an image, the process's result needs to be a file object. The following example shows a process which does that. It also needs no input, so it can be called with the GET HTTP method.

The process randomly fetches a file and returns it.

If your client (not a terminal) supports displaying images, you see the image directly when you call the following:

curl https://$DOMAIN/DEFAULT/api/v1/services/any-io/no-in-img-out \
  -H 'Accept: image/jpeg' \
  --output /path/to/myimage.jpg

The result is a JPEG image. The Accept header specifies the desired output format.
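Because the endpoint falls back to application/octet-stream when the result does not match the Accept header, a client may want to verify what it actually received. The following sketch builds the GET request and checks the downloaded bytes for the JPEG file signature. Both helper names are hypothetical, and the signature check is a client-side addition, not something the endpoint provides.

```python
import urllib.request

# JPEG files start with these three bytes.
JPEG_SIGNATURE = b"\xff\xd8\xff"


def build_image_request(url):
    """Build a GET request that asks for a JPEG response.

    Hypothetical helper for illustration; not part of any product API.
    """
    return urllib.request.Request(
        url,
        headers={"Accept": "image/jpeg"},
        method="GET",
    )


def looks_like_jpeg(data):
    """Return True if the bytes start with the JPEG signature."""
    return data.startswith(JPEG_SIGNATURE)
```

After reading the response body with `urllib.request.urlopen`, passing the bytes to `looks_like_jpeg` is a cheap sanity check before saving them with a .jpg extension.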

Limits

A 2 GB limit applies to all binary input data per request. File input data is temporarily swapped to disk to pass it to the process and is cleaned up after execution. If you expect many concurrent requests to your endpoints, make sure to assign enough disk space to the Scoring Agent / Web API Agent for the swapping.
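A client can check a file against the limit before uploading, and a rough disk estimate follows from simple multiplication. This sketch rests on assumptions: the limit is read as 2 * 1024**3 bytes (the text does not say whether it is binary or decimal), and the disk estimate assumes every concurrent request swaps its full payload at the same time. Both function names are hypothetical.

```python
import os

# Assumption: the limit is interpreted as 2 GiB; adjust if your
# deployment counts in decimal gigabytes.
MAX_UPLOAD_BYTES = 2 * 1024**3


def fits_upload_limit(path, limit=MAX_UPLOAD_BYTES):
    """Return True if the file at path is no larger than limit bytes."""
    return os.path.getsize(path) <= limit


def swap_space_needed(file_size_bytes, concurrent_requests):
    """Rough upper bound on temporary disk usage in bytes.

    Assumes each concurrent request swaps its whole payload at once.
    """
    return file_size_bytes * concurrent_requests
```

For instance, 20 concurrent requests of 500 MB each would need on the order of 10 GB of temporary disk space under these assumptions.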

In this section, you learned how to use arbitrary input and output formats with endpoints. The process itself is responsible for handling the received file. The Content-Type HTTP header defines whether an input is passed to the process as a file or as a table object (JSON). For file input, leaving it at application/octet-stream is likely enough. In contrast, the Accept header determines how a client interprets the process's file result object. The type of the file object that the process returns should match the Accept header.
