File input and output
Besides data tables, processes can handle different types of files as input and output.
The same is true for endpoints. They can be configured to have more than just JSON formatted data as input or output.
Default behavior
To better understand advanced input and output formats, let's briefly recap the default behavior.
Usually, you provide a JSON formatted request body and receive a JSON formatted output, e.g., via curl or when testing an endpoint.
curl http://localhost:8099/DEFAULT/api/v1/services/mydeployment/score \
-X POST \
-H 'Content-Type: application/json' \
--data '{"data": [{}]}'
The underlying process knows how to handle the JSON input because the Content-Type header explicitly states that the body is in that format. Omitting the Content-Type header yields the same result, as JSON is the default input format. When JSON is the input format, the request body payload is automatically parsed and converted to what the process's input requires (an in-memory table representation leveraging the HDF5 data format). With this, the process is executed as usual, similar to when you execute it in AI Studio directly. The key difference is that the input comes from the global process input port (from the outside, not via a Retrieve operator or the like).
The same applies, vice versa, to global output ports. When a result port is connected, the process's result (the first one only) is returned, and the endpoint automatically converts the in-memory table representation to JSON. This is what you receive when you call the endpoint.
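One practical caveat when trying the "omitted Content-Type" case with curl: whenever --data is used, curl adds Content-Type: application/x-www-form-urlencoded on its own, so the header is not actually omitted. Passing the header with an empty value (-H 'Content-Type:') makes curl drop it. The sketch below wraps this in a hypothetical helper, score_json, which is not part of the product. Setting DRY_RUN=1 (also an assumption of this sketch) only prints the curl arguments instead of sending the request.

```shell
#!/bin/sh
# Hypothetical helper: POST a JSON body without sending any Content-Type header.
# $1 = endpoint URL, $2 = JSON request body.
score_json() {
  # 'Content-Type:' (empty value) tells curl to remove the header it would
  # otherwise add automatically for --data requests.
  set -- curl "$1" -X POST -H 'Content-Type:' --data "$2"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print one argument per line instead of contacting the endpoint.
    printf '%s\n' "$@"
  else
    "$@"
  fi
}
```

For instance, calling score_json with the URL and the '{"data": [{}]}' body from the example above sends the same request, just without a Content-Type header.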
Getting started with arbitrary input and output formats
For arbitrary input formats to work, the process being executed needs to receive the proper input. In addition, the underlying Scoring Agent or Web API Agent needs to convert any output format a process returns to a representation adhering to web standards.
Arbitrary input and output formats leverage the fil (file) input and output format, which means that non-JSON data is passed into the process as a binary file object.
The process itself is responsible for handling the received file object!
To control input and output formats of endpoints, use the HTTP headers Content-Type (for input) and Accept (for output). The behavior adheres to the following rules:
- Requests with no input data can also be sent with the GET HTTP method, all others require the POST HTTP method. GET requests can leverage the Accept header to specify the desired output format.
- By default, application/json is used for input and output, meaning that any POST request requires a body payload in that format. If not explicitly provided as HTTP header, JSON is also the default input and output format.
- When Content-Type is set to something other than application/json (the most generic one being application/octet-stream), the input request payload is treated as a binary file and passed to the process, which handles it further.
- The Accept header can be omitted. If omitted, the type of the process's result determines what is returned by the endpoint:
  - If it's a file object, the response body is returned as application/octet-stream.
  - If it's a table object, the response body is returned as application/json and the result is converted to JSON format.
- If Accept is explicitly set, but the process's result does not match, a warning is printed in the Scoring Agent/Web API Agent and the response body is returned as application/octet-stream.
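The rules for choosing the HTTP method and for the default response format (when Accept is omitted) can be summarized in a small client-side sketch. The function names http_method_for and default_response_type are hypothetical; they only mirror the rules as stated here and are not how the Scoring Agent or Web API Agent implements them.

```shell
#!/bin/sh
# Pick the HTTP method: GET is only an option when there is no input data.
# $1 = path to the input file, or empty when the request has no body.
http_method_for() {
  if [ -z "$1" ]; then echo GET; else echo POST; fi
}

# Response format to expect when no Accept header is sent.
# $1 = type of the process's result: "file" or "table".
default_response_type() {
  case "$1" in
    file)  echo application/octet-stream ;;
    table) echo application/json ;;
    *)     echo "unknown result type: $1" >&2; return 1 ;;
  esac
}
```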
The following examples show how to use arbitrary data formats as input and receive a specified output format for endpoints.
All of them use the curl command-line utility to demonstrate the API calls, are deployed as Web API Endpoints (the URL includes the Web API Group), and assume that no authentication is needed.
Example: reading CSV and returning as Excel (binary file)
Here's an example of a process that takes a CSV file as input and returns an Excel file as output.
The Read CSV operator is directly connected to the inp port of the process. When the endpoint is called, data flows from the input port (the request's file body) to the connected operator. However, designing a process this way is not enough for it to work: once the process is deployed, the endpoint must also be called with the proper Content-Type header.
As the following image shows, a Write Excel operator also needs to be added in addition to the connected Read CSV operator, and the connection to the process's result port needs to be made.
When a CSV file is provided as file input in the request body, the endpoint passes the file object to the process. Read CSV parses the file object and forwards the data to Write Excel, which determines the process's result format (file).
The following example shows how to call the endpoint with a CSV file as input and receive an Excel file as output.
curl https://$DOMAIN/DEFAULT/api/v1/services/any-io/csv-in-xlsx-out \
-X POST \
-H 'Content-Type: application/octet-stream' \
--data-binary @/path/to/input.csv \
--output /path/to/output.xlsx
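To confirm that the response was saved as a binary Excel file rather than, say, a JSON body, it can help to sanity-check the download. Files in .xlsx format are ZIP archives, which start with the two bytes PK. The check below is a minimal sketch; looks_like_xlsx is a hypothetical helper name, and passing the check only shows that the file is a ZIP container, not that it is a valid workbook.

```shell
#!/bin/sh
# Return success if the file starts with the ZIP signature "PK".
# $1 = path to the downloaded file.
looks_like_xlsx() {
  # Read only the first two bytes of the file.
  magic=$(dd if="$1" bs=1 count=2 2>/dev/null)
  [ "$magic" = "PK" ]
}
```

A call such as looks_like_xlsx /path/to/output.xlsx can then be used in an if statement right after the curl command.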
Example: reading CSV and returning JSON
If you change the example above so that the Read CSV operator is directly connected to the process's result port, JSON is returned.
Remember: the process's result determines the output format!
When calling the endpoint, the location of the output file (--output) can be omitted. The result is returned as JSON formatted data.
curl https://$DOMAIN/DEFAULT/api/v1/services/any-io/csv-in-xlsx-out \
-X POST \
-H 'Content-Type: application/octet-stream' \
--data-binary @/path/to/input.csv
This call yields the following result (abbreviated):
{
  "data": [
    {
      "att1": -3.9360351555111546,
      "att2": 5.356000658667869,
      "att3": -5.388412850201785,
      "att4": 7.3295106089687385,
      "att5": -1.6675003630680458,
      "att6": "Todd Miller",
      "att7": "Aachen",
      "label": "cluster29"
    }
  ]
}
Example: return an image
To return an image, the process's result needs to be a file object. The following example shows a process that does this. It also needs no input, so it can be called with the GET HTTP method.
The process randomly fetches a file and returns it.
If your client (not a terminal) supports displaying images, you see the image directly when you call the following:
curl https://$DOMAIN/DEFAULT/api/v1/services/any-io/no-in-img-out \
-H 'Accept: image/jpeg' \
--output /path/to/myimage.jpg
The result is a JPEG image. The Accept header specifies the desired output format.
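Because the Accept header should describe what the process actually returns, a client can derive it from the name of the file it is about to write. The mapping below is a sketch with a hypothetical helper, accept_for; the media types are the standard ones for these extensions, and anything unrecognized falls back to application/octet-stream. Whether a given deployment's result really matches the chosen type still depends on the process.

```shell
#!/bin/sh
# Map an output file name to a value for the Accept header.
# $1 = output path, e.g. /path/to/myimage.jpg
accept_for() {
  case "$1" in
    *.jpg|*.jpeg) echo image/jpeg ;;
    *.png)        echo image/png ;;
    *.xlsx)       echo application/vnd.openxmlformats-officedocument.spreadsheetml.sheet ;;
    *.json)       echo application/json ;;
    *)            echo application/octet-stream ;;
  esac
}
```

For example, -H "Accept: $(accept_for /path/to/myimage.jpg)" expands to the image/jpeg header used in the call above.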
Limits
A 2GB limit applies to all binary input data per request. File input data is temporarily swapped to disk to pass it to the process, and it's cleaned up after execution. If you expect a lot of concurrent requests to your endpoints, make sure to assign enough disk space to the Scoring Agent / Web API Agent for the swapping.
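A client can check the input size before uploading, to avoid sending a request that exceeds the limit. The sketch below assumes that 2GB means 2 × 1024³ bytes (the text does not say whether binary or decimal units are meant) and uses a hypothetical helper name, within_upload_limit. The limit can be overridden through the optional second argument.

```shell
#!/bin/sh
# Assumption: "2GB" is read as 2 * 1024^3 bytes.
MAX_UPLOAD_BYTES=2147483648

# Return success if the file exists and is not larger than the limit.
# $1 = path to the input file, $2 = optional limit in bytes.
within_upload_limit() {
  limit=${2:-$MAX_UPLOAD_BYTES}
  [ -f "$1" ] || return 2
  # wc -c counts bytes; tr strips the padding some implementations add.
  size=$(wc -c < "$1" | tr -d ' ')
  [ "$size" -le "$limit" ]
}
```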
In this section, you learned how to use arbitrary input and output formats with endpoints. The process itself is responsible for handling the received file. The HTTP header Content-Type defines whether an input is passed to the process as a file or as a table object (JSON). Setting it to application/octet-stream is likely enough.
In contrast, the Accept header determines how a client interprets the process's file result object. The type of the file object that the process returns should match the Accept header.