Serving Ludwig Models¶

Ludwig models can be served using the serve command.

ludwig serve --model_path=/path/to/model

The command starts a REST API built with the FastAPI library.

This API has two endpoints: predict and batch_predict. Use predict to obtain a prediction for a single example, and batch_predict to obtain predictions for a batch of examples.

Inputs sent to the REST API should be consistent with the feature names and types used to train the model. The output structure from the REST API depends on the model's output features and their data types.

REST Endpoints¶

predict¶

Input format¶

For each input feature of the model, the predict endpoint expects a form field with the same name as that feature. For instance, a model trained with an input text field named english_text would expect a POST like:

curl http://0.0.0.0:8000/predict -X POST -F 'english_text=words to be translated'

If the model was trained with an input image field, it instead expects a POST with a file, like:

curl http://0.0.0.0:8000/predict -X POST -F 'image=@path_to_image/example.png'

A model with both a text and an image field will expect a POST like:

curl http://0.0.0.0:8000/predict -X POST -F 'text=mixed together with' -F 'image=@path_to_image/example.png'
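The same requests can also be issued from Python. The following is a minimal sketch, assuming the third-party requests package is installed and the server is reachable at the address used in the curl examples. The helper names build_multipart_fields and predict are illustrative and not part of Ludwig. Text values are sent as multipart fields without a filename, mirroring curl -F 'name=value', and pathlib.Path values are uploaded as files, mirroring curl -F 'name=@path'.

```python
from contextlib import ExitStack
from pathlib import Path


def build_multipart_fields(example, stack):
    """Map feature names to multipart parts in the shape `requests` accepts.

    Path values become file uploads; every other value is sent as plain text.
    """
    fields = {}
    for name, value in example.items():
        if isinstance(value, Path):
            # The file stays open until the ExitStack is closed.
            handle = stack.enter_context(open(value, "rb"))
            fields[name] = (value.name, handle)
        else:
            # A (None, text) tuple makes `requests` emit a plain form field.
            fields[name] = (None, str(value))
    return fields


def predict(example, url="http://0.0.0.0:8000/predict"):
    """POST one example to the predict endpoint and return the parsed JSON."""
    import requests  # third-party; imported lazily so the helper above has no dependency

    with ExitStack() as stack:
        response = requests.post(url, files=build_multipart_fields(example, stack))
    response.raise_for_status()
    return response.json()
```

With a server running, a call such as predict({"text": "mixed together with", "image": Path("path_to_image/example.png")}) would correspond to the last curl command above.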

Output format¶

The response is a JSON dictionary with keys prefixed by the names of the model's output features.

For binary outputs, the JSON structure returned by the REST API is the following:

{
   "NAME_predictions": false,
   "NAME_probabilities_False": 0.76,
   "NAME_probabilities_True": 0.24,
   "NAME_probability": 0.76
}

For number outputs, the JSON structure returned by the REST API is the following:

{"NAME_predictions": 0.381}

For categorical outputs, the JSON structure returned by the REST API is the following:

{
   "NAME_predictions": "CLASSNAMEK",
   "NAME_probability": 0.62,
   "NAME_probabilities_CLASSNAME1": 0.099,
   "NAME_probabilities_CLASSNAME2": 0.095,
   ...
   "NAME_probabilities_CLASSNAMEN": 0.077
}
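Because the per-class probabilities arrive as flat keys, a client usually has to regroup them. The sketch below shows one way to do that on the parsed JSON dictionary; the feature name intent, the class names, and the helper functions are hypothetical and only illustrate the key layout described above.

```python
def class_probabilities(response, feature):
    """Return {class_name: probability} for one categorical output feature."""
    prefix = f"{feature}_probabilities_"
    return {
        key[len(prefix):]: value
        for key, value in response.items()
        if key.startswith(prefix)
    }


def top_classes(response, feature, k=3):
    """Return the k most probable (class_name, probability) pairs, best first."""
    ranked = sorted(
        class_probabilities(response, feature).items(),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:k]


# Hypothetical response for an output feature named "intent".
response = {
    "intent_predictions": "greeting",
    "intent_probability": 0.62,
    "intent_probabilities_greeting": 0.62,
    "intent_probabilities_farewell": 0.28,
    "intent_probabilities_other": 0.10,
}
print(top_classes(response, "intent", k=2))
# [('greeting', 0.62), ('farewell', 0.28)]
```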

For set outputs, the JSON structure returned by the REST API is the following:

{
   "NAME_predictions":[
      "CLASSNAMEI",
      "CLASSNAMEJ",
      "CLASSNAMEK"
   ],
   "NAME_probabilities_CLASSNAME1":0.490,
   "NAME_probabilities_CLASSNAME2":0.245,
   ...
   "NAME_probabilities_CLASSNAMEN":0.341,
   "NAME_probability":[
      0.53,
      0.62,
      0.95
   ]
}
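For set outputs it can be convenient to pair each predicted class with its probability. The sketch below assumes that the NAME_probability list is aligned position by position with the NAME_predictions list, which the example above suggests but does not state explicitly. The feature name tags and the helper set_predictions are hypothetical.

```python
def set_predictions(response, feature):
    """Pair each predicted class of a set output with its probability."""
    classes = response[f"{feature}_predictions"]
    probabilities = response[f"{feature}_probability"]
    if len(classes) != len(probabilities):
        # The alignment assumption does not hold for this response.
        raise ValueError("predictions and probability lists differ in length")
    return dict(zip(classes, probabilities))


# Hypothetical response for a set output feature named "tags".
response = {
    "tags_predictions": ["sports", "news"],
    "tags_probabilities_sports": 0.91,
    "tags_probabilities_news": 0.74,
    "tags_probabilities_music": 0.12,
    "tags_probability": [0.91, 0.74],
}
print(set_predictions(response, "tags"))
# {'sports': 0.91, 'news': 0.74}
```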

For sequence outputs, the JSON structure returned by the REST API is the following:

{
   "NAME_predictions":[
      "TOKEN1",
      "TOKEN2",
      "TOKEN3"
   ],
   "NAME_last_predictions": "TOKEN3",
   "NAME_probabilities":[
      0.106,
      0.122,
      0.118,
      0.133
   ],
   "NAME_probability": -6.4765729904174805
}

For text outputs, the JSON structure returned by the REST API is the same as for sequences.

batch_predict¶

Input format¶

You can also make a POST request on the /batch_predict endpoint to run inference on multiple samples at once.

Requests must be submitted as form data, with one of the fields being dataset, a JSON-encoded string representation of the data to be predicted.

The dataset JSON string is expected to be in the Pandas split format to reduce payload size. This format divides the dataset into three parts:

columns: List[str]
index (optional): List[Union[str, int]]
data: List[List[object]]
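As an illustration of the split layout, the sketch below builds the dataset string from a list of row dictionaries using only the standard library. The helper name to_split_json is hypothetical. The optional index part is omitted, and rows that lack a column get a JSON null in that position.

```python
import json


def to_split_json(records, columns=None):
    """Encode a list of row dicts as a JSON string in the Pandas split format."""
    if columns is None:
        # Preserve first-seen column order across all rows.
        columns = list(dict.fromkeys(key for row in records for key in row))
    data = [[row.get(column) for column in columns] for row in records]
    return json.dumps({"columns": columns, "data": data})


rows = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
dataset = to_split_json(rows)
print(dataset)
# {"columns": ["a", "b"], "data": [[1, 2], [3, 4]]}
```

If the data is already in a pandas DataFrame, DataFrame.to_json(orient="split") produces the same layout, including the index part.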

Additional form fields can be used to provide file resources like images that are referenced within the dataset.

An example of batch prediction:

curl http://0.0.0.0:8000/batch_predict -X POST -F 'dataset={"columns": ["a", "b"], "data": [[1, 2], [3, 4]]}'