The proxiML platform has been extended to support deploying models as REST API endpoints. These fully managed endpoints give you the real-time predictions you need for production applications without having to worry about servers, certificates, networking, or web development.
How It Works
proxiML Endpoints allow you to expose your model code as a REST API endpoint. The endpoint runs a webserver on a publicly accessible URL that will call functions inside the model code you supply based on the configuration you specify. Once you have a trained model with inference code saved as a proxiML Model, in a git repository, or on your local computer, you can configure a proxiML Endpoint to run that inference code.
Endpoints have one or more "routes", which are defined by the URL path and HTTP verb of the request that the endpoint should accept. For example, a POST request to the path /predict is a separate route from a GET request to the same path, and can therefore call a different function with different parameters. For each route, you specify the Python file that contains the code to be called as well as the method within that file to call.
The input data for a prediction is provided in the HTTP request body (only POST requests are currently supported). Each route allows you to configure the expected Request Body Template. This body template both validates that the request body is properly formatted and maps the request body attributes to the arguments the specified function expects. If the function uses positional arguments, define the request body with the attributes in the same order as the function expects them. If the function uses keyword arguments, define the request body with attributes that have the same names as the function arguments.
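For example, consider the following hypothetical predict.py (the function and attribute names are illustrative only, not part of the platform):

# predict.py -- hypothetical inference module for illustration

def predict_keyword(filename, threshold=0.5):
    # Keyword arguments: the request body attributes should use the same
    # names, e.g. {"filename": "cat.jpg", "threshold": 0.25}
    return {"filename": filename, "threshold": threshold}

def predict_positional(filename, top_k):
    # Positional arguments: the request body attributes should appear in the
    # same order the function expects, e.g. {"filename": "cat.jpg", "top_k": 3}
    return {"filename": filename, "top_k": top_k}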
Once the endpoint is created and running, the platform provides the hostname used to reach the endpoint. Use any HTTP client (e.g. curl, Postman, etc.) to query the configured route and get the response. All endpoints also define a GET method at the /ping path that can be used for health checking.
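For example, a minimal sketch using Python's requests library (the endpoint hostname, route, and request body below are placeholders, not values the platform guarantees):

import requests

# Placeholder URL -- substitute the hostname returned for your endpoint
ENDPOINT_URL = "https://example-endpoint.example.com"

# Health check: every endpoint responds to GET /ping
print(requests.get(f"{ENDPOINT_URL}/ping").status_code)

# Query a configured route, e.g. POST /predict with a JSON request body
response = requests.post(f"{ENDPOINT_URL}/predict", json={"filename": "cat.jpg"})
print(response.json())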
If the endpoint becomes overloaded, it will return HTTP 503 errors. If you are calling the endpoint programmatically, be sure to check for this error response and retry the request using a backoff algorithm.
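One possible retry pattern is sketched below using Python's requests library with simple exponential backoff (the URL and request body are placeholders):

import time
import requests

def post_with_backoff(url, body, max_retries=5):
    # Retry on HTTP 503 with exponential backoff: 1s, 2s, 4s, ...
    delay = 1
    for _ in range(max_retries):
        response = requests.post(url, json=body)
        if response.status_code != 503:
            return response
        time.sleep(delay)
        delay *= 2
    raise RuntimeError("Endpoint still overloaded after retries")

response = post_with_backoff(
    "https://example-endpoint.example.com/predict",  # placeholder URL
    {"filename": "cat.jpg"},
)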
Endpoints will run until stopped and are billed the same way as other job types.
Using the Web Platform
Navigate to the Endpoint Dashboard and click the Create button. Specify the name and required resources for the endpoint. In the Models section, specify the proxiML Model, git repository, or local computer directory that contains the trained model and inference code.
In the Endpoint section, click Add Route. Keep POST as the HTTP Verb and specify /predict as the Path. Specify the file within the model code to use as the File Name (e.g. predict.py) and the method to call as the Function Name (e.g. predict_file). Click Add Parameter to configure the information that will be provided in the HTTP request body when the endpoint is called. These parameters must exactly match the arguments the specified function expects. If the function expects positional arguments, ensure the Function Uses Positional Arguments box is checked.
Specify the expected Data Type for each parameter. String, Integer, Float, Boolean, Object, and List data types are supported. If a parameter is optional, check the Optional box and specify a Default Value. The default value must be valid Python (e.g. None should be used rather than null or undefined). See FastAPI's request body documentation for more details.
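For instance, a route for the following hypothetical function could mark top_k as Optional with a Default Value of None:

# predict.py -- hypothetical example of an optional parameter

def predict_file(filename, top_k=None):
    # "filename" is required; "top_k" can be marked Optional with a
    # Default Value of None (valid Python), not null or undefined.
    if top_k is None:
        top_k = 5
    return {"filename": filename, "top_k": top_k}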
If a request's body does not match the template, the request will be rejected.
Click Next and Create to start the endpoint. Once the endpoint successfully starts, the dashboard should indicate that the endpoint is in the running state. Click the Connect button to view the public URL of the endpoint as well as an example command to query one of its routes.
You can also access the endpoint's logs by clicking the View button. Log messages are sorted in descending order (most recent on top), and new messages appear automatically as they are generated. If there are many log messages, you can scroll down on the page to see older logs.
To view detailed information about the endpoint, click on the endpoint name from the Endpoint dashboard.
Using the SDK
You can also create an endpoint using the proxiML SDK. Specify endpoint as the type and provide an endpoint dictionary to the job create command. The endpoint dictionary requires a routes key that is a list of routes. Each route is a dictionary with path, verb, function, file, positional, and body keys. The body key is a list of the parameters for the request body, with keys name, type, optional, and default_value. An example specification is the following:
job = await proximl.jobs.create(
    name="Endpoint",
    type="endpoint",
    ...
    endpoint=dict(
        routes=[
            dict(
                path="/predict",
                verb="POST",
                function="predict_image",
                file="predict",
                positional=True,
                body=[dict(name="filename", type="str")],
            )
        ]
    ),
)
The type field for the parameters must be one of str, int, float, bool, dict, or list. Once the job is in the running state, you can retrieve the endpoint URL from the url property of the job (e.g. job.url).
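For example, a minimal sketch that continues from the job created above, assuming the requests library is installed and the /predict route defined earlier:

import requests

# "job" is the endpoint job created above; job.url is its public URL
response = requests.post(f"{job.url}/predict", json={"filename": "cat.jpg"})
print(response.json())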
Using the CLI
The proxiML CLI can also create endpoints. Endpoint routes can be specified using the --route option. The data for each route must be properly formatted JSON with the same keys and values as required by the SDK above. An example is the following:
proximl job create endpoint \
--git-uri https://github.com/proxiML/simple-tensorflow-classifier.git \
--route '{"verb": "POST", "path": "/predict", "file": "predict", "positional": true, "body": [{"name": "filename", "type": "str"}], "function": "predict_image"}' \
"New Inference Endpoint"
This command will print the endpoint URL to the terminal once the endpoint is running.