Deploying an Inference Endpoint
Endpoints deploy a trained model as a REST API. They are useful when predictions are needed in real time and the input data is small enough to be sent in an HTTP request.
Creating an Endpoint
Click the Deploy an Inference Endpoint link on the Home screen or the Create button from the Endpoints Dashboard to open a new job form. Enter a name that will uniquely identify this endpoint for you. Select the type of GPU you would like to use by clicking an available GPU card, then select how many GPUs you want attached to the endpoint in the GPU Count field. A maximum of 4 GPUs per endpoint is allowed. If any options in these fields are disabled, there are not enough GPUs of that type available to satisfy your request. Specify the amount of disk space to allocate for this endpoint's working directory in the Disk Size field. Be sure to allocate enough space to run the endpoint, as this allocation cannot be changed once the endpoint is created.
Model
If you created a proxiML model in the previous step, select proxiML as the Model Type and select it from the list. Otherwise, select Git and specify the git clone URL of the repository.
Routes
Routes allow you to direct HTTP requests to your model code. Click the Add Route button to add a new route. Keep POST as the HTTP Verb and specify /predict as the Path. Specify the file within the model code to use as the File Name (e.g. predict.py) and the method to call as the Function Name (e.g. predict_file).
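For illustration, a minimal predict.py might look like the sketch below. The function name, argument name, and body are hypothetical; your actual file and function must match whatever your model code provides.

```python
# predict.py -- hypothetical sketch; replace the body with your own model logic.
import urllib.request

def predict_file(file_url):
    """Download the input referenced by file_url and return a prediction.

    The argument name is an assumption; it must match the route
    parameter you configure in the next step.
    """
    local_path, _ = urllib.request.urlretrieve(file_url)
    # Run your model against local_path and return a JSON-serializable result.
    return {"prediction": "example-label", "source": file_url}
```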
Click Add Parameter to configure the information that will be provided in the HTTP request body when the endpoint is called. These parameters must exactly match the arguments the specified function expects. If the function expects positional arguments, ensure the Function Uses Positional Arguments box is checked and that the parameters are ordered within the template in the order they should be supplied to the function. If the function expects keyword arguments, the parameter name in the request body must match the argument name of the function. Specify the expected Data Type for each parameter. If a request's body does not match the template, the request will be rejected.
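Continuing the hypothetical predict_file(file_url) sketch above, a route with a single keyword parameter named file_url of type string would expect a request body whose key matches that argument name exactly:

```python
# Hypothetical request body for predict_file(file_url); because the function
# uses a keyword argument, the key must equal the argument name exactly.
payload = {"file_url": "https://example.com/images/input.jpg"}
```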
Review
Once you click Next on the job form, you are given the opportunity to review your endpoint configuration for errors. Review these settings carefully; they cannot be changed once the job is started.
If the number of GPUs requested exceeds the currently available GPUs of that type, you will receive a message on the review form stating that the job will queue until GPUs become available. When this occurs, the job will wait until GPUs of the selected type become available. You are not billed for waiting jobs.
Using the Endpoint
Once the endpoint successfully starts, the dashboard should indicate that the endpoint is in the running state. Click the Connect button to view the public URL of the endpoint as well as an example command to query one of its routes.
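As a sketch, assuming the hypothetical /predict route and file_url parameter from earlier, querying the endpoint from Python might look like this (the URL below is a placeholder; use the public URL shown in the Connect dialog):

```python
import requests

# Placeholder; substitute the public URL from the Connect dialog.
ENDPOINT_URL = "https://<your-endpoint-url>"

response = requests.post(
    f"{ENDPOINT_URL}/predict",
    json={"file_url": "https://example.com/images/input.jpg"},
)
response.raise_for_status()  # raise if the endpoint returned an error status
print(response.json())
```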
You can also access the endpoint's logs by clicking the View button. Log messages are sorted in descending order (most recent on top), and new log messages appear automatically as they are generated. If there are many log messages, scroll down on the page to see older ones.
To view detailed information about the endpoint, click on the endpoint name from the Endpoints Dashboard.
If the endpoint becomes overloaded, it will return HTTP 503 errors. If you are calling the endpoint programmatically, be sure to watch for this error response and retry the request using a backoff algorithm.
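One possible retry loop with exponential backoff is sketched below; the function name, payload, and timing constants are assumptions, not part of the platform:

```python
import time
import requests

def post_with_backoff(url, payload, max_retries=5, base_delay=1.0):
    """POST to an endpoint route, retrying on HTTP 503 with exponential backoff."""
    for attempt in range(max_retries):
        response = requests.post(url, json=payload)
        if response.status_code != 503:
            response.raise_for_status()
            return response.json()
        # Endpoint is overloaded; wait 1s, 2s, 4s, ... before retrying.
        time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"Endpoint still overloaded after {max_retries} attempts")
```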
All endpoints are configured to return an HTTP 200 response to GET requests on the /ping route. This route can be used for endpoint health checks.
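For example, a simple liveness probe might look like this (the URL is again a placeholder):

```python
import requests

# Placeholder; substitute your endpoint's public URL.
response = requests.get("https://<your-endpoint-url>/ping")
print("healthy" if response.status_code == 200 else "unhealthy")
```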
Stopping, Restarting, and Terminating an Endpoint
When you are done actively using the endpoint, you can stop it by clicking Stop from the proxiML dashboard. This will stop billing.
If you want to restart the endpoint, click the Restart button. Endpoints always start from the same source environment and model code with which they were originally created. Any temporary files or changes to the environment that occurred during a previous endpoint run will not be preserved. This ensures environment integrity.
Stopped endpoints may be automatically purged after two weeks of non-use.
If you are finished with an endpoint, click the Terminate button. This will purge the endpoint's environment and all of its data. If an endpoint fails, it cannot be restarted, only terminated.