Running an Inference Job
Inference jobs are designed to run trained models on new data as part of a model inference pipeline and deliver the predictions back to an external location.
Starting a Job
Click the Run an Inference Job link on the Home screen or the Create button from the Inference Jobs Dashboard to open a new job form. Enter a job name that will uniquely identify this job for you. Select the type of GPU you would like to use by clicking an available GPU card. Select how many GPUs you want attached to each worker in the GPU Count field. A maximum of 4 GPUs per inference job is allowed. If any options in these fields are disabled, there are not enough GPUs of that type available to satisfy your request.
Specify the amount of disk space you want allocated for this job's working directory in the Disk Size field. Be sure to allocate enough space to complete the job, as this allocation cannot be changed once the job is created.
Data
Inference jobs automatically download the source data and upload the predictions as part of their execution. To specify the location of the source data, select the required storage provider from the Input Type field and the path to the data in the Input Storage Path field. When the job starts, the data from this location will be placed in the /opt/ml/input directory. To specify where to send the output predictions, select the required storage provider from the Output Type field and the path to send the results to in the Output Storage Path field.
In order for the automatic output upload to work, you must save the results in the /opt/ml/output folder. The recommended way to configure this in your code is to use the ML_OUTPUT_PATH environment variable.
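For example, code along these lines saves its results where the automatic upload expects them. This is only a minimal sketch: the pandas dependency, the predictions.csv file name, and the result values are illustrative assumptions, not platform requirements.

```python
import os

import pandas as pd

# ML_OUTPUT_PATH is set by proxiML inside the job container; fall back to the
# documented /opt/ml/output folder if it is not present (e.g. local testing).
output_path = os.environ.get("ML_OUTPUT_PATH", "/opt/ml/output")

# Anything saved under this folder is uploaded to the configured
# Output Storage Path when the job finishes. Hypothetical results:
results = pd.DataFrame({"id": [1, 2, 3], "prediction": [0.12, 0.87, 0.45]})
results.to_csv(os.path.join(output_path, "predictions.csv"), index=False)
```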
Model
If you created a proxiML model in the previous step, select proxiML as the Model Type and select it from the list. Otherwise, select Git and specify the git clone URL of the repository.
Workers
Specify the command to use to perform the inference operation with the selected model on the data that will be loaded. The command runs from the root directory of the model loaded per the Model section above. For example, if your inference code is called predict.py and takes parameters for the data location (--data-path) and where to save the predictions (--output-path), the command would be the following:
python predict.py --data-path=$ML_DATA_PATH --output-path=$ML_OUTPUT_PATH
This command takes advantage of the proxiML environment variables to ensure the code is utilizing the correct directory structure.
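For reference, a predict.py consistent with that command could look like the sketch below. The CSV file name, the pickled scikit-learn-style model artifact, and the pandas dependency are illustrative assumptions, not part of the platform:

```python
# predict.py -- minimal sketch of an inference script matching the command above
import argparse
import os
import pickle

import pandas as pd

parser = argparse.ArgumentParser()
parser.add_argument("--data-path", required=True)    # receives $ML_DATA_PATH
parser.add_argument("--output-path", required=True)  # receives $ML_OUTPUT_PATH
args = parser.parse_args()

# Load the records that the job placed in its input directory (hypothetical file name).
records = pd.read_csv(os.path.join(args.data_path, "records.csv"))

# Load the trained model from the repository root (hypothetical artifact name).
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

# Save predictions under the output directory so they are uploaded automatically
# to the configured Output Storage Path when the worker finishes.
records["prediction"] = model.predict(records)
records.to_csv(os.path.join(args.output_path, "predictions.csv"), index=False)
```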
Review
Once you click Next on the job form, you are given the opportunity to review your inference job configuration for errors. Review these settings carefully. They cannot be changed once a job is started.
If the number of GPUs requested exceeds the currently available GPUs of that type, you will receive a message on the review form stating that the job will queue until GPUs become available. When this occurs, the job will wait until GPUs of the type you selected become available. You are not billed for waiting jobs.
Monitoring the Job
Once a job successfully starts, the dashboard should indicate that the job is in the running state. Click the View button to access the job logs. Log messages are sorted in descending order (most recent on top), and new log messages appear automatically as they are generated. If there are many log messages, you can scroll down on the page to see older logs.
To view detailed information about the job, click on the job name from the Inference Job dashboard.
Stopping and Terminating a Job
When each worker finishes executing, it automatically stops, and billing for that worker also stops. When all workers complete, the job is considered finished. You can also interrupt a running job or job worker by clicking Stop on either the job or the job worker.
Finished jobs may be automatically purged after 24 hours.
When a job is finished, you can review its details and download an extract of the worker logs. When you no longer need to see the details of the job, click the Terminate button.