
Build Full Machine Learning Pipelines with Inference Jobs

· 2 min read

The proxiML platform has been extended to support batch inference jobs, enabling customers to use proxiML for all stages of the machine learning pipeline that require GPU acceleration.

How It Works

A new section has been added to the sidebar navigation that provides access to the Inference Jobs Dashboard. To create a new inference job, navigate to the Inference Jobs Dashboard and click Create. The new job form is similar to the existing job types, with two differences:

  • Inference jobs cannot use datasets. Instead, you can download external data as part of the job creation process using the fields in the Input Data section. Data downloaded in this manner is purged from the system when the job finishes.
  • Inference jobs only support a single worker.
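These two differences show up directly in the job request: instead of attaching datasets, an inference job declares where to download its input from and where to upload its results, and its worker list holds exactly one command. A minimal sketch of that shape, using the field names from the SDK example below (the bucket URIs and worker command are illustrative):

```python
# Sketch of the `data` section an inference job uses in place of attached
# datasets. The input is downloaded when the job starts and purged from the
# system when the job finishes; results are uploaded to the output location.
inference_data = dict(
    input_type="aws",                               # provider to download input from
    input_uri="s3://example-bucket/data/new_data",  # external input location
    output_type="aws",                              # provider to upload results to
    output_uri="s3://example-bucket/output/model_predictions",
)

# Inference jobs support only a single worker, so the workers list
# contains exactly one command (this one is a placeholder).
workers = [
    "python predict.py --data_dir=$ML_DATA_PATH --output_dir=$ML_OUTPUT_PATH",
]

assert len(workers) == 1
```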

Example Pipeline

The following is an example of using the proxiML SDK to walk through the full machine learning pipeline using datasets, models, training jobs, and inference jobs.

  1. Initialize the SDK.

     from proximl import ProxiML
     import asyncio

     proximl = ProxiML()
  2. Create the training dataset.

     dataset = asyncio.run(
         proximl.datasets.create(
             name="Example Dataset",
             source_type="aws",
             source_uri="s3://example-bucket/data/cifar10",
         )
     )

     asyncio.run(dataset.attach())
  3. Create the training job to run on the newly created dataset and export its results to a proxiML model.

     training_job = asyncio.run(
         proximl.jobs.create(
             name="Example Training Job",
             type="training",
             gpu_type="GTX 1060",
             gpu_count=1,
             disk_size=10,
             workers=[
                 "PYTHONPATH=$PYTHONPATH:$ML_MODEL_PATH python -m official.vision.image_classification.resnet_cifar_main --num_gpus=1 --data_dir=$ML_DATA_PATH --model_dir=$ML_OUTPUT_PATH --enable_checkpoint_and_export=True --train_epochs=10 --batch_size=1024",
             ],
             data=dict(
                 datasets=[dict(id=dataset.id, type="existing")],
                 output_type="proximl",
             ),
             model=dict(git_uri="[email protected]:my-account/test-private.git"),
         )
     )

     asyncio.run(training_job.attach())
  4. Get the model's ID from the training job worker and wait for the model creation to finish.

     training_job = asyncio.run(training_job.refresh())

     model = asyncio.run(
         proximl.models.get(training_job.workers[0].get("output_model_uuid"))
     )

     asyncio.run(model.wait_for("ready"))
  5. Run an inference job using the created model on new data and save the results to an external source.

     inference_job = asyncio.run(
         proximl.jobs.create(
             name="Example Inference Job",
             type="inference",
             gpu_type="GTX 1060",
             gpu_count=1,
             disk_size=10,
             workers=[
                 "PYTHONPATH=$PYTHONPATH:$ML_MODEL_PATH python -m official.vision.image_classification.resnet_cifar_main --num_gpus=1 --data_dir=$ML_DATA_PATH --model_dir=$ML_OUTPUT_PATH --enable_checkpoint_and_export=True --train_epochs=10 --batch_size=1024",
             ],
             data=dict(
                 input_type="aws",
                 input_uri="s3://example-bucket/data/new_data",
                 output_type="aws",
                 output_uri="s3://example-bucket/output/model_predictions",
             ),
             model=dict(model_uuid=model.id),
         )
     )

     asyncio.run(inference_job.attach())
  6. (Optional) Clean up the resources. Note that asyncio.gather requires a running event loop, so wrap the removal calls in a coroutine:

     async def cleanup():
         await asyncio.gather(
             training_job.remove(),
             inference_job.remove(),
             model.remove(),
             dataset.remove(),
         )

     asyncio.run(cleanup())