Checkpoints and Datasets are now supported output destinations for proxiML Training and Inference jobs.
How It Works
Previously, the proxiML output type did not accept the `output_uri` property. This property can now be specified as `model` (the default if not provided), `dataset`, or `checkpoint`. Additionally, there is a new `output_options` field called `save_model`. This field is set to `True` by default when using the `model` output type and `False` when using `dataset` or `checkpoint`. Currently, this field can only be changed when using the proxiML SDK.
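The default behavior described above can be summarized in a small sketch. This helper is purely illustrative (the function name and dictionary are our own, not part of the proxiML SDK):

```python
# Illustrative only: default value of save_model for each output_uri,
# per the behavior described above. Not part of the proxiML SDK.
DEFAULT_SAVE_MODEL = {
    "model": True,       # "model" is also the default output_uri
    "dataset": False,
    "checkpoint": False,
}

def default_save_model(output_uri: str = "model") -> bool:
    """Return the default save_model setting for a given output_uri."""
    return DEFAULT_SAVE_MODEL[output_uri]
```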
When `save_model` is set to `True`, the `ML_OUTPUT_PATH` environment variable is set to `/opt/ml/models` instead of `/opt/ml/output`, and the contents of `/opt/ml/models` are uploaded to the output destination. If it is set to `False`, `ML_OUTPUT_PATH` remains set to `/opt/ml/output` and only that directory is uploaded to the output destination.
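In practice, this means a training script should resolve its upload directory from the environment variable rather than hard-coding either path. A minimal sketch (the helper name is our own invention):

```python
import os
from pathlib import Path

def resolve_output_dir() -> Path:
    """Return the directory whose contents proxiML uploads when the job ends.

    proxiML sets ML_OUTPUT_PATH to /opt/ml/models when save_model is True,
    and to /opt/ml/output otherwise, so scripts that write through this
    variable work correctly under either setting.
    """
    return Path(os.environ.get("ML_OUTPUT_PATH", "/opt/ml/output"))
```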
Using the Web Platform
To create a dataset or checkpoint from a job, select `proxiML` as the `Output Type` and choose the desired entity type in the data section of the job form. Once each worker in the job finishes, it will save the entire directory structure of `/opt/ml/output` to a new dataset or checkpoint named `Job - <job name>` if there is one worker, or `Job - <job name> Worker <worker number>` if there are multiple workers.
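The naming convention above can be expressed as a short helper (illustrative only; this function is not part of proxiML):

```python
def created_entity_name(job_name: str, worker_count: int,
                        worker_number: int = 1) -> str:
    """Name of the dataset/checkpoint a worker creates, per the rule above.

    Single-worker jobs produce "Job - <job name>"; multi-worker jobs append
    the worker number.
    """
    if worker_count == 1:
        return f"Job - {job_name}"
    return f"Job - {job_name} Worker {worker_number}"
```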
Using the SDK
To save a training job's output to a checkpoint instead of a model, use the following syntax:
```python
job = await proximl.jobs.create(
    "Training Checkpoint Output",
    type="training",
    ...
    data=dict(
        ...
        output_type="proximl",
        output_uri="checkpoint",
        output_options=dict(save_model=False),
    ),
    ...
)
```