CloudBender™ now allows you to connect directly to regional datastores without having to copy data into or out of proxiML persistent storage.
Motivation
proxiML Datasets make it easy to load and reuse training data across proxiML's scalable compute platform. However, sometimes data is either too large or too sensitive to upload as a dataset. CloudBender customers can now mount local data directly into jobs they run in their own CloudBender regions by configuring Datastores. This allows customers to reuse their existing data storage infrastructure and ensures all data access occurs over a local LAN connection, providing both additional convenience and security.
How It Works
Regional datastores require a Team or higher feature plan.
To configure and use regional datastores, you must first have a CloudBender region configured with at least one compute node. From that region's dashboard, click the Add button on the Datastores toolbar and fill out the form with the information for the datastore in your region. Currently, only NFS and SMB/CIFS datastore types are supported.
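If you manage your CloudBender resources programmatically, the same step could in principle be scripted. The sketch below is a hypothetical illustration only: the cloudbender.datastores.create call, its parameter names, and the IDs and address shown are assumptions for illustration, not confirmed API.

# Hypothetical sketch -- assumes the SDK exposes CloudBender datastore management.
# The method name, parameters, IDs, and address below are illustrative placeholders.
datastore = await proximl.cloudbender.datastores.create(
    provider_uuid="<your-provider-id>",
    region_uuid="<your-region-id>",
    name="Local Training Data",
    type="nfs",  # only NFS and SMB/CIFS datastore types are supported
    uri="192.168.1.50",  # address of the storage server on the local LAN
)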
Once the datastore has been added, you can select Regional Datastore as the source type when creating datasets or as the input or output data location for jobs. Additionally, unlike other data output types, regional datastores can be attached to both Notebook and Endpoint jobs.
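For example, creating a dataset that loads its contents from a regional datastore through the SDK might look like the following sketch. The dataset name, datastore ID, and path are placeholders; the keyword names mirror the source type, source URI, and source options fields described in the Using the SDK section below.

# Sketch: create a dataset sourced from a regional datastore.
# The name, datastore ID, and path below are placeholder values.
dataset = await proximl.datasets.create(
    name="Local Training Data",
    source_type="regional",
    source_uri="5d29dcae-b1aa-4629-be67-548db5a141a1",  # datastore ID from the web UI
    source_options=dict(path="/training-data"),  # subdirectory within the datastore
)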
Regional datastores can also be used to run inference tasks as new files are created in a specific directory. For an example of how to use proxiML Endpoints with regional datastores, see our example repository here.
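As a rough illustration of that pattern (independent of the example repository), a job could poll its mounted input directory and write results back to the attached output datastore. This sketch assumes the datastore subdirectories are mounted at /input and /output inside the job; the predict function and poll interval are placeholders.

import time
from pathlib import Path

INPUT_DIR = Path("/input")    # regional datastore path mounted as the job's input
OUTPUT_DIR = Path("/output")  # regional datastore path attached as the job's output

def predict(path):
    # Placeholder for a real model inference call.
    return f"processed {path.name}"

seen = set()
while True:
    # Process any files that have appeared since the last pass.
    for file in sorted(INPUT_DIR.glob("*")):
        if file.is_file() and file not in seen:
            seen.add(file)
            (OUTPUT_DIR / (file.stem + ".txt")).write_text(predict(file))
    time.sleep(5)  # poll for new files every few seconds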
Using the SDK
Once a datastore has been defined in one of your CloudBender regions, jobs and datasets in projects you own can be configured to access it by specifying regional as the data source type. The source URI field must be set to the ID of the datastore you wish to use. To obtain the datastore ID, open the web UI and unhide the "ID" column using the Columns button on the datastore table. The source options must be set to a dictionary with the path key set to the subdirectory within the datastore to use.
job = await proximl.jobs.create(
    name="Regional Datastores Inference",
    type="inference",
    ...
    data=dict(
        # Load input data from the /input subdirectory of the first datastore.
        input_type="regional",
        input_uri="5d29dcae-b1aa-4629-be67-548db5a141a1",  # datastore ID from the web UI
        input_options=dict(path="/input"),
        # Write results to the /output subdirectory of a second datastore.
        output_type="regional",
        output_uri="09eec266-c9ce-47ab-ab55-d463f905e6f3",
        output_options=dict(path="/output"),
    ),
)
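In this example, the datastore IDs are example values obtained from the web UI; the job loads its input data from the /input subdirectory of the first datastore and writes its results to the /output subdirectory of the second.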