Analytics providers can now run their models directly on the proxiML deployments of their customers. This allows the analytics provider to maintain and protect their intellectual property while providing analytics services inside their customers' secure, private infrastructure.
Motivation
Analytics providers that build models to provide insights on sensitive, confidential, or regulated data face additional challenges in commercializing their models. For other types of inference, a company can simply expose a web service for their customers to call. With more sensitive data (like protected health information), many customers will be unwilling to transmit that data outside their physical site or cloud account. For international customers, many countries have regulations that prohibit certain types of data from leaving their borders.
If the data can't come to the model, the only other option is to send the model to the data. But how? Few startups (or enterprises, for that matter) have the skills, staff, or tools to support and manage numerous on-premise software installations. How do they release model updates? How do they make sure customers download and install the updates? How do they know customers are in compliance with their licensing model? Can they prevent customers from reverse engineering their model? Does the customer even have a GPU-enabled system to run inference on?
With Federated Inference, proxiML customers can seamlessly and securely integrate their inference services with other customers of the proxiML platform. By installing a CloudBender-managed proxiML server inside their own site or cloud, a customer can grant an inference service provider Federated Inference access to their proxiML resources. This allows the analytics provider to execute Inference Jobs, using their own models on data stores local to the customer's site, without that data ever leaving the site.
The analytics provider is able to maintain their own web service to initiate and track inference jobs, ensuring they are in complete control of model access, billing, and the user experience. The customer is in complete control of their data, and can firewall the inference job from communicating externally while the data is attached. The analytics provider never has access to the data, the customer never has access to the model, and the inference results still get delivered.
How It Works
Federated Inference jobs require an Enterprise feature plan. Please contact sales to enable this functionality.
Federated Inference requires two separate parties, each with their own proxiML account. One party is the analytics provider, who owns the model code/weights. The other party is the analytics consumer, who needs the inference results from the provider's model. In order to enable a Federated Inference relationship between the two parties, the analytics consumer must first create a CloudBender™ deployment in their account and on-board at least one compute node (physical or cloud).
Next, the analytics consumer must create a new proxiML Project and add the analytics provider as a project member. To ensure that the jobs only run in the analytics consumer's region, the project should be configured with the Use Only Owned Compute Resources setting enabled, as in the sketch below.
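For illustration, the consumer-side setup might look something like the following with the proxiML SDK. The add_member helper and the owned-compute setting field shown here are hypothetical assumptions, not a confirmed SDK surface; these steps can also be performed from the proxiML web interface.
import asyncio
from proximl import ProxiML  # assumed client entry point

async def setup_shared_project():
    proximl = ProxiML()
    # Create the project that will be shared with the analytics provider
    project = await proximl.projects.create(name="Federated Analytics")
    # Restrict jobs to the consumer's own CloudBender compute
    # (hypothetical setter for "Use Only Owned Compute Resources")
    await project.update(owned_compute_only=True)
    # Invite the analytics provider as a project member (hypothetical helper)
    await project.add_member("provider@example.com")
    return project

project = asyncio.run(setup_shared_project())
print(f"Share this project ID with the provider: {project.id}")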
Once the analytics provider has access to the consumer's project, they can initiate a job from their own proxiML account against the analytics consumer's project. Federated Inference workloads are currently only supported through the proxiML SDK.
job = await proximl.jobs.create(
    name="Federated Inference Job",
    type="inference",
    # The consumer's shared project: where the job will run
    project_uuid="<consumer shared project id>",
    ...
    model=dict(
        source_type="proximl",
        # The provider's model to run
        source_uri="<provider model id>",
        # The provider's project that contains the model
        project_uuid="<provider model project id>",
    ),
    ...
)
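Because the SDK is asynchronous, the create call above must be awaited inside an event loop. A minimal runnable wrapper might look like this; the import path and credential handling are assumptions based on the SDK's package name:
import asyncio
from proximl import ProxiML  # assumed client entry point

async def main():
    # The client is assumed to read API credentials from its
    # standard configuration (environment or credentials file).
    proximl = ProxiML()
    job = await proximl.jobs.create(
        name="Federated Inference Job",
        type="inference",
        project_uuid="<consumer shared project id>",
        model=dict(
            source_type="proximl",
            source_uri="<provider model id>",
            project_uuid="<provider model project id>",
        ),
    )
    print(f"Created job {job.id}")

asyncio.run(main())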
The root-level project_uuid determines which project the job runs in and should be set to the analytics consumer's shared project ID. Normally, jobs only have access to models that exist in the same project as the job, but Federated Inference jobs allow the analytics provider to create a job using a model from a project that they own. The project_uuid field inside the model dictionary should be set to the analytics provider's project that contains the model to use.
When this job runs, it will execute inside the analytics consumer's project. Since the project is configured to only use their resources, it will run inside the consumer's local CloudBender infrastructure. Since the analytics consumer does not have access to the analytics provider's model project, only the analytics provider can initiate a job in this manner, so they can easily track usage and bill the consumer appropriately.
Be sure that the GPU types specified in the job are available in the consumer's CloudBender region; otherwise, the job will stay in the waiting for GPUs status until it is cancelled.
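One way for the provider to guard against a job stalling in that state is to bound the wait and clean up on timeout. The wait_for, stop, and remove method names in this sketch are assumptions modeled on common SDK conventions, so verify them against the proxiML SDK reference:
# Bound the wait for GPUs; clean up if none become available.
try:
    # Wait up to 10 minutes for the job to reach the running state
    await job.wait_for("running", timeout=600)
except Exception:  # exact timeout exception type depends on the SDK
    await job.stop()
    await job.remove()
    raise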
Federated Inference jobs incur an additional fee of 1.85 credits/hr per worker, charged to the creator of the job (not the project owner). This fee does not vary by GPU type or the number of GPUs per worker.
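For example, a Federated Inference job with 4 workers that runs for 2 hours would incur an additional 4 workers × 1.85 credits/hr × 2 hrs = 14.8 credits, on top of the normal compute charges for the job.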