Third-Party Keys
To facilitate easy and secure access to other cloud-provider data stores or services, you can configure access credentials here that you can later attach to jobs or use for loading datasets. Although configuring a third-party key is not mandatory, it is the most streamlined way of ensuring your job environment has access to the services it needs.
Once you configure a third-party key, you can no longer retrieve the key's secret. If a configured key's secret appears blank when you revisit the page, this does not mean the secret is lost. However, if you edit the key and click save without re-entering the secret, it will be erased.
Third-party keys are configured on a per-project basis so that different projects can be granted granular access to their own third-party resources. The account settings page allows you to configure the third-party keys for your Personal project. You can configure the keys for other projects you own from the projects page.
AWS
Never provide proxiML (or anyone for that matter) credentials for an IAM user with admin privileges.
Create a new IAM user for the proxiML platform to use; reusing an existing IAM user is not recommended. Follow the AWS documentation for this step. While creating the user, define a narrowly scoped policy that allows access only to the specific data or services needed to complete the model training process. As an example, the following policy allows the user to download a dataset from a specific path of one bucket and to upload the final output to a specific path in another bucket.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt0",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::<name of bucket with data>",
        "arn:aws:s3:::<name of bucket with data>/path/to/data/*"
      ]
    },
    {
      "Sid": "Stmt1",
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": [
        "arn:aws:s3:::<name of bucket for model outputs>/path/to/outputs/*"
      ]
    }
  ]
}
Once the user is created, go back to the proxiML third-party key configuration page, and select AWS from the Add menu under Third-Party Keys. Copy the Access Key ID and the Secret Key from the AWS console to the relevant fields and click the check button.
Azure
Never provide proxiML (or anyone for that matter) credentials for a user with admin privileges.
Create a new service principal for proxiML in the Azure account that contains the data or services you want the proxiML platform to interact with. In the app registration, select Single Tenant as the Supported Account Type and leave the Redirect URI unconfigured.
On the App Registrations overview page for the newly created app, locate and note the Application (client) ID and Directory (tenant) ID fields for later. To finish generating the credentials, create an application secret for the application by following the instructions here.
Once the service principal credentials have been obtained, grant access to the principal by attaching one or more roles to it in the Access Control (IAM) section of the Azure Subscription. Instructions can be found here. To allow proxiML access to Azure Blob Storage, you need to assign the Storage Blob Data Contributor role. To utilize the Azure Container Registry for private job images, you need to assign the AcrPull role.
proxiML recommends that you add custom conditions to the role assignments to further restrict access to data within your account. For example, to restrict read access to a specific storage account container and write access to a specific path within that container, add the following condition to the role assignment:
(
  ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read'}
  AND
  @resource[Microsoft.Storage/storageAccounts/blobServices/containers:name] StringEqualsIgnoreCase '<name of storage account container with data>'
)
OR
(
  ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write'}
  AND
  @resource[Microsoft.Storage/storageAccounts/blobServices/containers/blobs:path] StringStartsWith 'path/to/output/data'
  AND
  @resource[Microsoft.Storage/storageAccounts/blobServices/containers:name] StringEqualsIgnoreCase '<name of storage account container with data>'
)
Once the service principal is configured, go back to the proxiML third-party key configuration page, and select Azure from the Add menu under Third-Party Keys. Copy the Client ID, Tenant ID, and Secret from the previous steps to the relevant fields and click the check button.
Docker
Docker keys are used when pulling custom docker environments from private DockerHub repositories.
Do not use your regular DockerHub account credentials. Instead, generate an Access Token for proxiML and provide that so it can be easily scoped and revoked.
Create a new Docker Access Token from the account that has access to the private repository you wish to use by following the instructions here. If you have a Pro or Team plan, proxiML recommends that you create the token with Read Only access permissions.
Once the access token is created, go back to the proxiML third-party key configuration page, and select Docker from the Add menu under Third-Party Keys. Enter your DockerHub username as the Key ID and the generated access token as the Key Secret, and click the check button.
GCP
Never provide proxiML (or anyone for that matter) credentials for a Google service account with admin privileges.
Create a new service account in the GCP project that contains the data or services you want the proxiML platform to interact with. When creating the account, scope the permissions narrowly and allow access only to the specific data or services needed for the model training process. For example, if you want to download data from the /data path of the input-data-bucket bucket, assign the Storage Object Viewer role with a condition of type Name, an operator of Starts With, and a value of projects/_/buckets/input-data-bucket/objects/data/. If you want to upload data to the /results path of the artifacts-bucket bucket, assign the Storage Object Viewer role with a condition of type Name, an operator of Starts With, and a value of projects/_/buckets/artifacts-bucket, as well as the Storage Object Creator role with a condition of type Name, an operator of Starts With, and a value of projects/_/buckets/artifacts-bucket/objects/results/. Full read access is required on the output bucket because gsutil requires bucket-level read access in order to copy objects. For more details about conditions and resource names on buckets and objects, review the GCP documentation.
Once the service account is created, create and download the service account key JSON file. Go to the proxiML third-party key configuration page and select GCP from the Add menu under Third-Party Keys. Click the Upload Json File button, select the JSON file you downloaded, and click the check button.
In order to use the GCP keys to access services from a worker, you must first activate them in the job environment by including the following command in your script prior to accessing GCP services:
gcloud auth activate-service-account --key-file ${GOOGLE_APPLICATION_CREDENTIALS}
Alternatively, if you're using the Python SDK directly, you can activate the service account credentials using the from_service_account_json function and specify the location of the key file using the GOOGLE_APPLICATION_CREDENTIALS environment variable.
Git
If you want to run jobs using private git repositories, you must create an SSH key for the proxiML platform to use when connecting to your repository. Click the Generate button to create a new key. Once the key is created, copy the entire public key, starting from ssh-ed25519 and up to and including [email protected], and add this key to your user account in your private repository host. For example, the instructions for adding an SSH key to your GitHub account can be found here.
Hugging Face
To integrate with Hugging Face, first create a User Access Token with these instructions. If you plan only to download data, create a read token. If you plan to upload results back to Hugging Face, create a write token. Once you have the token, go back to the proxiML third-party key configuration page, and select Hugging Face from the Add menu under Third-Party Keys. Enter your Hugging Face account name as the Key ID and the generated token as the Key Secret, and click the check button.
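Within a job, the token can then be used with the huggingface_hub client library. The following is a minimal sketch; the repository ID is a placeholder, and the token only needs to be passed explicitly if the job environment does not already expose your configured key to the library.

```python
from huggingface_hub import snapshot_download

def fetch_dataset(repo_id, token=None):
    """Download a full dataset snapshot; returns the local directory path."""
    # A read token is sufficient for downloads; a write token is only
    # needed when pushing results back to Hugging Face.
    return snapshot_download(repo_id, repo_type="dataset", token=token)
```

For example, `fetch_dataset("some-org/some-dataset")` would pull a public dataset, while private repositories require the token.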
Kaggle
To enable Kaggle integration in the proxiML platform, you must first generate a Kaggle API token. Instructions to generate a new token can be found here. If you are already using the Kaggle CLI tool on your local computer, the API token is usually located at $HOME/.kaggle/kaggle.json.
Once you have the kaggle.json file for your account, go to the proxiML third-party key configuration page and select Kaggle from the Add menu under Third-Party Keys. Click the Upload Json File button, select the JSON file you downloaded, and click the check button. If the file is successfully uploaded, you should see Credentials File: kaggle.json next to the trophy icon.
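If the upload fails, a quick local sanity check is to confirm the file parses as JSON and contains the two fields Kaggle issues. The snippet below uses placeholder contents rather than real credentials:

```python
import json

# Placeholder contents -- a real kaggle.json holds your Kaggle username
# and the API key generated from your account page.
sample = '{"username": "your-kaggle-username", "key": "xxxxxxxxxxxxxxxx"}'

creds = json.loads(sample)
assert set(creds) == {"username", "key"}, "kaggle.json should contain exactly these fields"
print("kaggle.json looks valid for user:", creds["username"])
```

To check your actual file, read it with `json.load(open(os.path.expanduser("~/.kaggle/kaggle.json")))` instead of the sample string.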
NVIDIA NGC
NVIDIA NGC keys are used for pulling NVIDIA-maintained or private registry images from NVIDIA NGC.
Create an API key for your NGC account that has access to the images you wish to use. Once you have the API key, go back to the proxiML third-party key configuration page, and select NVIDIA NGC from the Add menu under Third-Party Keys. Enter the API key in the NGC API Key field and click the check button.
Wasabi
A recommended but not required first step is to create a policy that restricts access for the proxiML integration to just the buckets and bucket paths required. Create a new policy using the Wasabi documentation. An example policy that restricts both reading and writing to a specific bucket path is the following:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::<name of bucket with data>"
    },
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::<name of bucket with data>/path/to/data/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::<name of bucket with data>/path/to/outputs/*"
    }
  ]
}
Once you have built the policy for the integration, create a new user for the proxiML platform to use. Follow the Wasabi documentation for this step. Create the user with Programmatic access only, and select the policy you created in the previous step. Once the user is created, you will be prompted to download the access keys.
Go to the proxiML third-party key configuration page and select Wasabi from the Add menu under Third-Party Keys. Input the Access Key ID and the Secret Key you just downloaded to the relevant fields and click the check button.