Third-Party Keys

To facilitate easy and secure access to other cloud providers' data stores or services, you can configure access credentials here that you can later attach to jobs or use for loading datasets. Although configuring a third-party key is not mandatory, it is the most streamlined way of ensuring your job environment has access to the services it needs.

caution

Once you configure a third-party key, you will no longer be able to retrieve the key's secret. If a configured key's secret appears blank when you revisit the page, the secret is not lost. However, if you edit the key and click Save without re-entering the secret, it will be erased.

Third-party keys are configured on a per-project basis so that different projects can be assigned granular access to their own third-party resources. The account settings page allows you to configure the third-party keys for your Personal project. You can configure the keys for other projects you own from the projects page.

AWS

tip

Never provide trainML (or anyone for that matter) credentials for an IAM user with admin privileges.

Create a new IAM user for the trainML platform to use; reusing an existing IAM user is not recommended. Follow the AWS documentation for this step. While creating the user, ensure you create a very specific policy that allows access only to the specific data or services needed to complete the model training process. As an example, the following policy allows the user to download a dataset from a specific path of one bucket and to upload the final output to a specific path in another bucket.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt0",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::<name of bucket with data>",
        "arn:aws:s3:::<name of bucket with data>/path/to/data/*"
      ]
    },
    {
      "Sid": "Stmt1",
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": [
        "arn:aws:s3:::<name of bucket for model outputs>/path/to/outputs/*"
      ]
    }
  ]
}

Once the user is created, go back to the trainML third-party key configuration page, and select AWS from the Add menu under Third-Party Keys. Copy the Access Key ID and the Secret Key from the AWS console to the relevant fields and click the check button.
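
Once the key is attached to a job, the job environment can exercise the permissions granted by the example policy above with standard AWS tooling. The following is a minimal sketch using boto3; the bucket names, paths, and file names are placeholders, and it assumes the credentials are exposed through the standard AWS environment variables:

# Minimal sketch using boto3 to exercise the example policy above; the
# bucket names, paths, and file names are placeholders.
import boto3

s3 = boto3.client("s3")  # assumes credentials are available via the standard AWS environment variables

# Download the dataset allowed by the s3:GetObject statement
s3.download_file("<name of bucket with data>", "path/to/data/train.csv", "train.csv")

# Upload the final output allowed by the s3:PutObject statement
s3.upload_file("model.pt", "<name of bucket for model outputs>", "path/to/outputs/model.pt")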

Azure

tip

Never provide trainML (or anyone for that matter) credentials for a user with admin privileges.

Create a new service principal for trainML in the Azure account that contains the data or services you want the trainML platform to interact with. In the app registration, select Single Tenant as the Supported Account Type and leave the Redirect URI unconfigured. On the App Registrations overview page for the newly created app, locate and note the Application (client) ID and Directory (tenant) ID fields for later. To finish generating the credentials, create an application secret for the application by following the instructions here.

Once the service principal credentials have been obtained, grant access to the principal by attaching one or more roles to it in the Access Control (IAM) section of the Azure Subscription. Instructions can be found here. To allow trainML access to Azure Blob Storage, you need to assign the Storage Blob Data Contributor role. To utilize the Azure Container Registry for private job images, you need to assign the AcrPull role.

trainML recommends that you add custom conditions to the role assignments to further restrict access to data within your account. For example, to restrict read access to a specific storage account container and write access to a specific path within that container, add the following condition to the role assignment:

(
  ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read'}
  AND
  @resource[Microsoft.Storage/storageAccounts/blobServices/containers:name] StringEqualsIgnoreCase '<name of storage account container with data>'
)
OR
(
  ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write'}
  AND
  @resource[Microsoft.Storage/storageAccounts/blobServices/containers/blobs:path] StringStartsWith 'path/to/output/data'
  AND
  @resource[Microsoft.Storage/storageAccounts/blobServices/containers:name] StringEqualsIgnoreCase '<name of storage account container with data>'
)

Once the service principal is configured, go back to the trainML third-party key configuration page and select Azure from the Add menu under Third-Party Keys. Copy the Client ID, Tenant ID, and Secret from the previous steps to the relevant fields and click the check button.
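
Within a job, the configured credentials can be used with the Azure SDK for Python. The following is a minimal sketch assuming the azure-identity and azure-storage-blob packages are installed; the storage account, container, blob names, and environment variable names are placeholder assumptions, not values defined by the platform:

# Minimal sketch using the Azure SDK for Python; account, container, blob,
# and environment variable names are placeholders for illustration.
import os

from azure.identity import ClientSecretCredential
from azure.storage.blob import BlobServiceClient

credential = ClientSecretCredential(
    tenant_id=os.environ["AZURE_TENANT_ID"],
    client_id=os.environ["AZURE_CLIENT_ID"],
    client_secret=os.environ["AZURE_CLIENT_SECRET"],
)

service = BlobServiceClient(
    account_url="https://<storage account name>.blob.core.windows.net",
    credential=credential,
)

# Download a blob allowed by the read portion of the example condition above
blob = service.get_blob_client(
    container="<name of storage account container with data>",
    blob="path/to/data/train.csv",
)
with open("train.csv", "wb") as f:
    f.write(blob.download_blob().readall())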

Docker

Docker keys are used when pulling custom docker environments from private DockerHub repositories.

tip

Do not use your regular DockerHub account credentials. Instead, generate an Access Token for trainML and provide that so it can be easily scoped and revoked.

Create a new Docker Access Token from the account that has access to the private repository you wish to use by following the instructions here. If you have a Pro or Team plan, trainML recommends that you create the token with Read Only access permissions.

Once the access token is created, go back to the trainML third-party key configuration page, and select Docker from the Add menu under Third-Party Keys. Enter your DockerHub username as the Key ID and the generated access token as the Key Secret and click the check button.

GCP

tip

Never provide trainML (or anyone for that matter) credentials for a Google service account with admin privileges.

Create a new service account in the GCP project that contains the data or services you want the trainML platform to interact with. When creating the account, configure permissions as narrowly as possible, allowing access only to the specific data or services needed for the model training process. For example, to download data from the /data path of the input-data-bucket bucket, assign the Storage Object Viewer role with a condition of type Name, operator Starts With, and value projects/_/buckets/input-data-bucket/objects/data/. To upload data to the /results path of the artifacts-bucket bucket, assign the Storage Object Viewer role with a condition of type Name, operator Starts With, and value projects/_/buckets/artifacts-bucket, as well as the Storage Object Creator role with a condition of type Name, operator Starts With, and value projects/_/buckets/artifacts-bucket/objects/results/. Full read access is required on the output bucket because gsutil requires bucket-level read access in order to copy objects. For more details about conditions and resource names on buckets and objects, review the GCP documentation.

Once the service account is created, create and download the service account key JSON file. Go to the trainML third-party key configuration page and select GCP from the Add menu under Third-Party Keys. Click the Upload Json File button, select the JSON file you downloaded, and click the check button.

To use the GCP keys to access services from a worker, you must first activate them in the job environment by including the following command in your script prior to accessing GCP services:

gcloud auth activate-service-account --key-file ${GOOGLE_APPLICATION_CREDENTIALS}

Alternatively, if you are using the Python SDK directly, you can load the service account credentials with the from_service_account_json function, using the GOOGLE_APPLICATION_CREDENTIALS environment variable to locate the key file.
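
For example, a minimal sketch using the google-cloud-storage client, reusing the bucket and path names from the role assignment example above as placeholders:

# Minimal sketch using the google-cloud-storage client with the key file
# referenced by GOOGLE_APPLICATION_CREDENTIALS; names reuse the example above.
import os

from google.cloud import storage

client = storage.Client.from_service_account_json(
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"]
)

# Download an object allowed by the Storage Object Viewer condition
client.bucket("input-data-bucket").blob("data/train.csv").download_to_filename("train.csv")

# Upload a result allowed by the Storage Object Creator condition
client.bucket("artifacts-bucket").blob("results/model.pt").upload_from_filename("model.pt")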

Git

If you want to run jobs using private git repositories, you must create an SSH key for the trainML platform to use when connecting to your repository. Click the Generate button to create a new key. Once the key is created, copy the entire public key, starting from ssh-ed25519 up to and including support@trainml.ai, and attach this key to your user account in your private repository host. For example, the instructions for adding an SSH key to your GitHub account can be found here.

Hugging Face

To integrate with Hugging Face, first create a User Access Token with these instructions. If you plan to only download data, create a read token. If you plan to upload results back to Hugging Face, create a write token. Once you have the token, go back to the trainML third-party key configuration page and select Hugging Face from the Add menu under Third-Party Keys. Enter your Hugging Face account name as the Key ID and the generated token as the Key Secret, and click the check button.
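
Inside a job, the token can then be used with the huggingface_hub package. This is a minimal sketch; the repository ID and the HF_TOKEN environment variable name are placeholder assumptions, not values provided by the platform:

# Minimal sketch using the huggingface_hub package; the repository ID and
# the HF_TOKEN environment variable name are placeholders.
import os

from huggingface_hub import login, snapshot_download

login(token=os.environ["HF_TOKEN"])  # authenticate with the configured access token

# Download a private model or dataset snapshot (requires at least a read token)
snapshot_download(repo_id="your-account/your-private-model")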

Kaggle

To enable Kaggle integration in the trainML platform, you must first generate a Kaggle API token. Instructions to generate a new token can be found here. If you are already using the Kaggle CLI tool on your local computer, the API token is usually located at $HOME/.kaggle/kaggle.json.

Once you have the kaggle.json file for your account, go to the trainML third-party key configuration page and select Kaggle from the Add menu under Third-Party Keys. Click the Upload Json File button, select the kaggle.json file you downloaded, and click the check button. If the file is successfully uploaded, you should see Credentials File: kaggle.json next to the trophy icon.
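
In a job script, the credentials can then be used with the kaggle Python package, which reads kaggle.json from $HOME/.kaggle by default. A minimal sketch, with a placeholder dataset name:

# Minimal sketch using the kaggle package; the dataset name is a placeholder.
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()  # loads credentials from the kaggle.json file

# Download and unzip a dataset into the current directory
api.dataset_download_files("owner/dataset-name", path=".", unzip=True)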

NVIDIA NGC

NVIDIA NGC keys are used for pulling NVIDIA-maintained or private registry images from NVIDIA NGC.

Create an API key for your NGC account that has access to the images you wish to use. Once you have the API key, go back to the trainML third-party key configuration page, and select NVIDIA NGC from the Add menu under Third-Party Keys. Enter the API key in the NGC API Key field and click the check button.

Wasabi

A recommended but not required first step is to create a policy that restricts the trainML integration's access to just the buckets and bucket paths required. Create a new policy using the Wasabi documentation. An example policy that restricts both reading and writing to specific bucket paths is the following:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::<name of bucket with data>"
    },
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::<name of bucket with data>/path/to/data/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::<name of bucket with data>/path/to/outputs/*"
    }
  ]
}

Once you have built the policy for the integration, create a new user for the trainML platform to use. Follow the Wasabi documentation for this step. Create the user with Programmatic access only and select the policy you created in the previous step. Once the user is created, you will be prompted to download the access keys.

Go to the trainML third-party key configuration page and select Wasabi from the Add menu under Third-Party Keys. Input the Access Key ID and the Secret Key you just downloaded to the relevant fields and click the check button.
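
Because Wasabi exposes an S3-compatible API, the resulting keys can be exercised with boto3 by pointing it at the Wasabi endpoint. A minimal sketch, reusing the placeholder bucket name and paths from the example policy above:

# Minimal sketch using boto3 against the Wasabi S3-compatible endpoint;
# the bucket name, paths, and file names are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.wasabisys.com",  # use your bucket's regional endpoint if it differs
    aws_access_key_id="<Access Key ID>",
    aws_secret_access_key="<Secret Key>",
)

# Read from the data path allowed by s3:GetObject
s3.download_file("<name of bucket with data>", "path/to/data/train.csv", "train.csv")

# Write to the outputs path allowed by s3:PutObject
s3.upload_file("model.pt", "<name of bucket with data>", "path/to/outputs/model.pt")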