Azure Blob Storage and Container Registry Integration

· 3 min read

Integration with Azure Blob Storage and Azure Container Registry is now available natively in trainML.

How It Works

tip

Never provide trainML (or anyone else, for that matter) with credentials for a user that has admin privileges.

Create a new service principal for trainML in the Azure account that contains the data or services you want the trainML platform to interact with. In the app registration, select Single Tenant as the Supported Account Type and leave the Redirect URI unconfigured. On the App Registrations overview page for the newly created app, locate the Application (client) ID and Directory (tenant) ID fields and note them for later. Finally, create an application secret for the app by following the instructions here.
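Both the Application (client) ID and Directory (tenant) ID are GUIDs, so a quick sanity check before pasting them elsewhere can catch copy/paste mistakes. A minimal stdlib sketch (the ID values below are placeholders, not real credentials):

```python
import uuid

def looks_like_azure_id(value: str) -> bool:
    """Return True if the value is a well-formed GUID, as Azure
    client and tenant IDs always are."""
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False

# Hypothetical values copied from the App Registrations overview page.
client_id = "11111111-2222-3333-4444-555555555555"
assert looks_like_azure_id(client_id)
assert not looks_like_azure_id("not-a-guid")
```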

Once the service principal credentials have been obtained, grant access to the principal by attaching one or more roles to it in the Access Control (IAM) section of the Azure Subscription. Instructions can be found here. To allow trainML access to Azure Blob Storage, you need to assign the Storage Blob Data Contributor role. To utilize the Azure Container Registry for private job images, you need to assign the AcrPull role.

trainML recommends that you add custom conditions to the role assignments to further restrict access to data within your account. For example, to restrict read access to a specific storage account container and write access to a specific path within that container, add the following condition to the role assignment:

(
  ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read'}
  AND
  @resource[Microsoft.Storage/storageAccounts/blobServices/containers:name] StringEqualsIgnoreCase '<name of storage account container with data>'
)
OR
(
  ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write'}
  AND
  @resource[Microsoft.Storage/storageAccounts/blobServices/containers/blobs:path] StringStartsWith 'path/to/output/data'
  AND
  @resource[Microsoft.Storage/storageAccounts/blobServices/containers:name] StringEqualsIgnoreCase '<name of storage account container with data>'
)
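The condition grants access only when one of two clauses holds: reads anywhere in the data container, or writes confined to the output prefix of that container. A minimal Python model of the same logic (the container name and prefix are illustrative placeholders, and this is a sketch of the boolean structure, not the Azure ABAC engine):

```python
# Stand-ins for the placeholders in the condition above.
DATA_CONTAINER = "example-container"
OUTPUT_PREFIX = "path/to/output/data"

READ_ACTION = "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read"
WRITE_ACTION = "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write"

def is_allowed(action: str, container: str, blob_path: str = "") -> bool:
    """Model of the role-assignment condition: reads are allowed anywhere
    in the data container; writes only under the output prefix."""
    read_clause = (
        action == READ_ACTION
        and container.lower() == DATA_CONTAINER.lower()  # StringEqualsIgnoreCase
    )
    write_clause = (
        action == WRITE_ACTION
        and blob_path.startswith(OUTPUT_PREFIX)          # StringStartsWith
        and container.lower() == DATA_CONTAINER.lower()
    )
    return read_clause or write_clause

assert is_allowed(READ_ACTION, "Example-Container")
assert is_allowed(WRITE_ACTION, "example-container", "path/to/output/data/run1.bin")
assert not is_allowed(WRITE_ACTION, "example-container", "somewhere/else.bin")
```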

Once the service principal is configured, return to the trainML third-party key configuration page and select Azure from the Add menu under Third-Party Keys. Copy the Client ID, Tenant ID, and Secret from the previous steps into the relevant fields and click the check button.

Using the Web Platform

Once configured, you can use your Azure credentials in all the same places as AWS or GCP credentials.

Using the SDK

To run an inference job that uses an image stored in Azure Container Registry, loads its model and input data from Azure Blob Storage, and saves the output back to Azure Blob Storage, use the following example syntax:

job = await trainml.jobs.create(
    name="System Tests - Inference Job Azure Data",
    type="inference",
    ...
    data=dict(
        input_type="azure",
        input_uri="https://exampleaccount.blob.core.windows.net/example-container/data/cifar-10",
        output_type="azure",
        output_uri="https://exampleaccount.blob.core.windows.net/example-container/output/resnet-cifar10/",
    ),
    model=dict(
        source_type="azure",
        source_uri="https://exampleaccount.blob.core.windows.net/example-container/model/environment-tests",
    ),
    environment=dict(
        type="CUSTOM",
        custom_image="examplerepo.azurecr.io/tensorflow:2.10.0-gpu",
        worker_key_types=["azure"],
    ),
    ...
)
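The azure URIs in the example follow the standard Blob Storage layout, https://&lt;account&gt;.blob.core.windows.net/&lt;container&gt;/&lt;path&gt;. A small stdlib sketch of how such a URI decomposes, which can help when double-checking that the container name matches any role-assignment conditions (the account and container names are the example values from above, not real endpoints):

```python
from urllib.parse import urlparse

def split_blob_uri(uri: str) -> tuple[str, str, str]:
    """Split an Azure Blob Storage URI into (account, container, blob path)."""
    parts = urlparse(uri)
    account = parts.netloc.split(".")[0]  # '<account>.blob.core.windows.net'
    container, _, path = parts.path.lstrip("/").partition("/")
    return account, container, path

print(split_blob_uri(
    "https://exampleaccount.blob.core.windows.net/example-container/data/cifar-10"
))
# → ('exampleaccount', 'example-container', 'data/cifar-10')
```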