Wasabi cloud storage has been added as an available storage integration. Wasabi can save you up to 80% on persistent storage compared to AWS and has no additional egress/API fees, making it a great option for trainML integration.
How It Works
To use the Wasabi storage integration, you must first configure a Wasabi Third-Party Key. A recommended first step is to create a policy that restricts the trainML integration's access to just the buckets and bucket paths it requires. Create a new policy by following the Wasabi documentation. The following example policy restricts reading and writing to specific bucket paths:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::<name of bucket with data>"
    },
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::<name of bucket with data>/path/to/data/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::<name of bucket with data>/path/to/outputs/*"
    }
  ]
}
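If you maintain several buckets or paths, it may help to generate the policy document rather than edit it by hand. The following is a minimal sketch that reproduces the structure of the example policy above. The function name `build_wasabi_policy` and its parameters are hypothetical and not part of any Wasabi or trainML tooling; the resulting JSON would still be pasted into the Wasabi policy editor.

```python
import json


def build_wasabi_policy(bucket: str, read_prefix: str, write_prefix: str) -> dict:
    """Return a policy dict limited to one bucket, one read path, and one write path.

    Hypothetical helper: it only mirrors the example policy shown above.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    # Strip surrounding slashes so "/path/to/data/" and "path/to/data" behave the same.
    read_prefix = read_prefix.strip("/")
    write_prefix = write_prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Listing applies to the bucket itself, not to object paths.
            {"Effect": "Allow", "Action": "s3:ListBucket", "Resource": bucket_arn},
            # Read access only under the input data path.
            {
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"{bucket_arn}/{read_prefix}/*",
            },
            # Write access only under the output path.
            {
                "Effect": "Allow",
                "Action": "s3:PutObject",
                "Resource": f"{bucket_arn}/{write_prefix}/*",
            },
        ],
    }


policy = build_wasabi_policy("example-bucket", "path/to/data", "path/to/outputs")
print(json.dumps(policy, indent=2))
```

Keeping the read and write paths as separate statements means the integration can read input data without being able to overwrite it.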
Once you have built the policy for the integration, create a new user for the trainML platform to use. Follow the Wasabi documentation for this step. Create the user with Programmatic access only, and select the policy you created on step 3. Once the user is created, it will prompt you to download the access keys.
Go to the trainML third-party key configuration page and select Wasabi from the Add menu under Third-Party Keys. Enter the Access Key ID and the Secret Key you just downloaded in the relevant fields and click the check button.
Once the key has been added, you can now use your Wasabi buckets to load data to create Datasets, Models, or Checkpoints, load input data for inference jobs, or save training or inference job outputs.
Using the Web Platform
To use Wasabi as both the input and output data locations for an inference job, create a new job from the Inference Job Dashboard and select Wasabi as the input type. Specify the bucket path for the input data in the Input Storage Path field. Enter the service URL for the Wasabi region where your bucket is located, from the list here. Do the same for the output data, then click Next to review and Create to start the job.
Using the SDK
To use Wasabi as both the input and output data locations for an inference job, use the following syntax:
job = await trainml.jobs.create(
    name="Wasabi Inference",
    type="inference",
    ...
    data=dict(
        input_type="wasabi",
        input_uri="s3://example-bucket/input/cifar-10",
        input_options=dict(endpoint_url="https://s3.wasabisys.com"),
        output_type="wasabi",
        output_uri="s3://example-bucket/output/resnet_cifar10",
        output_options=dict(
            endpoint_url="https://s3.wasabisys.com"
        ),
    ),
    ...
)
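Because the input and output settings repeat the same type and endpoint, the `data` argument can be assembled by a small helper. This is a sketch only: `wasabi_job_data` is a hypothetical name, and the checks it performs (an `s3://` URI with a bucket name, an `https://` service URL) are assumptions about what a sensible value looks like, not validation rules documented by trainML.

```python
from urllib.parse import urlparse


def wasabi_job_data(input_uri: str, output_uri: str, endpoint_url: str) -> dict:
    """Build the `data` dict for a job that reads from and writes to Wasabi.

    Hypothetical helper; the keys match the job example above.
    """
    for uri in (input_uri, output_uri):
        parsed = urlparse(uri)
        # Assumed check: URIs look like s3://bucket/path.
        if parsed.scheme != "s3" or not parsed.netloc:
            raise ValueError(f"expected an s3://bucket/path URI, got {uri!r}")
    # Assumed check: the region's service URL is given with an https scheme.
    if urlparse(endpoint_url).scheme != "https":
        raise ValueError(f"expected an https:// service URL, got {endpoint_url!r}")
    return dict(
        input_type="wasabi",
        input_uri=input_uri,
        input_options=dict(endpoint_url=endpoint_url),
        output_type="wasabi",
        output_uri=output_uri,
        output_options=dict(endpoint_url=endpoint_url),
    )


data = wasabi_job_data(
    "s3://example-bucket/input/cifar-10",
    "s3://example-bucket/output/resnet_cifar10",
    "https://s3.wasabisys.com",
)
```

The resulting dict could then be passed as `data=data` in the job creation call. The helper assumes both buckets live in the same Wasabi region; if they do not, build the two halves with separate endpoint URLs.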
To create a dataset from data stored in Wasabi, use the following syntax:
dataset = await trainml.datasets.create(
    name="Wasabi Dataset",
    source_type="wasabi",
    source_uri="s3://example-bucket/input/cifar-10",
    source_options=dict(endpoint_url="https://s3.wasabisys.com"),
)
assert dataset.id
dataset = await dataset.wait_for("ready")
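The create-then-wait pattern above can be wrapped in a reusable coroutine. The sketch below takes the client as a parameter instead of constructing it, so it makes no assumption about how the `trainml` object in the snippets above is created or authenticated. The name `create_wasabi_dataset` is hypothetical; only the `datasets.create` arguments and `wait_for("ready")` come from the example above.

```python
async def create_wasabi_dataset(client, name: str, source_uri: str, endpoint_url: str):
    """Create a dataset from a Wasabi path and return it once it is ready.

    `client` is expected to behave like the `trainml` object used above.
    """
    dataset = await client.datasets.create(
        name=name,
        source_type="wasabi",
        source_uri=source_uri,
        source_options=dict(endpoint_url=endpoint_url),
    )
    # Fail early if the platform did not return an identifier.
    if not dataset.id:
        raise RuntimeError("dataset was created without an id")
    # Wait until the data has been copied from Wasabi.
    return await dataset.wait_for("ready")
```

Inside an async function it would be called as `dataset = await create_wasabi_dataset(trainml, "Wasabi Dataset", "s3://example-bucket/input/cifar-10", "https://s3.wasabisys.com")`.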