
Automate Training with the trainML Python SDK

January 24, 2021

trainML

trainML jobs and datasets can now be created, managed, and monitored programmatically using our Python SDK.

How It Works

Install the trainML PyPI package into a Python virtual environment using Python 3.8 or above:

pip install trainml
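
If you need to create the virtual environment first, a minimal sequence is the following (the environment name trainml-env is arbitrary):

python3 -m venv trainml-env
source trainml-env/bin/activate
pip install trainml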

Authentication

To use the SDK, you must generate API keys for your trainML account and supply them to the SDK. To create new API keys, go to the account settings page and click the Create button in the API Keys section. This automatically downloads a credentials.json file; the file can only be generated once per API key.

Treat this file as a password, as anyone with access to your API key will have the ability to create and control resources in your trainML account.

You can deactivate any API key by clicking the Remove button.

The easiest way to authenticate is to place the downloaded credentials file into the .trainml folder of your home directory and ensure only you have access to it. From the directory where the credentials.json file was downloaded, run the following commands:

mkdir -p ~/.trainml
mv credentials.json ~/.trainml/credentials.json
chmod 600 ~/.trainml/credentials.json

For more ways to configure authentication, review the readme file on GitHub.
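
As a sketch of one such alternative (the user and key parameter names here are assumptions based on the readme, not on this post, so verify them there), the API key pair can be supplied to the client directly instead of through the credentials file:

import os
from trainml.trainml import TrainML

# Assumption: the TrainML client accepts the API key pair via `user` and
# `key` keyword arguments, as described in the SDK readme. Here they are
# read from environment variables rather than hard-coded.
trainml_client = TrainML(
    user=os.environ["TRAINML_USER"],
    key=os.environ["TRAINML_KEY"],
)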

Usage

The trainML SDK uses the asyncio library to ease the concurrent execution of long-running tasks. The following example creates a dataset from an S3 bucket and immediately runs a training job on that dataset:

from trainml.trainml import TrainML
import asyncio


trainml_client = TrainML()

# Create the dataset
dataset = asyncio.run(
    trainml_client.datasets.create(
        name="Example Dataset",
        source_type="aws",
        source_uri="s3://trainml-examples/data/cifar10",
    )
)

print(dataset)

# Watch the log output, attach will return when data transfer is complete
asyncio.run(dataset.attach())

# Create the job
job = asyncio.run(
    trainml_client.jobs.create(
        name="Example Training Job",
        type="training",
        gpu_type="GTX 1060",
        gpu_count=1,
        disk_size=10,
        workers=[
            "PYTHONPATH=$PYTHONPATH:$TRAINML_MODEL_PATH python -m official.vision.image_classification.resnet_cifar_main --num_gpus=1 --data_dir=$TRAINML_DATA_PATH --model_dir=$TRAINML_OUTPUT_PATH --enable_checkpoint_and_export=True --train_epochs=10 --batch_size=1024",
        ],
        data=dict(
            datasets=[dict(id=dataset.id, type="existing")],
            output_uri="s3://trainml-examples/output/resnet_cifar10",
            output_type="aws",
        ),
        model=dict(git_uri="git@github.com:trainML/test-private.git"),
    )
)
print(job)

# Watch the log output, attach will return when the training job stops
asyncio.run(job.attach())

# Cleanup job and dataset
asyncio.run(job.remove())
asyncio.run(dataset.remove())
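
Because each SDK call is a coroutine, the sequential asyncio.run pattern above is not the only option. As a sketch (the dataset names and bucket paths below are hypothetical), several long-running operations can be grouped with asyncio.gather inside a single event loop, for example to create and populate two datasets concurrently:

from trainml.trainml import TrainML
import asyncio


async def main():
    trainml_client = TrainML()

    # Each SDK call is a coroutine, so independent long-running operations
    # can be started together and awaited as a group.
    dataset_a, dataset_b = await asyncio.gather(
        trainml_client.datasets.create(
            name="Example Dataset A",
            source_type="aws",
            source_uri="s3://my-bucket/data/dataset-a",
        ),
        trainml_client.datasets.create(
            name="Example Dataset B",
            source_type="aws",
            source_uri="s3://my-bucket/data/dataset-b",
        ),
    )

    # Wait for both data transfers to complete before continuing.
    await asyncio.gather(dataset_a.attach(), dataset_b.attach())


asyncio.run(main())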

For more examples of how to use the SDK to create, monitor, and remove jobs and datasets, review the provided examples.
