Centralized, Real-Time Training Job Worker Monitoring

You can now view the log output of your training job workers centrally from the trainML platform in real time. Keep an eye on the training progress of all your job workers at once, so you can stop them early if they are no longer making progress.

How It Works

All training jobs now have a View button under Actions. This button takes you to the execution logs page for that job. If the job is currently running and emitting log messages to stdout or stderr, the messages appear automatically as they are emitted, in descending order (the most recent log message is always on top). If the job has more than one worker, you will see the log messages for all workers simultaneously in one grid. To see the logs for just one specific worker, filter the view by selecting a worker from the dropdown at the top of the grid.
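To make the grid behavior concrete, here is a minimal sketch of the ordering and filtering described above. It is not trainML code: the `LogMessage` record, the `grid_view` function, and the worker names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class LogMessage:
    # Hypothetical record shape, not the trainML data model.
    worker: str
    timestamp: datetime
    text: str


def grid_view(messages: Iterable[LogMessage], worker: Optional[str] = None) -> List[LogMessage]:
    """Return messages newest first, optionally restricted to one worker."""
    selected = [m for m in messages if worker is None or m.worker == worker]
    return sorted(selected, key=lambda m: m.timestamp, reverse=True)


def _ts(second: int) -> datetime:
    # Helper that builds sample timestamps one second apart.
    return datetime(2021, 1, 1, 0, 0, second, tzinfo=timezone.utc)


sample = [
    LogMessage("worker-1", _ts(1), "epoch 1 loss=0.90"),
    LogMessage("worker-2", _ts(2), "epoch 1 loss=0.88"),
    LogMessage("worker-1", _ts(3), "epoch 2 loss=0.71"),
]

all_workers = grid_view(sample)                       # every worker in one grid
only_worker_1 = grid_view(sample, worker="worker-1")  # dropdown filter applied
```

With no worker selected, the messages from both workers are interleaved by time with the latest on top. Passing a worker name mimics choosing that worker from the dropdown.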

When you first open this page, the most recent log messages are loaded before new messages begin streaming. If the job has a single worker, you can scroll down on the page to retrieve older log messages. When you reach the oldest log message stored, you will see a message stating "There are no older results". If your job has more than one worker, you must select a single worker in order to retrieve older log messages.
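The scroll-back behavior can be pictured as cursor-based paging over one worker's history. The sketch below is an assumption about how such paging might work, not a description of the platform's internals. The `older_page` function, the in-memory `store`, and the sequence numbers are all hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical entry: (worker, sequence number, text). A higher sequence means newer.
Entry = Tuple[str, int, str]


def older_page(store: List[Entry], worker: Optional[str], before: int, page_size: int = 2) -> List[Entry]:
    """Return up to page_size entries for one worker that are older than `before`, newest first."""
    if worker is None:
        # Mirrors the rule that a single worker must be selected to load older messages.
        raise ValueError("select a single worker to retrieve older log messages")
    older = [e for e in store if e[0] == worker and e[1] < before]
    older.sort(key=lambda e: e[1], reverse=True)
    return older[:page_size]


store = [
    ("worker-1", 1, "a"),
    ("worker-2", 1, "b"),
    ("worker-1", 2, "c"),
    ("worker-1", 3, "d"),
]

first_page = older_page(store, "worker-1", before=4)
second_page = older_page(store, "worker-1", before=first_page[-1][1])
# An empty page corresponds to the "There are no older results" message.
last_page = older_page(store, "worker-1", before=second_page[-1][1])
```

Each scroll passes the sequence number of the oldest message already shown as the next cursor, so no message is loaded twice.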

Each log message remains available for 7 days after it was emitted and stays visible during that window even after the job is stopped.
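Because the retention window is counted per message rather than per job, older lines from a long-running job can expire while newer ones are still visible. Here is a small sketch of that arithmetic. The `is_available` helper is hypothetical, and treating the exact 7-day boundary as still available is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Retention window stated in the post.
RETENTION = timedelta(days=7)


def is_available(emitted_at: datetime, now: datetime) -> bool:
    """Return True while a message is within 7 days of its emission time."""
    # Assumption: a message exactly 7 days old still counts as available.
    return now - emitted_at <= RETENTION


emitted = datetime(2021, 1, 1, 12, 0, tzinfo=timezone.utc)
after_four_days = is_available(emitted, emitted + timedelta(days=4))
after_eight_days = is_available(emitted, emitted + timedelta(days=8))
```

Note that the check depends only on when the message was emitted, not on whether the job is still running.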