Examples
Basic submission
Submit a training script to Azure ML:
With a specific experiment name and run name:
Pass extra arguments to the script after --:
Choosing a compute target
Multi-node training
MPI (default)
When --num-gpus is not set, MPI distribution is used:
PyTorch distributed
Set --num-gpus to enable PyTorchDistribution:
This configures 4 processes per node across 2 nodes (8 GPUs total).
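The arithmetic behind that figure can be sketched in plain Python. This only illustrates how process counts and global ranks are conventionally derived in multi-node PyTorch jobs; the variable and function names are assumptions for the example, not submit-aml parameters.

```python
# Illustrative only: these names are assumptions, not submit-aml options.
num_nodes = 2
gpus_per_node = 4  # one process per GPU

# Total number of processes taking part in the job.
world_size = num_nodes * gpus_per_node


def global_rank(node_rank: int, local_rank: int) -> int:
    """Conventional global rank: processes are numbered node by node."""
    return node_rank * gpus_per_node + local_rank


ranks = [
    global_rank(node, local)
    for node in range(num_nodes)
    for local in range(gpus_per_node)
]

print(world_size)  # 8
print(ranks)       # [0, 1, 2, 3, 4, 5, 6, 7]
```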
Sweep jobs
Run a grid sweep over hyperparameters:
Limit concurrent trials:
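To make the grid and the concurrency limit concrete, here is a sketch in plain Python of what a grid sweep enumerates. It is not submit-aml code: the search space, the parameter names, and the `max_concurrent` variable are all hypothetical, and grouping trials into fixed-size waves simplifies how a real scheduler fills free slots.

```python
from itertools import product

# Hypothetical search space; the parameter names are made up for illustration.
search_space = {
    "lr": [0.001, 0.01],
    "batch_size": [16, 32, 64],
}
max_concurrent = 2  # assumed cap on trials running at the same time

# A grid sweep runs one trial per combination of values.
names = list(search_space)
trials = [dict(zip(names, values)) for values in product(*search_space.values())]

# With a concurrency cap, at most `max_concurrent` trials run at once.
waves = [
    trials[i : i + max_concurrent]
    for i in range(0, len(trials), max_concurrent)
]

print(len(trials))  # 6
print(len(waves))   # 3
print(trials[0])    # {'lr': 0.001, 'batch_size': 16}
```

The number of trials grows multiplicatively with each added hyperparameter, which is why capping concurrency is often useful on shared compute.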
Data mounting
Datasets are passed to the job as Input objects. There is one flag per source
type (registered data asset, datastore folder, or previous job output), in
either --mount-* or --download-* form.
Mount or download a registered data asset:
Mount a folder directly from a datastore (no data-asset registration required):
Use the outputs of a previous job:
# Mount a registered data asset
submit_to_aml(
    script_path="train.py",
    mount_asset=["data=MY-DATASET:2"],
)

# Or download instead of mount
submit_to_aml(
    script_path="train.py",
    download_asset=["data=MY-DATASET"],
)

# Mount a datastore folder directly
submit_to_aml(
    script_path="train.py",
    mount_datastore=["ref=mystore/exports/reference"],
)

# Use outputs from a previous job
submit_to_aml(
    script_path="evaluate.py",
    mount_job=["checkpoint=my-training-job:models/best.pth"],
)
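The input strings in the examples above appear to follow a NAME=SOURCE pattern, optionally followed by a colon and a version or path. The sketch below splits such a string into its parts to make the pattern explicit. It is a hypothetical helper based only on the examples shown here, not the parser submit-aml uses, and the field names are assumptions.

```python
from typing import NamedTuple, Optional


class InputSpec(NamedTuple):
    name: str                 # label before "="
    source: str               # asset name, datastore path, or job name
    qualifier: Optional[str]  # text after ":" (a version or a path), if any


def split_spec(spec: str) -> InputSpec:
    """Split 'NAME=SOURCE[:QUALIFIER]' as the examples above appear to use it."""
    name, sep, rest = spec.partition("=")
    if not sep or not name or not rest:
        raise ValueError(f"expected NAME=SOURCE, got {spec!r}")
    source, colon, qualifier = rest.partition(":")
    return InputSpec(name, source, qualifier if colon else None)


print(split_spec("data=MY-DATASET:2"))
print(split_spec("ref=mystore/exports/reference"))
print(split_spec("checkpoint=my-training-job:models/best.pth"))
```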
Deprecated flags
The --mount, --download and --output flags (and their
datasets_mount, datasets_download and datasets_output Python
equivalents) are deprecated in favour of the explicit per-source flags
above. They still work but emit a deprecation warning.
Write outputs to a datastore folder, or register them as a data asset:
Environment management
Docker build context (default)
By default, submit-aml builds a Docker context from your project's
pyproject.toml, uv.lock, and .python-version:
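A pre-flight check along these lines can confirm that the three files exist before submitting. It is a hypothetical helper, not part of submit-aml, and whether all three files are strictly required is an assumption.

```python
from pathlib import Path

# File names taken from the text above; the check itself is a hypothetical helper.
CONTEXT_FILES = ("pyproject.toml", "uv.lock", ".python-version")


def missing_context_files(project_dir: str) -> list[str]:
    """Return the expected files that are absent from project_dir."""
    root = Path(project_dir)
    return [name for name in CONTEXT_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_context_files(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All build-context inputs are present.")
```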
Custom Docker image
You can use any image, including ones from the Azure ML containers repo:
Existing Azure ML environment
Conda environment
Setting environment variables
Debugging
Enable remote debugging with debugpy:
This installs debugpy, starts the script with a debug listener on port 5678,
and adds a VS Code service for remote connection.
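For orientation, the sketch below builds the kind of command debugpy's own command-line interface accepts for starting a script with a listener on port 5678. The exact command submit-aml generates is not shown here, so the host, the use of --wait-for-client, and the `--epochs 10` script argument are assumptions.

```python
import shlex

DEBUG_PORT = 5678  # port named in the text above


def debugpy_command(script: str, *script_args: str, host: str = "0.0.0.0") -> list[str]:
    """Build a debugpy launch command that waits for a debugger to attach."""
    return [
        "python", "-m", "debugpy",
        "--listen", f"{host}:{DEBUG_PORT}",
        "--wait-for-client",  # assumption: pause until a client connects
        script, *script_args,
    ]


# "--epochs 10" is a made-up script argument for illustration.
print(shlex.join(debugpy_command("train.py", "--epochs", "10")))
# python -m debugpy --listen 0.0.0.0:5678 --wait-for-client train.py --epochs 10
```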
Dry run
Preview the job configuration without submitting:
Stream logs
Submit and wait for the job to complete, streaming logs: