Data model

TorchIO's data model has five core classes: Image, Points, BoundingBoxes, Subject, and AffineMatrix. This article explains what each one does and how they relate.

Overview

classDiagram
    class Image {
        +data: Tensor (C,I,J,K)
        +affine: AffineMatrix
        +load() / save()
    }

    class ScalarImage {
        Intensity data
    }

    class LabelMap {
        Segmentation labels
    }

    class Points {
        +data: Tensor (N,3)
        +affine: AffineMatrix
        +to_world()
    }

    class BoundingBoxes {
        +data: Tensor (N,6)
        +format: BoundingBoxFormat
        +labels: Tensor | None
        +to_format()
    }

    class Subject {
        +images
        +points
        +bounding_boxes
        +metadata
    }

    class AffineMatrix {
        +spacing
        +origin
        +direction
        +orientation
        +inverse()
        +compose(other)
        +apply(points)
    }

    Image <|-- ScalarImage
    Image <|-- LabelMap
    Image --> AffineMatrix : has
    Points --> AffineMatrix : has
    BoundingBoxes --> AffineMatrix : has
    Subject --> Image : contains
    Subject --> Points : contains
    Subject --> BoundingBoxes : contains

Image

An Image represents a single 3D (or multi-channel 3D) medical image. It stores:

A 4D tensor with shape \((C, I, J, K)\): channels, then three spatial dimensions.
An affine matrix mapping voxel indices \((i, j, k)\) to world coordinates \((x, y, z)\) in millimeters.
Optional metadata passed as keyword arguments.

Images are lazy: data is not read from disk until first accessed. This means you can create thousands of Image objects cheaply and only load what you need.
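
The deferred-read pattern described above can be sketched in a few lines. This is only an illustration of the idea, not TorchIO's implementation: `LazyImage`, its `loader` argument, `load_count`, and `fake_reader` are hypothetical names introduced here.

```python
from typing import Callable, Optional


class LazyImage:
    """Hypothetical stand-in for a lazily loaded image (not TorchIO's class)."""

    def __init__(self, path: str, loader: Callable[[str], list]):
        self.path = path
        self._loader = loader            # called only on first data access
        self._data: Optional[list] = None
        self.load_count = 0              # demonstration counter

    @property
    def data(self) -> list:
        if self._data is None:           # first access triggers the read
            self._data = self._loader(self.path)
            self.load_count += 1
        return self._data                # later accesses reuse the cached value


def fake_reader(path: str) -> list:
    # Stands in for an expensive disk or network read
    return [0.0, 1.0, 2.0]


# Creating many objects costs almost nothing because no file is read
images = [LazyImage(f"scan_{i}.nii.gz", fake_reader) for i in range(1000)]
loaded_before = sum(image.load_count for image in images)

first = images[0].data                   # triggers exactly one read
again = images[0].data                   # served from the cache
loaded_after = sum(image.load_count for image in images)
```

In this sketch only the one image whose `data` was touched is ever read, which is the property that makes building large datasets of image objects cheap.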

Images can be created from multiple sources:

import torchio as tio

image = tio.ScalarImage("t1.nii.gz")              # from file (lazy)
image = tio.ScalarImage("s3://bucket/t1.nii.gz")   # from cloud (via fsspec)
image = tio.ScalarImage("https://example.com/t1.nii.gz")  # from URL
image = tio.ScalarImage(buf, suffix=".nii.gz")     # from file-like object
image = tio.ScalarImage(tensor)                    # from PyTorch tensor
image = tio.ScalarImage(sitk_image)                # from SimpleITK
image = tio.ScalarImage(nifti_image)               # from NiBabel (lazy)
image = tio.ScalarImage(raw_bytes)                 # from bytes or BytesIO
image = tio.ScalarImage(zarr_store)                # from zarr Store (lazy)

Metadata

Any extra keyword argument is stored as metadata and is accessible by attribute or by dict-style lookup:

image = tio.ScalarImage("t1.nii.gz", protocol="MPRAGE", te=3.5)
image.protocol       # "MPRAGE"
image["te"]          # 3.5
image.metadata       # {"protocol": "MPRAGE", "te": 3.5}

ScalarImage vs LabelMap

ScalarImage and LabelMap are subclasses of Image. They carry no extra data. The distinction is purely semantic:

ScalarImage: continuous intensity data (MRI signal, CT Hounsfield units, PET SUV).
LabelMap: discrete segmentation labels (0 = background, 1 = tumor, etc.).

Transforms use isinstance checks to decide behavior. For example, spatial transforms use linear interpolation for ScalarImage and nearest-neighbor for LabelMap.
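
The reason for the different interpolation modes is that linear interpolation between labels 0 and 1 would produce values such as 0.5, which are not valid labels. A minimal sketch of type-based dispatch follows; the three classes are empty stand-ins and `pick_interpolation` is a hypothetical helper, not a TorchIO function.

```python
class Image:
    """Stand-in base class (hypothetical, for illustration only)."""


class ScalarImage(Image):
    """Continuous intensities."""


class LabelMap(Image):
    """Discrete segmentation labels."""


def pick_interpolation(image: Image) -> str:
    """Choose an interpolation mode from the image's class."""
    if isinstance(image, LabelMap):
        return "nearest"     # keeps label values intact
    if isinstance(image, ScalarImage):
        return "linear"      # smooth resampling of intensities
    raise TypeError(f"Unsupported image type: {type(image).__name__}")


modes = [pick_interpolation(ScalarImage()), pick_interpolation(LabelMap())]
```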

Tensor layout

TorchIO uses the convention (C, I, J, K):

Axis	Meaning	Example
`C`	Channels	Gradient directions in DWI, components in a vector field
`I`	First spatial axis	Left-Right (in RAS)
`J`	Second spatial axis	Posterior-Anterior (in RAS)
`K`	Third spatial axis	Inferior-Superior (in RAS)

Most images (T1, CT, etc.) are single-channel, so C = 1.

AffineMatrix

The AffineMatrix class wraps a \(4 \times 4\) matrix that maps voxel indices to world coordinates:

\[ \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = \mathbf{A} \begin{bmatrix} i \\ j \\ k \\ 1 \end{bmatrix} \]

It provides named access to the components people usually care about:

spacing: voxel size in mm, derived from the column norms of the rotation-zoom block.
origin: world coordinates of the voxel at index \((0, 0, 0)\).
direction: \(3 \times 3\) rotation matrix (spacing factored out).
orientation: anatomical axis codes like ('R', 'A', 'S').
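
To make these definitions concrete, the sketch below derives the four components from a plain \(4 \times 4\) matrix using only the standard library. The example matrix is an assumption chosen for easy arithmetic, and the orientation step assumes world coordinates follow the RAS+ convention (x towards Right, y towards Anterior, z towards Superior). It shows the underlying arithmetic only, not how AffineMatrix is implemented.

```python
import math

# Example affine (assumed for illustration): 2 x 1 x 3 mm voxels, axes aligned
# with the world axes, voxel (0, 0, 0) located at (-90, -126, -72) mm.
affine = [
    [2.0, 0.0, 0.0, -90.0],
    [0.0, 1.0, 0.0, -126.0],
    [0.0, 0.0, 3.0, -72.0],
    [0.0, 0.0, 0.0, 1.0],
]


def column(matrix, j):
    """Return the first three entries of column j."""
    return [matrix[i][j] for i in range(3)]


# spacing: Euclidean norm of each column of the upper-left 3x3 block
spacing = tuple(math.sqrt(sum(v * v for v in column(affine, j))) for j in range(3))

# origin: the translation column, i.e. where voxel (0, 0, 0) lands
origin = tuple(affine[i][3] for i in range(3))

# direction: the 3x3 block with the spacing divided out of each column
direction = [[affine[i][j] / spacing[j] for j in range(3)] for i in range(3)]

# orientation: for each voxel axis, the world axis it mostly points along
# and the sign of that component (RAS+ world assumed).
CODES = (("L", "R"), ("P", "A"), ("I", "S"))
codes = []
for j in range(3):
    col = column(direction, j)
    axis = max(range(3), key=lambda i: abs(col[i]))
    codes.append(CODES[axis][1] if col[axis] > 0 else CODES[axis][0])
orientation = tuple(codes)
```

For this matrix the results are a spacing of (2, 1, 3) mm, an identity direction matrix, and the orientation ('R', 'A', 'S').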

Affines compose via the @ operator:

combined = affine_a @ affine_b
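
Order matters when composing. Assuming the @ operator follows ordinary matrix multiplication, the right-hand operand is applied to a point first. The plain-Python sketch below illustrates this with a scaling and a translation; `matmul` and `apply` are helper names introduced here, not TorchIO functions.

```python
def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)
    ]


def apply(matrix, point):
    """Apply a 4x4 affine to a 3D point using homogeneous coordinates."""
    homogeneous = (*point, 1.0)
    return tuple(
        sum(matrix[r][c] * homogeneous[c] for c in range(4)) for r in range(3)
    )


scale = [[2.0, 0, 0, 0], [0, 2.0, 0, 0], [0, 0, 2.0, 0], [0, 0, 0, 1.0]]
shift = [[1.0, 0, 0, 10.0], [0, 1.0, 0, 0], [0, 0, 1.0, 0], [0, 0, 0, 1.0]]
point = (1.0, 1.0, 1.0)

# shift @ scale: the point is scaled first, then shifted
scale_then_shift = apply(matmul(shift, scale), point)   # (12.0, 2.0, 2.0)

# scale @ shift: the point is shifted first, then scaled
shift_then_scale = apply(matmul(scale, shift), point)   # (22.0, 2.0, 2.0)
```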

Points

A Points object stores an \((N, 3)\) tensor of 3D coordinates in voxel space, together with an affine for converting to world coordinates:

import torch
import torchio as tio

landmarks = tio.Points(
    torch.tensor([[128.0, 100.0, 90.0], [128.0, 130.0, 90.0]]),
    affine=image.affine,
)
world = landmarks.to_world()  # (N, 3) in mm

Use cases include anatomical landmarks, fiducial markers, and seed points.
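
Converting to world coordinates matters because distances between voxel indices are not distances in millimeters unless the spacing is 1 mm. The sketch below applies an affine to the two landmarks from the example above. The affine values (0.5 mm isotropic spacing with a shifted origin) are assumed for illustration, and the `to_world` function here is a plain-Python stand-in for the arithmetic, not the library method.

```python
import math


def to_world(affine, points):
    """Map (i, j, k) voxel coordinates to (x, y, z) world coordinates in mm."""
    world = []
    for i, j, k in points:
        homogeneous = (i, j, k, 1.0)
        world.append(
            tuple(
                sum(affine[r][c] * homogeneous[c] for c in range(4))
                for r in range(3)
            )
        )
    return world


# Assumed affine: 0.5 mm isotropic voxels, origin at (-64, -50, -45) mm
affine = [
    [0.5, 0.0, 0.0, -64.0],
    [0.0, 0.5, 0.0, -50.0],
    [0.0, 0.0, 0.5, -45.0],
    [0.0, 0.0, 0.0, 1.0],
]
landmarks = [(128.0, 100.0, 90.0), (128.0, 130.0, 90.0)]

world = to_world(affine, landmarks)
distance_mm = math.dist(world[0], world[1])
```

The two landmarks are 30 voxels apart along the second axis, which corresponds to 15 mm at this spacing.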

BoundingBoxes

BoundingBoxes stores an \((N, 6)\) tensor of 3D bounding boxes. The class is inspired by torchvision.tv_tensors.BoundingBoxes and extends the idea to three dimensions.

Two formats are predefined. In general, a format is parameterised by axes and representation:

Axes: any permutation of IJK (voxel) or any valid anatomical triplet like RAS, LPI, etc.
Representation: corners (two opposite corners) or center_size (center + extent along each axis).

Predefined	Axes	Representation
`IJKIJK`	`IJK`	corners: \((i_1, j_1, k_1, i_2, j_2, k_2)\)
`IJKWHD`	`IJK`	center + size: \((i_c, j_c, k_c, s_i, s_j, s_k)\)

Custom formats are created with BoundingBoxFormat("RAS", "corners"), etc.

Convert between formats with to_format(). This handles representation changes, axis permutations, and even voxel ↔ anatomical conversions (using the stored affine). Optionally attach an integer labels tensor to track per-box class IDs.
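
The representation change between corners and center_size is simple arithmetic, sketched below for a single box. The two function names are hypothetical, and the sketch assumes extents are measured as the plain difference between corners (continuous coordinates); whether the library instead counts inclusive voxels is not stated here.

```python
def corners_to_center_size(box):
    """(i1, j1, k1, i2, j2, k2) -> (ic, jc, kc, si, sj, sk)."""
    i1, j1, k1, i2, j2, k2 = box
    return (
        (i1 + i2) / 2, (j1 + j2) / 2, (k1 + k2) / 2,   # center
        i2 - i1, j2 - j1, k2 - k1,                     # extent per axis
    )


def center_size_to_corners(box):
    """(ic, jc, kc, si, sj, sk) -> (i1, j1, k1, i2, j2, k2)."""
    ic, jc, kc, si, sj, sk = box
    return (
        ic - si / 2, jc - sj / 2, kc - sk / 2,         # lower corner
        ic + si / 2, jc + sj / 2, kc + sk / 2,         # upper corner
    )


corners = (10, 20, 30, 50, 60, 70)
center_size = corners_to_center_size(corners)
round_trip = center_size_to_corners(center_size)
```

Here the box with corners (10, 20, 30) and (50, 60, 70) has its center at (30, 40, 50) and an extent of 40 along each axis, and converting back recovers the original corners.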

Subject (a.k.a. Study)

A Subject groups images, points, bounding boxes, and metadata belonging to one imaging session:

subject = tio.Subject(
    t1=tio.ScalarImage("t1.nii.gz"),
    seg=tio.LabelMap("seg.nii.gz"),
    landmarks=tio.Points(torch.randn(5, 3)),
    tumors=tio.BoundingBoxes(
        torch.tensor([[10, 20, 30, 50, 60, 70]]),
        format=tio.BoundingBoxFormat.IJKIJK,
    ),
    age=45,
)

Study is an alias for Subject: both names refer to the same class. In DICOM terminology, a "study" contains "series" (volumes), which maps directly onto this container. Neuroscience users tend to think in terms of "subjects", radiology users in terms of "studies":

study = tio.Study(t1=tio.ScalarImage("t1.nii.gz"), patient_id="abc")

Contents are classified automatically by type:

Image instances go to subject.images
Points instances go to subject.points
BoundingBoxes instances go to subject.bounding_boxes
Everything else is metadata, accessible via subject.metadata
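
The classification rule above amounts to a chain of type checks with metadata as the fallback. A rough sketch under that assumption follows; the three classes are empty stand-ins and `classify` is a hypothetical helper, so this should not be read as the Subject constructor's actual code.

```python
class Image:
    """Stand-in for an image type (hypothetical)."""


class Points:
    """Stand-in for a point set (hypothetical)."""


class BoundingBoxes:
    """Stand-in for a set of boxes (hypothetical)."""


def classify(**entries):
    """Sort keyword arguments into groups by type; the rest is metadata."""
    groups = {"images": {}, "points": {}, "bounding_boxes": {}, "metadata": {}}
    for name, value in entries.items():
        if isinstance(value, Image):
            groups["images"][name] = value
        elif isinstance(value, Points):
            groups["points"][name] = value
        elif isinstance(value, BoundingBoxes):
            groups["bounding_boxes"][name] = value
        else:
            groups["metadata"][name] = value
    return groups


groups = classify(t1=Image(), landmarks=Points(), tumors=BoundingBoxes(), age=45)
```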

All entries are accessible by name:

subject.t1          # the ScalarImage
subject.landmarks   # the Points
subject.tumors      # the BoundingBoxes
subject.age         # 45

The Subject checks consistency across images. For example, subject.spatial_shape raises an error if the images have different spatial shapes.
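
A check of this kind can be sketched as follows. `common_spatial_shape` is a hypothetical helper that works on plain shape tuples, and the use of `ValueError` is an assumption, since the exception type raised by the library is not stated here. Note that only the spatial part \((I, J, K)\) is compared, so images with different channel counts can still be consistent.

```python
def common_spatial_shape(shapes):
    """Return the (I, J, K) shape shared by all entries, or raise.

    `shapes` maps image names to full (C, I, J, K) shapes.
    """
    spatial = {name: tuple(shape[1:]) for name, shape in shapes.items()}
    unique = set(spatial.values())
    if len(unique) != 1:
        raise ValueError(f"Expected one common spatial shape, got: {spatial}")
    return unique.pop()


# A 30-channel DWI and a single-channel segmentation on the same grid
consistent = common_spatial_shape(
    {"dwi": (30, 128, 128, 60), "seg": (1, 128, 128, 60)}
)

# Mismatched grids are rejected
try:
    common_spatial_shape({"t1": (1, 256, 256, 176), "seg": (1, 128, 128, 60)})
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```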

How they fit together

flowchart LR
    subgraph Subject
        T1[ScalarImage<br/>t1.nii.gz]
        SEG[LabelMap<br/>seg.nii.gz]
        LM["Points<br/>landmarks"]
        BB["BoundingBoxes<br/>tumors"]
        META["age: 45"]
    end

    T1 -->|".affine"| A1[AffineMatrix<br/>spacing, origin,<br/>direction]
    SEG -->|".affine"| A2[AffineMatrix]
    LM -->|".affine"| A3[AffineMatrix]
    T1 -->|".data"| D1["Tensor (C, I, J, K)"]
    SEG -->|".data"| D2["Tensor (C, I, J, K)"]
    LM -->|".data"| D3["Tensor (N, 3)"]
    BB -->|".data"| D4["Tensor (N, 6)"]

A typical workflow:

Create Image objects from file paths (lazy, no data read).
Create Points or BoundingBoxes from annotations.
Group them into a Subject.
Apply transforms to the Subject. This triggers loading and produces a new Subject with transformed data.
Access .data tensors for training.

Batching

When training a model, you need to stack subjects into batches. SubjectsLoader returns SubjectsBatch instances where each image entry is an ImagesBatch with a 5D tensor (B, C, I, J, K):

loader = tio.SubjectsLoader(dataset, batch_size=4)
for batch in loader:
    batch.t1.data.shape  # (4, C, I, J, K)
    batch.metadata["age"]  # [42, 35, 60, 28]

Each ImagesBatch stores per-sample affine matrices, so subjects with different spatial properties batch correctly.
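
The bookkeeping behind this can be sketched without real tensors. In the sketch below, `Sample` and `collate` are hypothetical names, a spacing tuple stands in for the full affine, and shapes are tracked instead of data. It assumes that stacking into one 5D tensor still requires equal tensor shapes, while spatial properties such as spacing may differ per sample.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical stand-in for one subject's image entry plus metadata."""
    shape: tuple     # (C, I, J, K)
    spacing: tuple   # stands in for the full affine
    age: int


def collate(samples):
    """Combine samples into a batch description with per-sample spacing."""
    shapes = {sample.shape for sample in samples}
    if len(shapes) != 1:
        raise ValueError(f"Cannot stack differing shapes: {shapes}")
    return {
        "shape": (len(samples), *shapes.pop()),              # (B, C, I, J, K)
        "spacings": [sample.spacing for sample in samples],  # one per sample
        "age": [sample.age for sample in samples],           # metadata as a list
    }


samples = [
    Sample((1, 64, 64, 32), (1.0, 1.0, 1.0), 42),
    Sample((1, 64, 64, 32), (0.5, 0.5, 2.0), 35),
]
batch = collate(samples)
```

Keeping one spatial description per sample, rather than one for the whole batch, is what lets each element be mapped back to its own world coordinates after batching.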

Transforms work directly on batches. By default, transforms that support it sample independent parameters per batch element (see Per-instance augmentation):

augmented = tio.Flip(axes=(0,))(batch)