Data model
TorchIO's data model has five core classes: Image, Points, BoundingBoxes, Subject, and AffineMatrix. This article explains what each one does and how they relate.
Overview
classDiagram
class Image {
+data: Tensor (C,I,J,K)
+affine: AffineMatrix
+load() / save()
}
class ScalarImage {
Intensity data
}
class LabelMap {
Segmentation labels
}
class Points {
+data: Tensor (N,3)
+affine: AffineMatrix
+to_world()
}
class BoundingBoxes {
+data: Tensor (N,6)
+format: BoundingBoxFormat
+labels: Tensor | None
+to_format()
}
class Subject {
+images
+points
+bounding_boxes
+metadata
}
class AffineMatrix {
+spacing
+origin
+direction
+orientation
+inverse()
+compose(other)
+apply(points)
}
Image <|-- ScalarImage
Image <|-- LabelMap
Image --> AffineMatrix : has
Points --> AffineMatrix : has
BoundingBoxes --> AffineMatrix : has
Subject --> Image : contains
Subject --> Points : contains
Subject --> BoundingBoxes : contains
Image
An Image represents a single 3D (or multi-channel 3D) medical image.
It stores:
- A 4D tensor with shape \((C, I, J, K)\): channels, then three spatial dimensions.
- An affine matrix mapping voxel indices \((i, j, k)\) to world coordinates \((x, y, z)\) in millimeters.
- Optional metadata passed as keyword arguments.
Images are lazy: data is not read from disk until first accessed.
This means you can create thousands of Image objects cheaply and
only load what you need.
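The lazy-loading pattern described above can be sketched in plain Python. This is a minimal illustration, not TorchIO's implementation: the `LazyVolume` class and the `fake_read` function are hypothetical names introduced here, and a short list stands in for the tensor read from disk.

```python
from typing import Callable, List, Optional


class LazyVolume:
    """Hypothetical stand-in for a lazily loaded image (not TorchIO code)."""

    def __init__(self, loader: Callable[[], List[float]]):
        self._loader = loader  # called at most once, on first access
        self._data: Optional[List[float]] = None
        self.load_count = 0

    @property
    def loaded(self) -> bool:
        return self._data is not None

    @property
    def data(self) -> List[float]:
        # Read only on first access, then reuse the cached result.
        if self._data is None:
            self._data = self._loader()
            self.load_count += 1
        return self._data


def fake_read() -> List[float]:
    # Stands in for an expensive file read.
    return [0.0, 1.0, 2.0]


volumes = [LazyVolume(fake_read) for _ in range(1000)]  # cheap: nothing is read
first = volumes[0].data   # triggers one read
again = volumes[0].data   # served from the cache
```

Only the object whose `data` was accessed pays the cost of loading; the other 999 remain unloaded.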
Images can be created from multiple sources:
image = tio.ScalarImage("t1.nii.gz") # from file (lazy)
image = tio.ScalarImage("s3://bucket/t1.nii.gz") # from cloud (via fsspec)
image = tio.ScalarImage("https://example.com/t1.nii.gz") # from URL
image = tio.ScalarImage(buf, suffix=".nii.gz") # from file-like object
image = tio.ScalarImage(tensor) # from PyTorch tensor
image = tio.ScalarImage(sitk_image) # from SimpleITK
image = tio.ScalarImage(nifti_image) # from NiBabel (lazy)
image = tio.ScalarImage(raw_bytes) # from bytes or BytesIO
image = tio.ScalarImage(zarr_store) # from zarr Store (lazy)
Metadata
Any extra keyword argument is stored as metadata and accessible by attribute or dict-style lookup:
image = tio.ScalarImage("t1.nii.gz", protocol="MPRAGE", te=3.5)
image.protocol # "MPRAGE"
image["te"] # 3.5
image.metadata # {"protocol": "MPRAGE", "te": 3.5}
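One way such dual access could work is sketched below. The `WithMetadata` class is a hypothetical stand-in written for this example; TorchIO's actual implementation may differ.

```python
from typing import Any, Dict


class WithMetadata:
    """Hypothetical container showing attribute and dict-style metadata access."""

    def __init__(self, path: str, **metadata: Any):
        self.path = path
        self.metadata: Dict[str, Any] = dict(metadata)

    def __getattr__(self, name: str) -> Any:
        # Called only when normal attribute lookup fails.
        if name == "metadata":
            # Avoid infinite recursion if metadata is not set yet.
            raise AttributeError(name)
        try:
            return self.metadata[name]
        except KeyError:
            raise AttributeError(name) from None

    def __getitem__(self, key: str) -> Any:
        return self.metadata[key]


image = WithMetadata("t1.nii.gz", protocol="MPRAGE", te=3.5)
protocol = image.protocol  # attribute-style lookup
echo_time = image["te"]    # dict-style lookup
```

Because `__getattr__` is consulted only as a fallback, regular attributes such as `path` are never shadowed by metadata keys in this sketch.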
ScalarImage vs LabelMap
ScalarImage and LabelMap are subclasses of Image. They carry no
extra data. The distinction is purely semantic:
- ScalarImage: continuous intensity data (MRI signal, CT Hounsfield units, PET SUV).
- LabelMap: discrete segmentation labels (0 = background, 1 = tumor, etc.).
Transforms use isinstance checks to decide behavior. For example,
spatial transforms use linear interpolation for ScalarImage and
nearest-neighbor for LabelMap.
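The dispatch can be pictured with a small sketch. The classes below are empty placeholders that mirror the hierarchy, and `interpolation_for` is a hypothetical helper, not a TorchIO function.

```python
class Image:
    """Placeholder base class for this sketch."""


class ScalarImage(Image):
    """Placeholder for continuous intensity data."""


class LabelMap(Image):
    """Placeholder for discrete segmentation labels."""


def interpolation_for(image: Image) -> str:
    # Labels are categorical: averaging label 0 and label 2 would yield 1,
    # a class that may not exist at that location, so labels use nearest-neighbor.
    if isinstance(image, LabelMap):
        return "nearest"
    return "linear"


modes = {
    "t1": interpolation_for(ScalarImage()),
    "seg": interpolation_for(LabelMap()),
}
```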
Tensor layout
TorchIO uses the convention (C, I, J, K):
| Axis | Meaning | Example |
|---|---|---|
| C | Channels | Gradient directions in DWI, components in a vector field |
| I | First spatial axis | Left-Right (in RAS) |
| J | Second spatial axis | Posterior-Anterior (in RAS) |
| K | Third spatial axis | Inferior-Superior (in RAS) |
Most single-channel images (T1, CT, etc.) have C = 1.
AffineMatrix
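The AffineMatrix class wraps a \(4 \times 4\) matrix \(A\) that maps voxel indices
to world coordinates, written in homogeneous form:
\[
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
= A \begin{bmatrix} i \\ j \\ k \\ 1 \end{bmatrix}
\]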
It provides named access to the components people usually care about:
- spacing: voxel size in mm, derived from the column norms of the rotation-zoom block.
- origin: world coordinates of the voxel at index \((0, 0, 0)\).
- direction: \(3 \times 3\) rotation matrix (spacing factored out).
- orientation: anatomical axis codes like ('R', 'A', 'S').
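How spacing, origin, and direction relate to the raw matrix can be sketched in plain Python. The `decompose` function and the example affine are assumptions made for illustration; they are not TorchIO code, and orientation codes are left out.

```python
import math

# Example affine: 0.5 x 0.5 x 2.0 mm voxels, axis-aligned,
# with voxel (0, 0, 0) located at (-90, -126, -72) mm.
affine = [
    [0.5, 0.0, 0.0, -90.0],
    [0.0, 0.5, 0.0, -126.0],
    [0.0, 0.0, 2.0, -72.0],
    [0.0, 0.0, 0.0, 1.0],
]


def decompose(matrix):
    # Spacing: Euclidean norm of each column of the upper-left 3x3 block.
    spacing = [
        math.sqrt(sum(matrix[row][col] ** 2 for row in range(3)))
        for col in range(3)
    ]
    # Origin: the translation column, i.e. where voxel (0, 0, 0) lands.
    origin = [matrix[row][3] for row in range(3)]
    # Direction: each column divided by its spacing, leaving unit-length axes.
    direction = [
        [matrix[row][col] / spacing[col] for col in range(3)]
        for row in range(3)
    ]
    return spacing, origin, direction


spacing, origin, direction = decompose(affine)
```

For this axis-aligned example the direction is the identity matrix; an oblique acquisition would put a rotation there while leaving the spacing unchanged.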
Affines compose via the @ operator.
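The sketch below shows what composing two affines with @ means numerically. The `Affine` class is a hypothetical wrapper written for this example, and it assumes the operator follows ordinary matrix-multiplication order, so that `a @ b` applies `b` first and `a` second; check the AffineMatrix reference for the order TorchIO actually uses.

```python
class Affine:
    """Hypothetical 4x4 affine wrapper (not TorchIO's AffineMatrix)."""

    def __init__(self, rows):
        self.rows = [[float(value) for value in row] for row in rows]

    def __matmul__(self, other):
        # Standard matrix product: (self @ other) applies `other` first.
        product = [
            [
                sum(self.rows[r][m] * other.rows[m][c] for m in range(4))
                for c in range(4)
            ]
            for r in range(4)
        ]
        return Affine(product)

    def apply(self, point):
        # Multiply the homogeneous vector (x, y, z, 1) and drop the last row.
        vector = (point[0], point[1], point[2], 1.0)
        return [
            sum(self.rows[r][c] * vector[c] for c in range(4))
            for r in range(3)
        ]


scale = Affine([[2, 0, 0, 0], [0, 2, 0, 0], [0, 0, 2, 0], [0, 0, 0, 1]])
shift = Affine([[1, 0, 0, 10], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])

scale_then_shift = (shift @ scale).apply((1, 1, 1))
shift_then_scale = (scale @ shift).apply((1, 1, 1))
```

The two results differ, which is why the operand order matters when chaining affines.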
Points
A Points object stores an \((N, 3)\) tensor of 3D coordinates in voxel
space, together with an affine for converting to world coordinates:
import torch
import torchio as tio
landmarks = tio.Points(
    torch.tensor([[128.0, 100.0, 90.0], [128.0, 130.0, 90.0]]),
    affine=image.affine,
)
world = landmarks.to_world() # (N, 3) in mm
Use cases include anatomical landmarks, fiducial markers, and seed points.
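To make the voxel-to-world conversion concrete, here is a plain-Python sketch of the arithmetic. The `to_world` function below is a standalone helper written for this example, not the TorchIO method, and the affine (1.0 x 0.5 x 1.0 mm spacing, origin at zero) is an assumed value.

```python
import math


def to_world(points, affine):
    # Apply the top three rows of the 4x4 affine to each (i, j, k) point.
    return [
        [row[0] * i + row[1] * j + row[2] * k + row[3] for row in affine[:3]]
        for i, j, k in points
    ]


# Assumed affine: anisotropic voxels, 0.5 mm along the second axis.
affine = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
voxels = [[128.0, 100.0, 90.0], [128.0, 130.0, 90.0]]

world = to_world(voxels, affine)
voxel_distance = math.dist(voxels[0], voxels[1])  # in voxel units
world_distance = math.dist(world[0], world[1])    # in millimeters
```

The landmarks are 30 voxels apart but only 15 mm apart, which is why physical measurements should be taken in world coordinates.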
BoundingBoxes
BoundingBoxes stores an \((N, 6)\) tensor of 3D bounding boxes.
The design is inspired by torchvision.tv_tensors.BoundingBoxes, extended to three
dimensions.
A format is parameterised by axes and representation:
- Axes: any permutation of IJK (voxel) or any valid anatomical triplet such as RAS or LPI.
- Representation: corners (two opposite corners) or center_size (center + extent along each axis).
Two formats are predefined:
| Predefined | Axes | Representation |
|---|---|---|
| IJKIJK | IJK | corners: \((i_1, j_1, k_1, i_2, j_2, k_2)\) |
| IJKWHD | IJK | center + size: \((i_c, j_c, k_c, s_i, s_j, s_k)\) |
Custom formats are created with
BoundingBoxFormat("RAS", "corners"), etc.
Convert between formats with to_format(). This handles
representation changes, axis permutations, and even voxel ↔ anatomical
conversions (using the stored affine). Optionally attach an integer
labels tensor to track per-box class IDs.
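The representation change between corners and center + size is simple arithmetic, sketched here in plain Python. Both functions are hypothetical helpers, not TorchIO API. The sketch treats coordinates as continuous, so the size is the plain difference between the two corners; whether TorchIO counts the second corner as inclusive is not stated here.

```python
def corners_to_center_size(box):
    # (i1, j1, k1, i2, j2, k2) -> (ic, jc, kc, si, sj, sk)
    i1, j1, k1, i2, j2, k2 = box
    return [
        (i1 + i2) / 2, (j1 + j2) / 2, (k1 + k2) / 2,
        i2 - i1, j2 - j1, k2 - k1,
    ]


def center_size_to_corners(box):
    # (ic, jc, kc, si, sj, sk) -> (i1, j1, k1, i2, j2, k2)
    ic, jc, kc, si, sj, sk = box
    return [
        ic - si / 2, jc - sj / 2, kc - sk / 2,
        ic + si / 2, jc + sj / 2, kc + sk / 2,
    ]


corners = [10, 20, 30, 50, 60, 70]
center_size = corners_to_center_size(corners)
round_trip = center_size_to_corners(center_size)
```

Axis permutations and voxel ↔ anatomical conversions additionally require reordering the columns and applying the affine, which this sketch leaves out.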
Subject (a.k.a. Study)
A Subject groups images, points, bounding boxes, and metadata
belonging to one imaging session:
subject = tio.Subject(
    t1=tio.ScalarImage("t1.nii.gz"),
    seg=tio.LabelMap("seg.nii.gz"),
    landmarks=tio.Points(torch.randn(5, 3)),
    tumors=tio.BoundingBoxes(
        torch.tensor([[10, 20, 30, 50, 60, 70]]),
        format=tio.BoundingBoxFormat.IJKIJK,
    ),
    age=45,
)
Study is an alias for Subject; both names refer to the same class.
In DICOM terminology, a "study" contains "series" (volumes), which
maps directly to this container. Neuroscience users tend to think
in "subjects", radiology users in "studies".
Contents are classified automatically by type:
- Image instances go to subject.images
- Points instances go to subject.points
- BoundingBoxes instances go to subject.bounding_boxes
- Everything else is metadata, accessible via subject.metadata
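The classification rule above can be sketched as a chain of isinstance checks. Everything in this block is hypothetical: the three placeholder classes and `MiniSubject` exist only to illustrate the routing and are not TorchIO code.

```python
class Image:
    """Placeholder type for this sketch."""


class Points:
    """Placeholder type for this sketch."""


class BoundingBoxes:
    """Placeholder type for this sketch."""


class MiniSubject:
    """Hypothetical container that routes keyword arguments by type."""

    def __init__(self, **entries):
        self.images = {}
        self.points = {}
        self.bounding_boxes = {}
        self.metadata = {}
        for name, value in entries.items():
            if isinstance(value, Image):
                self.images[name] = value
            elif isinstance(value, Points):
                self.points[name] = value
            elif isinstance(value, BoundingBoxes):
                self.bounding_boxes[name] = value
            else:
                # Anything that is not a known data type is metadata.
                self.metadata[name] = value


subject = MiniSubject(
    t1=Image(), landmarks=Points(), tumors=BoundingBoxes(), age=45
)
```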
All entries are accessible by name:
subject.t1 # the ScalarImage
subject.landmarks # the Points
subject.tumors # the BoundingBoxes
subject.age # 45
The Subject checks consistency across images. For example,
subject.spatial_shape raises an error if the images have different
spatial shapes.
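A consistency check of this kind could look like the following sketch. The `common_spatial_shape` function is a hypothetical helper, and the choice of ValueError is an assumption; the exception type TorchIO raises is not specified here.

```python
from typing import Dict, Tuple

Shape = Tuple[int, int, int]


def common_spatial_shape(shapes: Dict[str, Shape]) -> Shape:
    """Return the shared (I, J, K) shape, or raise if the images disagree."""
    if not shapes:
        raise ValueError("No images to inspect")
    unique = set(shapes.values())
    if len(unique) != 1:
        raise ValueError(f"Images have different spatial shapes: {shapes}")
    return unique.pop()


consistent = common_spatial_shape(
    {"t1": (256, 256, 176), "seg": (256, 256, 176)}
)

try:
    common_spatial_shape({"t1": (256, 256, 176), "seg": (128, 128, 88)})
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```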
How they fit together
flowchart LR
subgraph Subject
T1[ScalarImage<br/>t1.nii.gz]
SEG[LabelMap<br/>seg.nii.gz]
LM["Points<br/>landmarks"]
BB["BoundingBoxes<br/>tumors"]
META["age: 45"]
end
T1 -->|".affine"| A1[AffineMatrix<br/>spacing, origin,<br/>direction]
SEG -->|".affine"| A2[AffineMatrix]
LM -->|".affine"| A3[AffineMatrix]
T1 -->|".data"| D1["Tensor (C, I, J, K)"]
SEG -->|".data"| D2["Tensor (C, I, J, K)"]
LM -->|".data"| D3["Tensor (N, 3)"]
BB -->|".data"| D4["Tensor (N, 6)"]
A typical workflow:
- Create Image objects from file paths (lazy, no data read).
- Create Points or BoundingBoxes from annotations.
- Group them into a Subject.
- Apply transforms to the Subject. This triggers loading and produces a new Subject with transformed data.
- Access .data tensors for training.
Batching
When training a model, you need to stack subjects into batches.
SubjectsLoader returns SubjectsBatch instances where each
image entry is an ImagesBatch with a 5D tensor (B, C, I, J, K):
loader = tio.SubjectsLoader(dataset, batch_size=4)
for batch in loader:
    batch.t1.data.shape  # (4, C, I, J, K)
    batch.metadata["age"]  # [42, 35, 60, 28]
Each ImagesBatch stores per-sample affine matrices, so subjects
with different spatial properties batch correctly.
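The idea of keeping one affine per sample while collating metadata into lists can be sketched with plain dictionaries. The `collate` function and the sample layout are assumptions made for this illustration and do not reflect how SubjectsLoader is implemented; nested lists stand in for tensors.

```python
def scaled_affine(spacing):
    # Isotropic affine with the given spacing (illustrative values only).
    return [
        [spacing, 0.0, 0.0, 0.0],
        [0.0, spacing, 0.0, 0.0],
        [0.0, 0.0, spacing, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


def collate(samples):
    batch = {
        # A list of (C, I, J, K) volumes adds a leading batch axis.
        "data": [sample["data"] for sample in samples],
        # Affines are kept per sample instead of being merged into one.
        "affines": [sample["affine"] for sample in samples],
        "metadata": {},
    }
    for key in samples[0]["metadata"]:
        batch["metadata"][key] = [sample["metadata"][key] for sample in samples]
    return batch


samples = [
    {"data": [[[[0.0]]]], "affine": scaled_affine(1.0), "metadata": {"age": 42}},
    {"data": [[[[1.0]]]], "affine": scaled_affine(0.5), "metadata": {"age": 35}},
]
batch = collate(samples)
```

Each sample keeps its own voxel-to-world mapping, so the two volumes can have different spacings and still share a batch.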
Transforms work directly on batches. By default, transforms that support it sample independent parameters per batch element (see Per-instance augmentation):