Working with large volumes
Medical images can be very large. A single high-resolution MRI might occupy several gigabytes. This tutorial shows how to work with such volumes efficiently using TorchIO's lazy loading and backend system.
The problem
Loading a 724 x 868 x 724 float32 volume allocates ~1.8 GB of RAM (and can take many seconds to read and decompress). If you only need a small region, that is wasteful.
Lazy slicing
TorchIO images are lazy by default. Slicing a lazy image reads only the requested region through the backend, without loading the full volume into memory:
import torchio as tio
image = tio.ScalarImage("huge_volume.nii.gz")
# This is fast: reads only a 10x10x10 patch
patch = image[:, 100:110, 100:110, 100:110]
print(patch.data.mean())
# The original image was never fully loaded
print(image.is_loaded) # False
For uncompressed .nii (memory-mapped) and .nii.zarr (chunked), this can be
orders of magnitude faster than a full load. For .nii.gz the gain is more
modest, because gzip must be decompressed from the start (see below).
File format comparison
Not all formats are equally efficient for partial reads:
| Format | Extension | Partial I/O | Notes |
|---|---|---|---|
| Uncompressed NIfTI | .nii |
Memory-mapped | Best for local random access |
| Compressed NIfTI | .nii.gz |
Buffered by nibabel | Modest speedup for small regions; gzip has no true random access |
| NIfTI-Zarr | .nii.zarr |
Chunked reads | Best for very large volumes and remote storage |
NIfTI-Zarr
NIfTI-Zarr stores data in independently compressed chunks. Reading one chunk does not require decompressing any other. This is ideal for:
- Volumes too large to fit in memory
- Remote storage (S3, GCS) where you want to fetch only what you need
- Collaborative workflows where different users need different regions
Converting to NIfTI-Zarr
Reading from NIfTI-Zarr
image = tio.ScalarImage("volume.nii.zarr")
print(image.shape) # reads only metadata
# Read a small region: only the overlapping chunks are decompressed
patch = image[:, 50:60, 50:60, 50:60]
The backend system
TorchIO uses a pluggable backend system to support different storage
formats. The choice is not hard-coded in Image: a resolver consults a
registry of backends, so new formats can be added without modifying Image
(see Lazy loading and backends).
flowchart TD
I["Image(path)"] --> check{Resolver}
check -->|".nii"| NB[NibabelBackend<br/>memory-mapped]
check -->|".nii.gz"| NB
check -->|".nii.zarr"| ZB[ZarrBackend<br/>chunked reads]
check -->|other| SITK[SimpleITK reader<br/>full load]
NB -->|"image.dataobj[slices]"| partial[Partial read]
ZB -->|"image.dataobj[slices]"| partial
NB -->|"image.data"| full[Full tensor]
ZB -->|"image.data"| full
SITK -->|"image.data"| full
image.data: materializes the full tensor (triggers load if needed)image.dataobj: returns the lazy backend for advanced slicing;image.dataobj[slices]returns a 4D(C, I, J, K)torch.Tensorimage[slices]: uses the backend automatically, returns a new image
Tips
- Use uncompressed
.niifor local training: memory-mapping gives true random access with zero decompression cost. - Use
.nii.zarrfor large shared volumes: chunked storage means each worker reads only what it needs. - Avoid
.nii.gzfor random access: gzip is a stream format. Nibabel handles it well for small slices, but for repeated random access, convert to.niior.nii.zarr. - Slice before loading:
image[:, 100:200].datais much cheaper thanimage.data[:, 100:200].