Lazy loading and backends
TorchIO images are lazy by default: creating an Image from a file
path, a NiBabel image, or a zarr Store reads nothing from disk. This
article explains when data actually enters memory and how the backend
system works.
The backends are a lazy I/O layer, not a lazy computation framework.
They speed up metadata reads (shape, affine, dtype) and region slicing, but
they do not defer arithmetic or transforms. Once you access .data, apply a
transform, or build a batch, the full tensor is materialized in memory (see
When tensors are materialized).
When is data loaded?
stateDiagram-v2
[*] --> Lazy: Image(path) / Image(nifti) / Image(store)
Lazy --> BackendReady: .shape / .affine / .dataobj / [slicing]
Lazy --> Loaded: .data / .load()
BackendReady --> Loaded: .data / .load()
Loaded --> Loaded: .data (cached)
note right of Lazy: No I/O yet
note right of BackendReady: Header read,<br/>data not in memory
note right of Loaded: Full tensor<br/>in memory
| Access | What happens |
|---|---|
| Image(path) | Nothing. Stores the path. |
| Image(nifti_image) | Nothing. Stores a reference to the nibabel object. |
| Image(zarr_store) | Nothing. Stores the store reference. |
| Image(tensor) | Immediate. The tensor is already in memory. |
| .shape | Creates a backend and reads the header. No data loaded. |
| .spacing, .affine | Same: reads header via backend. |
| image[slices] | Reads only the sliced region through the backend. Parent image stays unloaded. |
| .data | Loads the full tensor into memory. Cached for subsequent access. |
| .dataobj | Returns the raw backend for advanced use. |
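The access rules in the table can be mimicked with a toy class. This is a sketch, not TorchIO code: LazyImage, header_reads, and full_loads are hypothetical names, and the hard-coded header stands in for a real file. It only illustrates which accesses trigger which kind of read, and that .data is cached after the first load.

```python
class LazyImage:
    """Toy model of the Lazy -> BackendReady -> Loaded states (not TorchIO code)."""

    def __init__(self, path):
        self.path = path          # constructing the image only stores the path
        self._header = None
        self._data = None
        self.header_reads = 0     # counters make the simulated I/O visible
        self.full_loads = 0

    def _read_header(self):
        if self._header is None:
            self.header_reads += 1
            self._header = {"shape": (1, 4, 4, 4)}  # stand-in for a real header
        return self._header

    @property
    def shape(self):
        return self._read_header()["shape"]

    @property
    def data(self):
        if self._data is None:
            self.full_loads += 1
            c, i, j, k = self.shape
            self._data = [0.0] * (c * i * j * k)  # stand-in for the full tensor
        return self._data

    @property
    def state(self):
        if self._data is not None:
            return "Loaded"
        return "BackendReady" if self._header is not None else "Lazy"


image = LazyImage("volume.nii")
states = [image.state]      # nothing read yet
image.shape
states.append(image.state)  # header read, data not in memory
image.data
image.data                  # second access hits the cache
states.append(image.state)
print(states, image.header_reads, image.full_loads)
```

The counters show that repeated accesses do not repeat the work: the header is read once and the full load happens once.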
Backends
A backend is a lazy I/O adapter: an object that gives Image uniform
access to one image's data, wherever it lives, without loading the whole
volume. Intuitively, a backend is "a thing that can answer five questions",
which is exactly its contract:
- shape -> always (C, I, J, K)
- affine -> the 4x4 voxel-to-world matrix
- dtype -> the on-disk (or in-memory) data type
- backend[region] -> read just that region, as a 4D tensor
- to_tensor() -> materialize the whole volume
The first three are cheap header reads; the last two are where pixel data is
actually read. Each backend normalizes its storage-specific layout (e.g. a
NIfTI's (I, J, K) or (I, J, K, C)) into TorchIO's (C, I, J, K).
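One way to picture the five-question contract is as a structural type. The sketch below is an illustration under assumptions, not TorchIO's actual definition: BackendLike and NestedListBackend are hypothetical names, and nested Python lists stand in for tensors so the example runs with the standard library alone.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class BackendLike(Protocol):
    """Hypothetical spelling of the five-member backend contract."""

    @property
    def shape(self) -> tuple[int, int, int, int]: ...
    @property
    def affine(self) -> Any: ...
    @property
    def dtype(self) -> Any: ...
    def __getitem__(self, index: Any) -> Any: ...
    def to_tensor(self) -> Any: ...


class NestedListBackend:
    """Toy backend over nested lists indexed as data[c][i][j][k]."""

    def __init__(self, data):
        self._data = data

    @property
    def shape(self):
        d = self._data
        return (len(d), len(d[0]), len(d[0][0]), len(d[0][0][0]))

    @property
    def affine(self):
        # Identity matrix as a placeholder for a real voxel-to-world affine.
        return [[float(r == c) for c in range(4)] for r in range(4)]

    @property
    def dtype(self):
        return int

    def __getitem__(self, index):
        # Only a 4-tuple of slices is supported, so the result stays 4D.
        sc, si, sj, sk = index
        return [
            [[row[sk] for row in plane[sj]] for plane in channel[si]]
            for channel in self._data[sc]
        ]

    def to_tensor(self):
        return self._data


backend = NestedListBackend([[[[0, 1], [2, 3]], [[4, 5], [6, 7]]]])
region = backend[slice(None), slice(0, 1), slice(None), slice(1, 2)]
print(isinstance(backend, BackendLike), backend.shape, region)
```

The isinstance check only verifies that the five members exist. It says nothing about whether the reads behind them are actually lazy.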
Because "backend" is an overloaded word, it helps to say what this is not:
- Not a compute backend: it has nothing to do with torch devices or kernels.
- Not the storage format itself: it is the adapter to a format. Formats are mapped to backends by a resolver (below).
- Not a lazy computation framework: it defers reads, not arithmetic. Transforms, batching, and queues still materialize tensors (see When tensors are materialized).
TorchIO does not hard-code the choice in Image: it passes a description of the
source (a BackendRequest) to a small resolver, which consults a registry
of backends in order and returns the first match:
| Backend | Format | How it works |
|---|---|---|
| NibabelBackend | .nii, .nii.gz, nib.Nifti1Image | Wraps nibabel's ArrayProxy. Uncompressed files are memory-mapped; compressed files are read through nibabel's proxy. Also used for NiBabel images passed directly to the constructor. |
| ZarrBackend | .nii.zarr | Wraps niizarr.zarr2nii(). Data is stored in independently compressed chunks. Only the chunks overlapping your slice are read. |
| NibabelBackend (via store) | zarr.Store | For zarr stores passed to the constructor, zarr2nii(store) is called on first access, producing a dask-backed nibabel image. Instantiation is O(1). |
| TensorBackend | In-memory | Used for images created from tensors or NumPy arrays. Wraps a PyTorch tensor directly (no numpy round-trip), preserving its device and dtype. |
For other formats (NRRD, MHA, etc.), there is no lazy backend. Shape
and dtype can still be read from the header via SimpleITK without loading
data, but slicing triggers a full load. You can teach TorchIO about new
formats without modifying Image; see
Extending the backend system.
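The resolver idea (walk a registry in order, return the first match) can be sketched in a few lines. Everything here is hypothetical and only models the behaviour described above: Request stands in for BackendRequest, the factories return strings instead of real backends, and the two lists are an assumed way of giving registered entries priority over built-ins.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class Request:
    """Hypothetical stand-in for BackendRequest; only the path is modeled."""
    path: Optional[Path] = None


def _has_ending(request, endings):
    return request.path is not None and request.path.name.endswith(endings)


# Each entry is (matcher, factory); list order defines priority.
BUILTIN = [
    (lambda r: _has_ending(r, ".nii.zarr"), lambda r: f"ZarrBackend({r.path.name})"),
    (lambda r: _has_ending(r, (".nii", ".nii.gz")), lambda r: f"NibabelBackend({r.path.name})"),
]
CUSTOM = []  # user registrations, consulted before the built-ins


def register(matcher, factory):
    CUSTOM.append((matcher, factory))


def resolve(request):
    for matcher, factory in CUSTOM + BUILTIN:
        if matcher(request):
            return factory(request)  # first match wins
    return None  # no lazy backend; a caller would fall back to an eager read


compressed = resolve(Request(Path("scan.nii.gz")))
chunked = resolve(Request(Path("scan.nii.zarr")))
unsupported = resolve(Request(Path("scan.nrrd")))

register(lambda r: _has_ending(r, ".npy"), lambda r: f"NpyBackend({r.path.name})")
custom = resolve(Request(Path("volume.npy")))
print(compressed, chunked, unsupported, custom)
```

Because matchers are plain callables, supporting a new source means adding an entry rather than editing the code that consumes the backend.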
Practical impact
Slicing a lazy image instead of loading it whole avoids allocating and copying the full tensor. For formats that support random access, it also avoids reading most of the file. Consider reading a small patch from a large volume:
import torchio as tio

# Full load: reads and allocates the whole volume, then slices
mean_full = tio.ScalarImage("huge_volume.nii").data[:, 100:110, 100:110, 100:110].mean()
# Lazy slice: reads only the requested region
mean_lazy = tio.ScalarImage("huge_volume.nii")[:, 100:110, 100:110, 100:110].data.mean()
How much you gain depends strongly on the format:
- .nii (uncompressed) is memory-mapped, so the lazy path reads essentially only the requested bytes. This is true random access and is by far the fastest, often two orders of magnitude quicker for a small patch.
- .nii.zarr (chunked) reads only the chunks overlapping the patch, so it also scales well, especially for remote storage.
- .nii.gz (compressed) is the subtle case: gzip is a stream format, not a random-access one. To reach the requested region, nibabel must decompress the stream from the beginning, so the lazy path still does most of the decompression work. The speedup over a full load is real but modest: it comes mainly from skipping the full float32 allocation and copy, not from avoiding decompression.
At a glance:
| Format | Partial I/O | Notes |
|---|---|---|
| .nii | memory-mapped | true random access, by far the fastest |
| .nii.zarr | chunked | reads only overlapping chunks; great for remote storage |
| .nii.gz | streamed | gzip is not random access; modest speedup |
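The difference between a plain file and a gzip stream can be seen with the Python standard library alone. The sketch below uses made-up bytes rather than a real NIfTI file and does not involve TorchIO or nibabel; it shows that both paths return the same region, while the comments note where the work differs.

```python
import gzip
import os
import tempfile

payload = bytes(range(256)) * 4096  # 1 MiB of made-up "voxel" bytes
offset, length = 1_000_000, 16      # a small region near the end of the file

with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "volume.raw")
    gz_path = os.path.join(tmp, "volume.raw.gz")
    with open(raw_path, "wb") as f:
        f.write(payload)
    with gzip.open(gz_path, "wb") as f:
        f.write(payload)

    # Plain file: seek() jumps straight to the offset; nothing before it is read.
    with open(raw_path, "rb") as f:
        f.seek(offset)
        raw_region = f.read(length)

    # gzip: seek() is emulated. The stream is decompressed from the start and
    # everything before the offset is discarded, so the cost grows with offset.
    with gzip.open(gz_path, "rb") as f:
        f.seek(offset)
        gz_region = f.read(length)

print(raw_region == gz_region == payload[offset:offset + length])
```

Timing the two reads on a file of realistic size is a quick way to check how much a given dataset would gain from being stored uncompressed.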
When tensors are materialized
Laziness applies to I/O, not to computation. The full tensor is read into memory the first time you do any of the following:
- access .data or call .load() (the result is then cached);
- apply a transform: transforms operate on materialized tensors;
- build a batch or collate subjects in a DataLoader;
- iterate a Queue or sampler, which materializes each sampled patch (a sampler may still use lazy slicing to read only that patch from disk);
- save the image, or call .numpy().
In other words, lazy reads and slicing speed up getting at a region of the data; they do not turn transforms or batching into deferred operations.
The dataobj property
For advanced use, image.dataobj gives direct access to the backend:
backend = image.dataobj # NibabelBackend, ZarrBackend, or TensorBackend
backend.shape # (C, I, J, K)
backend.affine # 4x4 float64 tensor
patch = backend[:, 50:60, 50:60, 50:60] # torch.Tensor, shape (C, 10, 10, 10)
Backend slicing follows the same rules as image[...]: the result is always a
4D (C, I, J, K) torch.Tensor, and integer indices keep their axis (so
backend[0] has shape (1, I, J, K) rather than dropping the channel
dimension). For TensorBackend, the slice preserves the tensor's device and
dtype.
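The "integer indices keep their axis" rule amounts to turning each integer into a length-one slice before reading. The helpers below are a hypothetical sketch of that idea, not TorchIO's normalize_index: keep_axes and sliced_shape are made-up names, Ellipsis is not handled, and an out-of-range integer yields an empty axis instead of raising.

```python
def keep_axes(index, ndim=4):
    """Return a tuple of ndim slices in which integers become length-1 slices."""
    if not isinstance(index, tuple):
        index = (index,)
    if len(index) > ndim:
        raise IndexError(f"too many indices: {len(index)} > {ndim}")
    slices = []
    for item in index:
        if isinstance(item, int):
            # i -> i:i+1; for i == -1 the stop must be None, not 0.
            item = slice(item, item + 1 or None)
        elif not isinstance(item, slice):
            raise TypeError(f"unsupported index type: {type(item).__name__}")
        slices.append(item)
    slices.extend([slice(None)] * (ndim - len(slices)))  # pad missing axes
    return tuple(slices)


def sliced_shape(shape, index):
    """Shape that results from applying index to an array of the given shape."""
    slices = keep_axes(index, len(shape))
    return tuple(len(range(*s.indices(n))) for s, n in zip(slices, shape))


volume_shape = (3, 64, 64, 64)
first_channel = sliced_shape(volume_shape, 0)
last_channel = sliced_shape(volume_shape, -1)
patch = sliced_shape(volume_shape, (slice(None), slice(50, 60), slice(50, 60), slice(50, 60)))
print(first_channel, last_channel, patch)
```

Keeping the result 4D means downstream code never has to check whether the channel axis is still there.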
This is useful when you need fine-grained control over what gets read,
or when you want to avoid even the overhead of creating a new Image
object.
Extending the backend system
Advanced, rarely needed
Most users never touch this. The built-in backends already cover NIfTI, NIfTI-Zarr, zarr stores, NiBabel images, and in-memory tensors. Reach for a custom backend only when you need lazy access to a format TorchIO does not support out of the box.
Backend selection is driven by a registry, so you can support a new format
without editing Image. Register a matcher (which decides whether a
BackendRequest applies) and a factory (which builds the backend).
As a concrete example, here is a lazy backend for plain NumPy .npy volumes.
np.load(..., mmap_mode="r") memory-maps the file, so reading a small region
only touches the bytes you ask for, exactly like the built-in .nii path:
import numpy as np
import torch
import torchio as tio
from torchio.data import register_backend
from torchio.data.backends import BackendRequest, normalize_index


class NpyBackend:
    """Lazy, memory-mapped backend for single-channel ``.npy`` volumes."""

    def __init__(self, path):
        self._memmap = np.load(path, mmap_mode="r")  # shape (I, J, K), unread

    @property
    def shape(self):
        i, j, k = self._memmap.shape
        return (1, i, j, k)  # always (C, I, J, K)

    @property
    def affine(self):
        return torch.eye(4, dtype=torch.float64)  # unknown here: identity

    @property
    def dtype(self):
        return self._memmap.dtype

    def __getitem__(self, index):
        # normalize_index keeps the result 4D and never drops axes.
        sc, si, sj, sk = normalize_index(index)
        region = np.array(self._memmap[si, sj, sk])  # reads only this block
        return torch.from_numpy(region)[None][sc]  # add channel axis, then select

    def to_tensor(self):
        return torch.from_numpy(np.array(self._memmap))[None]


register_backend(
    "npy",
    lambda request: request.path is not None and request.path.suffix == ".npy",
    lambda request: NpyBackend(request.path),
)

# .npy files are now first-class and lazy:
image = tio.ScalarImage("volume.npy")
print(image.shape)  # read from the header, no full load
patch = image[:, 10:20, 10:20, 10:20]  # reads just that block
A matcher can key off anything in the BackendRequest (the path, a zarr store,
the reader, and so on), and registered backends are consulted before the
built-ins, so you can also override a built-in for a given source.
Alternatively, if you already pass a custom reader to a specific image, make
it a lazy reader by implementing create_backend (see
Use a custom reader). Simple readers that just
return (tensor, affine) keep working unchanged: they load eagerly.
File format recommendations
| Use case | Recommended format |
|---|---|
| Local training with random access | Uncompressed .nii (memory-mapped) |
| Storage / archival | .nii.gz (compressed) |
| Very large volumes, remote storage | .nii.zarr (chunked) |
| Large-scale datasets (100k+ volumes) | zarr.Store objects (O(1) instantiation) |
| Interop with non-NIfTI tools | .nrrd, .mha via SimpleITK |
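If a dataset is archived as .nii.gz but training needs random access, one option is to write uncompressed copies once, up front. A .nii.gz file is a gzip-compressed .nii file, so the standard library is enough. The helper below is a sketch with a hypothetical name (decompress_nifti), and the demo uses placeholder bytes rather than a real image.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def decompress_nifti(src, dst=None):
    """Write an uncompressed copy of a .gz file and return the new path."""
    src = Path(src)
    if src.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {src.name}")
    # with_suffix("") drops only the final ".gz": scan.nii.gz -> scan.nii
    dst = Path(dst) if dst is not None else src.with_suffix("")
    with gzip.open(src, "rb") as compressed, open(dst, "wb") as plain:
        shutil.copyfileobj(compressed, plain)  # streams in chunks, low memory
    return dst


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "scan.nii.gz"
    payload = b"placeholder bytes, not a real NIfTI file"
    with gzip.open(source, "wb") as f:
        f.write(payload)
    target = decompress_nifti(source)
    target_name = target.name
    round_trip_ok = target.read_bytes() == payload

print(target_name, round_trip_ok)
```

The uncompressed copies trade disk space for access speed, so the compressed originals can stay in the archive.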