Change log
This change log covers all releases and lists the new features and breaking changes in each.
Unreleased
New features:
Add reseed_each_epoch option to MapDataset.repeat, which allows replaying the first epoch exactly when set to False (True by default).
Introduces grain.experimental.RebatchIterDataset for efficient rebatching.
Migrates the data loader to use the dataset API under the hood.
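The reseed_each_epoch behaviour can be illustrated with a minimal plain-Python sketch. In Grain, this corresponds to passing reseed_each_epoch=False to MapDataset.repeat. The function name epoch_orders is hypothetical, and deriving each epoch's shuffle seed as seed + epoch is an assumption made for illustration. Neither is Grain's actual implementation.

```python
import random


def epoch_orders(num_elements, num_epochs, seed, reseed_each_epoch=True):
    """Return the element order of each epoch of a shuffled, repeated dataset.

    Hypothetical illustration: the per-epoch seed (seed + epoch) is an
    assumption, not Grain's actual seeding scheme.
    """
    orders = []
    for epoch in range(num_epochs):
        # With reseeding, each epoch shuffles with its own seed.
        # Without it, every epoch reuses the epoch-0 seed, so the
        # first epoch is replayed exactly.
        epoch_seed = seed + epoch if reseed_each_epoch else seed
        order = list(range(num_elements))
        random.Random(epoch_seed).shuffle(order)
        orders.append(order)
    return orders


replayed = epoch_orders(8, num_epochs=3, seed=42, reseed_each_epoch=False)
print(replayed[0] == replayed[1] == replayed[2])  # True: every epoch repeats epoch 0
```

The sketch only shows the ordering contract. How Grain actually derives per-epoch randomness is not described here.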
Breaking changes:
SliceMapDataset now uses the full index relative to the parent dataset instead of index % len(self).
Deprecations:
Graduate grain.experimental.apply_transformations to grain.{MapDataset|IterDataset}.apply. The experimental API will soon be deprecated.
Bug fixes:
Grain 0.2.12 (August 21, 2025)
New features:
Adds a Windows build.
Allows passing read_kwargs to ParquetIterDataset for configuring Parquet file reading.
ThreadPrefetchDatasetIterator now supports non-Grain iterators that support checkpointing.
Introduces an API for device prefetch, grain.experimental.device_put(), for easy CPU and device prefetching.
Introduces an API for autotuning: given user-provided RAM restrictions and a specific IterDataset, it finds the number of processes for mp_prefetch and the buffer size for PrefetchDatasetIterator.
Allows passing reader_options to ArrayRecordDataSource for configuring ArrayRecord file reading.
Introduces grain.experimental.batch_and_pad for padding a partial batch, so that the batch remainder is not dropped.
Grain interleave optimization: allows creating more threads to keep starting iterators and prefetching elements in parallel.
Allows alternative slicing of the data for MultiprocessPrefetchIterDataset. The new slicing lets each worker process read unique file shards, which improves performance.
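Reading unique file shards per worker can improve performance, and a small plain-Python sketch shows why. It compares two ways of splitting records across worker processes. Round-robin over individual records makes every worker open every shard. Round-robin over whole shards gives each worker a disjoint set of files. All function names are hypothetical, and round-robin shard assignment is an assumption. This is not how Grain implements MultiprocessPrefetchIterDataset.

```python
def elements_by_shard(shard_sizes):
    """Flatten shard sizes into (shard_id, offset) records, shard by shard."""
    return [(shard, offset)
            for shard, size in enumerate(shard_sizes)
            for offset in range(size)]


def element_slicing(shard_sizes, worker_index, num_workers):
    """Hypothetical record-level split: worker i takes every num_workers-th record."""
    return elements_by_shard(shard_sizes)[worker_index::num_workers]


def shard_slicing(shard_sizes, worker_index, num_workers):
    """Hypothetical shard-level split: worker i takes every num_workers-th shard."""
    return [(shard, offset)
            for shard, size in enumerate(shard_sizes)
            if shard % num_workers == worker_index
            for offset in range(size)]


def shards_opened(records):
    """Distinct shard files a worker would have to open to read its records."""
    return {shard for shard, _ in records}


sizes = [4, 4, 4, 4]  # four illustrative shards with four records each
for w in range(2):
    print(w,
          sorted(shards_opened(element_slicing(sizes, w, 2))),
          sorted(shards_opened(shard_slicing(sizes, w, 2))))
# 0 [0, 1, 2, 3] [0, 2]
# 1 [0, 1, 2, 3] [1, 3]
```

Both splits cover every record exactly once. The shard-level split halves the number of files each worker touches in this example. One trade-off is that workers can receive unequal amounts of data when shard sizes differ.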
Breaking changes:
Upgrades array_record and protobuf.
Deprecations:
Bug fixes:
Grain 0.2.11 (July 2, 2025)
New features:
Automatic publishing of releases to PyPI via GitHub Actions.
Nightly builds.
Introduced the changelog.