### async_map

Process all items concurrently with automatic progress tracking.
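Since the implementation isn't shown here, a minimal sketch of what such a helper could look like, built on `asyncio.gather` (the `double` function is purely illustrative):

```python
import asyncio

async def async_map(f, data):
    # Apply f to every item concurrently; results come back in input order.
    # A real helper would also hook progress reporting in as tasks finish.
    return await asyncio.gather(*(f(x) for x in data))

async def double(x):
    await asyncio.sleep(0)  # stand-in for real async work (I/O, API call, ...)
    return x * 2

results = asyncio.run(async_map(double, [1, 2, 3]))
print(results)  # [2, 4, 6]
```

Note that `asyncio.gather` preserves input order even though the coroutines run concurrently, which is what makes a drop-in `map` replacement possible.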
### async_map_fallible

Same as async_map, but silently skips failures instead of crashing. Failed samples are excluded from the results.
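A sketch of the fallible variant, assuming the same shape as above; `return_exceptions=True` lets `asyncio.gather` collect failures instead of raising, so they can be filtered out (`reciprocal` is an illustrative function that fails on zero):

```python
import asyncio

async def async_map_fallible(f, data):
    # Like async_map, but failed calls are dropped rather than propagated.
    outcomes = await asyncio.gather(*(f(x) for x in data),
                                    return_exceptions=True)
    return [r for r in outcomes if not isinstance(r, Exception)]

async def reciprocal(x):
    await asyncio.sleep(0)
    return 1 / x  # raises ZeroDivisionError for x == 0

results = asyncio.run(async_map_fallible(reciprocal, [1, 0, 4]))
print(results)  # [1.0, 0.25] -- the failing sample is silently dropped
```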
### batch_process_fallible

Process items in controlled batches with failure handling. Returns `list[tuple[int, T]]`, where the int is the original index in the input data; use it to track which samples succeeded.
When to use:
- Large datasets where running all at once would exceed memory
- You want to report progress in batches
- You need to preserve sample indices
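Putting the points above together, a sketch of how such a helper might work: batches run sequentially (bounding memory), items within a batch run concurrently, and survivors keep their original index (`reciprocal` is illustrative, as before):

```python
import asyncio

async def batch_process_fallible(f, data, batch_size):
    # Process data in sequential batches; within a batch, items run
    # concurrently. Failures are dropped; each surviving result is paired
    # with its original index in the input list.
    results = []
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        outcomes = await asyncio.gather(*(f(x) for x in batch),
                                        return_exceptions=True)
        for offset, r in enumerate(outcomes):
            if not isinstance(r, Exception):
                results.append((start + offset, r))
        # per-batch progress could be reported here
    return results

async def reciprocal(x):
    await asyncio.sleep(0)
    return 1 / x

results = asyncio.run(batch_process_fallible(reciprocal, [1, 0, 2, 4],
                                             batch_size=2))
print(results)  # [(0, 1.0), (2, 0.5), (3, 0.25)] -- index 1 failed
```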
### async_map_batch

Process an iterator in batches with automatic retry on failure.

Parameters:

- `f` - Async function to apply
- `data` - Iterator (not list) of items
- `batch_size` - Number of items per batch
- `max_failure_fraction` - Max fraction of failures before raising an exception (default 0.5)
Behavior:

- Processes `batch_size` items concurrently
- If a sample fails, pulls the next item from the iterator and retries
- Fails if more than `max_failure_fraction * batch_size` samples fail
- Results are not ordered
When to use:

- Working with iterators
- You want automatic retry with fresh samples on failure (as in training, where batch size must remain constant)
- You don't need to preserve ordering
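The retry-with-fresh-samples behavior above can be sketched as an async generator; this is an illustrative implementation under the stated semantics, not the library's actual code (`reciprocal` is a made-up failing function):

```python
import asyncio

async def async_map_batch(f, data, batch_size, max_failure_fraction=0.5):
    # data is an iterator: each batch pulls batch_size items, runs them
    # concurrently, and replaces any failures with fresh items so the
    # effective batch size stays constant (as in training). Yields one
    # list of results per batch; ordering within a batch is not guaranteed.
    it = iter(data)
    max_failures = max_failure_fraction * batch_size
    while True:
        batch = [x for _, x in zip(range(batch_size), it)]
        if not batch:
            return
        successes, failures = [], 0
        while batch:
            outcomes = await asyncio.gather(*(f(x) for x in batch),
                                            return_exceptions=True)
            failed = sum(isinstance(r, Exception) for r in outcomes)
            failures += failed
            if failures > max_failures:
                raise RuntimeError("too many failures in batch")
            successes.extend(r for r in outcomes
                             if not isinstance(r, Exception))
            # retry by pulling fresh samples from the iterator
            batch = [x for _, x in zip(range(failed), it)]
        yield successes

async def reciprocal(x):
    await asyncio.sleep(0)
    return 1 / x  # fails on x == 0, triggering a fresh-sample retry

async def collect():
    out = []
    async for batch in async_map_batch(reciprocal, iter([1, 0, 2, 4, 5]),
                                       batch_size=2):
        out.extend(batch)
    return out

results = asyncio.run(collect())
print(sorted(results))  # [0.2, 0.25, 0.5, 1.0] -- the 0 sample was replaced
```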