Overview
In custom recipes, you can load datasets using the AdaptiveDataset
class, which provides a convenient way to access dataset files and work with them in your training or inference tasks.
Dataset loading is done through the config feature.
Step 1: Define Dataset in Your Config
First, define your dataset in your recipe’s input configuration:
from typing import Annotated
from adaptive_harmony import InputConfig, AdaptiveDataset
class MyConfig(InputConfig):
dataset: AdaptiveDataset
Step 2: Access the Dataset File
In your recipe function, you can access the dataset file directly:
def my_recipe(config: MyConfig):
# Get the dataset file path
dataset_file = config.dataset.file
# Use the dataset file
with open(dataset_file, "r") as file:
# Read and process your dataset
data = file.read()
# Your processing logic here
Loading from Hugging Face
You can also load datasets directly from Hugging Face:
from adaptive_harmony.core.dataset import load_from_hf, convert_sample_dict
def load_hf_dataset():
# Helper function to convert HF dataset to Adaptive StringThread
convert_sample_fn = convert_sample_dict(
turns_key="messages",
role_key="role",
content_key="content"
)
# Load the dataset
dataset = load_from_hf(
"HuggingFaceH4/ultrachat_200k",
"train_sft",
convert_sample_fn
)
return dataset