Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.adaptive-ml.com/llms.txt

Use this file to discover all available pages before exploring further.

You can import datasets previously uploaded to Adaptive directly within your recipes. Datasets stored on Adaptive can be loaded by specifying a parameter of type Dataset from adaptive_harmony.parameters in your recipe’s InputConfig class. You can then load it in your recipe as a list of StringThread objects by calling await dataset.load(ctx).

Load a dataset

First, define your dataset in your recipe’s input config:
from adaptive_harmony.runtime import InputConfig
from adaptive_harmony.parameters import Dataset

class MyConfig(InputConfig):
    dataset: Dataset
To load a dataset from Adaptive, you can use the load method on the dataset:
async def my_recipe(config: MyConfig, ctx: RecipeContext):
    dataset = await config.dataset.load(ctx)
This utility can also load local files structured in the Adaptive-supported format, which you can leverage if you are testing a recipe locally. Load a local dataset with Dataset(dataset_key="local-file", local_file_path="your_file.jsonl").

StringThread object

The atomic element of any dataset in the adaptive_harmony codebase is a StringThread, which is a Rust backed object exposed in Python. A StringThread simply contains all the messages in a thread of conversation, along with any metadata associated with that thread (such as metric feedback, ground truth labels or any other metadata). StringThread exposes a few helpful methods:
from adaptive_harmony import StringThread

thread = StringThread(
    turns=[
        ("user", "Hello, who are you?"),
        ("assistant", "I am a large language model. How can I help you today?"),
    ]
)

thread_with_metadata = StringThread.with_metadata(
    turns=[
        ("user", "Hello, who are you?"),
        ("assistant", "I am a large language model. How can I help you today?"),
    ],
    metadata={"label": "polite"}
)

# Return all turns of thread as NamedTuples
all_turns = thread.get_turns()
all_turns[0][0] == all_turns[0].role
# Returns a list of tuples, each containing a role and a message
# Does not include the completion, if there is one (last turn with `assistant` role)
messages = thread.messages() 
assert messages[-1][0] != "assistant"
# Returns the string content of the last message in the thread if its role is `assistant`
# Otherwise, returns None
completion = thread.completion()


new_thread = StringThread([])
new_thread = new_thread.system("You are a helpful bot.")
thread = thread.user("My name is John.") # Returns a new thread with a new user message added to the end
thread = thread.assistant("Nice to meet you, John!") # Returns a new thread with a new assistant message added to the end

Loading from Hugging Face

You can also load datasets directly from Hugging Face in your recipe. adaptive_harmony exposes helper methods to convert arbitrary datasets into a list of StringThread objects by allowing you to specify the column in the original dataset that contains chat messages.
from adaptive_harmony.core.dataset import load_from_hf, convert_sample_dict

def load_hf_dataset():
    # Helper function to convert HF dataset to Adaptive StringThread
    convert_sample_fn = convert_sample_dict(
        turns_key="messages", 
        role_key="role", 
        content_key="content"
    )
    
    # Load the dataset
    dataset = load_from_hf(
        "HuggingFaceH4/ultrachat_200k", 
        "train_sft", 
        convert_sample_fn
    )
    
    return dataset