Documentation Index
Fetch the complete documentation index at: https://docs.adaptive-ml.com/llms.txt
Use this file to discover all available pages before exploring further.
You can import datasets previously uploaded to Adaptive directly within your recipes.
Datasets stored on Adaptive can be loaded by specifying a parameter of type Dataset from adaptive_harmony.parameters in your recipe’s InputConfig class.
You can then load it in your recipe as a list of StringThread objects by calling await dataset.load(ctx).
Load a dataset
First, define your dataset in your recipe’s input config:
from adaptive_harmony.runtime import InputConfig
from adaptive_harmony.parameters import Dataset
class MyConfig(InputConfig):
dataset: Dataset
To load a dataset from Adaptive, you can use the load method on the dataset:
async def my_recipe(config: MyConfig, ctx: RecipeContext):
dataset = await config.dataset.load(ctx)
This utility can also load local files structured in the Adaptive-supported format, which you can leverage if you are testing a recipe locally. Load a local dataset with Dataset(dataset_key="local-file", local_file_path="your_file.jsonl").
StringThread object
The atomic element of any dataset in the adaptive_harmony codebase is a StringThread, which is a Rust backed object exposed in Python. A StringThread simply contains all the messages in a thread of conversation, along with any metadata associated with that thread (such as metric feedback, ground truth labels or any other metadata). StringThread exposes a few helpful methods:
from adaptive_harmony import StringThread
thread = StringThread(
turns=[
("user", "Hello, who are you?"),
("assistant", "I am a large language model. How can I help you today?"),
]
)
thread_with_metadata = StringThread.with_metadata(
turns=[
("user", "Hello, who are you?"),
("assistant", "I am a large language model. How can I help you today?"),
],
metadata={"label": "polite"}
)
# Return all turns of thread as NamedTuples
all_turns = thread.get_turns()
all_turns[0][0] == all_turns[0].role
# Returns a list of tuples, each containing a role and a message
# Does not include the completion, if there is one (last turn with `assistant` role)
messages = thread.messages()
assert messages[-1][0] != "assistant"
# Returns the string content of the last message in the thread if its role is `assistant`
# Otherwise, returns None
completion = thread.completion()
new_thread = StringThread([])
new_thread = new_thread.system("You are a helpful bot.")
thread = thread.user("My name is John.") # Returns a new thread with a new user message added to the end
thread = thread.assistant("Nice to meet you, John!") # Returns a new thread with a new assistant message added to the end
Loading from Hugging Face
You can also load datasets directly from Hugging Face in your recipe. adaptive_harmony exposes helper methods to convert arbitrary datasets into a list of StringThread objects by allowing you to specify the column in the original dataset that contains chat messages.
from adaptive_harmony.core.dataset import load_from_hf, convert_sample_dict
def load_hf_dataset():
# Helper function to convert HF dataset to Adaptive StringThread
convert_sample_fn = convert_sample_dict(
turns_key="messages",
role_key="role",
content_key="content"
)
# Load the dataset
dataset = load_from_hf(
"HuggingFaceH4/ultrachat_200k",
"train_sft",
convert_sample_fn
)
return dataset