> ## Documentation Index
> Fetch the complete documentation index at: https://docs.adaptive-ml.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Load datasets and StringThread

> How to load Adaptive datasets in your recipes

You can import datasets previously [uploaded to Adaptive](/v0.14/core/datasets) directly within your recipes.
Datasets stored on Adaptive can be loaded by specifying a parameter of type `Dataset` from `adaptive_harmony.parameters` in your recipe's `InputConfig` [class](/v0.14/harmony/config).

You can then load it in your recipe as a list of `StringThread` objects by calling `await dataset.load(ctx)`.

## Load a dataset

First, define your dataset in your recipe's input config:

```python theme={null}
from adaptive_harmony.runtime import InputConfig
from adaptive_harmony.parameters import Dataset

class MyConfig(InputConfig):
    dataset: Dataset
```

To load a dataset from Adaptive, you can use the `load` method on the dataset:

```python theme={null}
async def my_recipe(config: MyConfig, ctx: RecipeContext):
    dataset = await config.dataset.load(ctx)
```

<Info>
  This utility can also load local files structured in the Adaptive-supported [format](/v0.14/core/datasets), which you can leverage if you are testing a recipe [locally](/v0.14/harmony/harmony-client). Load a local dataset with `Dataset(dataset_key="local-file", local_file_path="your_file.jsonl")`.
</Info>

## StringThread object

The atomic element of any dataset in the `adaptive_harmony` codebase is a `StringThread`, which is a Rust backed object exposed in Python. A `StringThread` simply contains all the messages in a thread of conversation, along with any metadata associated with that thread (such as metric feedback, ground truth labels or any other metadata). `StringThread` exposes a few helpful methods:

```python theme={null}
from adaptive_harmony import StringThread

thread = StringThread(
    turns=[
        ("user", "Hello, who are you?"),
        ("assistant", "I am a large language model. How can I help you today?"),
    ]
)

thread_with_metadata = StringThread.with_metadata(
    turns=[
        ("user", "Hello, who are you?"),
        ("assistant", "I am a large language model. How can I help you today?"),
    ],
    metadata={"label": "polite"}
)

# Return all turns of thread as NamedTuples
all_turns = thread.get_turns()
all_turns[0][0] == all_turns[0].role
# Returns a list of tuples, each containing a role and a message
# Does not include the completion, if there is one (last turn with `assistant` role)
messages = thread.messages() 
assert messages[-1][0] != "assistant"
# Returns the string content of the last message in the thread if its role is `assistant`
# Otherwise, returns None
completion = thread.completion()


new_thread = StringThread([])
new_thread = new_thread.system("You are a helpful bot.")
thread = thread.user("My name is John.") # Returns a new thread with a new user message added to the end
thread = thread.assistant("Nice to meet you, John!") # Returns a new thread with a new assistant message added to the end
```

## Loading from Hugging Face

You can also load datasets directly from Hugging Face in your recipe. `adaptive_harmony` exposes helper methods to convert arbitrary datasets into a list of `StringThread` objects by allowing you to specify the column in the original dataset that contains chat messages.

```python theme={null}
from adaptive_harmony.core.dataset import load_from_hf, convert_sample_dict

def load_hf_dataset():
    # Helper function to convert HF dataset to Adaptive StringThread
    convert_sample_fn = convert_sample_dict(
        turns_key="messages", 
        role_key="role", 
        content_key="content"
    )
    
    # Load the dataset
    dataset = load_from_hf(
        "HuggingFaceH4/ultrachat_200k", 
        "train_sft", 
        convert_sample_fn
    )
    
    return dataset
```
