You can upload datasets to Adaptive Engine through the Datasets UI or the SDK, and use them for training and evaluation. A dataset can contain prompts only, or prompts with completions. It can also include metric or preference feedback, which lets you bring previously captured feedback signals to Adaptive.

Datasets must be uploaded as JSON Lines files (.jsonl). See the SDK Reference to learn how to upload datasets via the SDK.
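
Before uploading, it can help to confirm that a file really is JSON Lines, meaning one JSON object per line. The sketch below is a local check only: `check_jsonl` is a hypothetical helper, not part of the Adaptive SDK, and treating blank lines as problems is a conservative assumption rather than a documented Adaptive rule.

```python
import json


def check_jsonl(text):
    """Return a list of problems; an empty list means every line parsed as a JSON object."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            # Assumption: blank lines are flagged, since they are not JSON values.
            problems.append(f"line {number}: empty line")
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as error:
            problems.append(f"line {number}: invalid JSON ({error.msg})")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {number}: expected a JSON object")
    return problems
```

You could call it with `check_jsonl(open(path, encoding="utf-8").read())` and upload only when the returned list is empty.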

Below you will find the supported dataset types and schemas.

Dataset types

Prompts only

Datasets with prompts only allow you to both train and evaluate models with AI feedback. If used for evaluation, Adaptive will run batch inference on the whole dataset with the evaluated models before judging the new completions. Each line in the input jsonl file must have the following schema:

{
  "messages": [
    {"role": "system", "content": "<input prompt>"},
    {"role": "user", "content": "<user input>"}
  ]
}

You can additionally add a labels array with key-value pairs to label all the interactions in the dataset.

A valid jsonl file with 2 samples looks like this:

{"messages": [{"role": "system", "content": "<input prompt>"},{"role": "user", "content": "<user input 1>"}]}
{"messages": [{"role": "system", "content": "<input prompt>"},{"role": "user", "content": "<user input 2>"}]}
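
If your prompts live in a Python list, a file like the one above can be produced with the standard `json` module. This is a minimal sketch: `to_prompt_lines`, the system prompt, and the sample user inputs are illustrative assumptions, not part of the Adaptive SDK.

```python
import json


def to_prompt_lines(system_prompt, user_inputs):
    """Build one prompts-only JSONL line per user input."""
    lines = []
    for user_input in user_inputs:
        record = {
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_input},
            ]
        }
        # json.dumps escapes embedded newlines, so each record stays on one line.
        lines.append(json.dumps(record, ensure_ascii=False))
    return lines


lines = to_prompt_lines(
    "You are a helpful assistant.",
    ["How do I reset my password?", "Where is my order?"],
)
print("\n".join(lines))
```

Writing `"\n".join(lines) + "\n"` to a `.jsonl` file opened with `encoding="utf-8"` gives a file in the prompts-only schema.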

Prompts and completions

Datasets with prompts and completions enable all of the above, and also allow you to evaluate the uploaded completions. When evaluating, you can choose either to evaluate the uploaded completions or to replay the prompts only. In the latter case, Adaptive ignores the uploaded completions and instead runs batch inference with the evaluated models on the whole dataset before judging the new completions.

Each line in the input jsonl file must have the following schema:

{
  "messages": [
    {"role": "system", "content": "<input prompt>"},
    {"role": "user", "content": "<user input>"}
  ],
  "completion": "<completion>"
}

You can additionally add a labels array with key-value pairs to label all the interactions in the dataset.

A valid jsonl file with 2 samples looks like this:

{"messages": [{"role": "system", "content": "<input prompt>"},{"role": "user", "content": "<user input 1>"}], "completion": "<completion 1>"}
{"messages": [{"role": "system", "content": "<input prompt>"},{"role": "user", "content": "<user input 2>"}], "completion": "<completion 2>"}
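
A quick local check of each record against this schema can catch malformed samples before upload. The sketch below assumes that `messages` must be a non-empty list of objects with string `role` and `content` fields and that `completion` must be a string; these rules are inferred from the schema shown here, and `validate_record` is a hypothetical helper rather than an SDK function.

```python
def validate_record(record):
    """Return a list of schema problems for one prompt-and-completion record."""
    errors = []
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        errors.append("'messages' must be a non-empty list")
    else:
        for index, message in enumerate(messages):
            has_role = isinstance(message, dict) and isinstance(message.get("role"), str)
            has_content = isinstance(message, dict) and isinstance(message.get("content"), str)
            if not (has_role and has_content):
                errors.append(f"messages[{index}] needs string 'role' and 'content'")
    if not isinstance(record.get("completion"), str):
        errors.append("'completion' must be a string")
    return errors
```

An empty list means the record matches the assumed rules; Adaptive Engine may apply additional or different checks on upload.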

Prompts and completions with metrics

Datasets with prompts, completions, and metric feedback enable all of the above, and also allow you to train models on the uploaded metrics. Each key-value pair in feedbacks maps a feedback key to its feedback value. Adaptive Engine registers any feedback keys in your dataset that haven’t been logged before, configuring them according to their data type. Each line in the input jsonl file must have the following schema:

{
  "messages": [
    {"role": "system", "content": "<input prompt>"},
    {"role": "user", "content": "<user input>"}
  ],
  "completion": "<completion>",
  "feedbacks": {
    "foo": true,
    "bar": 0.5,
    "baz": 100
  }
}

You can additionally add a labels array with key-value pairs to label all the interactions in the dataset.

A valid jsonl file with 2 samples looks like this:

{"messages": [{"role": "system", "content": "<input prompt>"},{"role": "user", "content": "<user input 1>"}], "completion": "<completion 1>", "feedbacks": {"foo": true, "bar": 0.5, "baz": 100}}
{"messages": [{"role": "system", "content": "<input prompt>"},{"role": "user", "content": "<user input 2>"}], "completion": "<completion 2>", "feedbacks": {"foo": false, "bar": 0.7, "baz": 50}}
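
When building such lines from Python, booleans and numbers serialize directly to the JSON values shown above. The following sketch assumes that feedback values are limited to booleans and numbers, which is inferred from the example keys `foo`, `bar`, and `baz` and not confirmed as the full set of supported types; `to_metric_line` and the sample messages are hypothetical.

```python
import json


def to_metric_line(messages, completion, feedbacks):
    """Serialize one sample with metric feedback as a single JSONL line."""
    for key, value in feedbacks.items():
        # Assumption: only booleans and numbers are accepted, mirroring the schema example.
        if not isinstance(value, (bool, int, float)):
            raise TypeError(
                f"feedback {key!r} must be a bool or a number, got {type(value).__name__}"
            )
    record = {"messages": messages, "completion": completion, "feedbacks": feedbacks}
    return json.dumps(record, ensure_ascii=False)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Where is my order?"},
]
line = to_metric_line(messages, "It ships tomorrow.", {"foo": True, "bar": 0.5, "baz": 100})
print(line)
```

Python's `True` and `False` are written as JSON `true` and `false`, so no manual conversion is needed.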

Prompts and preferences between two completions

Datasets with prompts and preferences between two completions enable all of the above, and also allow you to train models on the uploaded preference sets. Adaptive Engine registers the feedback key in your dataset if it hasn’t been logged before. Each line in the input jsonl file must have the following schema:

{
  "messages": [
    {"role": "system", "content": "<input prompt>"},
    {"role": "user", "content": "<user input>"}
  ],
  "preferred_completion": "<preferred_completion>",
  "other_completion": "<other_completion>",
  "feedback_key": "<feedback_key>"
}

You can additionally add a labels array with key-value pairs to label all the interactions in the dataset.

A valid jsonl file with 2 samples looks like this:

{"messages": [{"role": "system", "content": "<input prompt>"},{"role": "user", "content": "<user input 1>"}], "preferred_completion": "<preferred_completion 1>", "other_completion": "<other_completion 1>", "feedback_key": "<feedback_key>"}
{"messages": [{"role": "system", "content": "<input prompt>"},{"role": "user", "content": "<user input 2>"}], "preferred_completion": "<preferred_completion 2>", "other_completion": "<other_completion 2>", "feedback_key": "<feedback_key>"}
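
The four schemas in this section differ only in their top-level keys, so a record's dataset type can be guessed from the keys it contains. The sketch below illustrates that idea; the type names it returns are labels chosen for this example, not identifiers used by Adaptive Engine, and `dataset_type` is a hypothetical helper.

```python
def dataset_type(record):
    """Guess which of the four documented schemas a record follows, based on its keys."""
    if "messages" not in record:
        raise ValueError("record has no 'messages' key")
    if "preferred_completion" in record and "other_completion" in record:
        return "preference"
    if "completion" in record and "feedbacks" in record:
        return "metric"
    if "completion" in record:
        return "completion"
    return "prompt"
```

Running it over every line of a file and checking that all records return the same label is one way to catch files that mix schemas by accident.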

Example: Upload a Hugging Face dataset to Adaptive

In this example, we download a customer support dataset from the Hugging Face Hub, save it as a jsonl file in the prompts and completions schema described above, and upload it to Adaptive Engine via the SDK.

import json

import datasets
from adaptive_sdk import Adaptive

# Connect to Adaptive Engine; replace the placeholders with your URL and API key.
new_use_case = "customer_support"
adaptive_client = Adaptive(base_url="ADAPTIVE_URL", api_key="ADAPTIVE_API_KEY")
adaptive_client.use_cases.create(key=new_use_case)
adaptive_client.set_default_use_case(use_case=new_use_case)

# Download the training split of the customer support dataset.
dataset = datasets.load_dataset("bitext/Bitext-customer-support-llm-chatbot-training-dataset")["train"]

new_file_name = "customer-support-data.jsonl"
adaptive_data = []
system_prompt = "You are a good customer support bot."

# Convert each sample to the prompts and completions schema.
for sample in dataset:
    adaptive_data.append(
        {
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": sample["instruction"]},
            ],
            "completion": sample["response"],
        }
    )

# Write one JSON object per line. The explicit UTF-8 encoding is needed because
# ensure_ascii=False keeps non-ASCII characters unescaped.
with open(new_file_name, "w", encoding="utf-8") as f:
    for new_sample in adaptive_data:
        f.write(json.dumps(new_sample, ensure_ascii=False) + "\n")

# Upload the file, using the file name without its extension as the dataset key.
adaptive_client.datasets.upload(
    file_path=new_file_name,
    dataset_key=new_file_name.replace(".jsonl", ""),
)

It’s that easy!