Upload datasets

You can upload datasets to Adaptive Engine via the Datasets UI or the SDK, and use them for training and evaluation. A dataset can consist of prompts only, or of prompts with completions. It can also include metric or preference feedback, which lets you bring previously captured feedback signals to Adaptive.

Datasets must be uploaded as JSON Lines files (.jsonl). See the SDK Reference to learn how to upload datasets via the SDK.

Below you will find the supported dataset types and schemas.

Dataset types

Prompts only

Datasets with prompts only allow you to both train and evaluate models with AI feedback. If used for evaluation, Adaptive will run batch inference on the whole dataset with the evaluated models before judging the new completions. Each line in the input jsonl file must have the following schema:

{
  "messages": [
    {"role": "system", "content": "<input prompt>"},
    {"role": "user", "content": "<user input>"}
  ]
}

You can additionally add a labels array with key-value pairs to label all the interactions in the dataset.

A valid jsonl file with 2 samples looks like this:

{"messages": [{"role": "system", "content": "<input prompt>"},{"role": "user", "content": "<user input 1>"}]}
{"messages": [{"role": "system", "content": "<input prompt>"},{"role": "user", "content": "<user input 2>"}]}

Prompts and completions

Datasets with prompts and completions enable all of the above, and also allow you to evaluate the uploaded completions. When evaluating, you can choose to either judge the uploaded completions or replay the prompts only. In the latter case, Adaptive ignores the uploaded completions and instead runs batch inference with the evaluated models on the whole dataset before judging the new completions.

Each line in the input jsonl file must have the following schema:

{
  "messages": [
    {"role": "system", "content": "<input prompt>"},
    {"role": "user", "content": "<user input>"}
  ],
  "completion": "<completion>"
}

You can additionally add a labels array with key-value pairs to label all the interactions in the dataset.

A valid jsonl file with 2 samples looks like this:

{"messages": [{"role": "system", "content": "<input prompt>"},{"role": "user", "content": "<user input 1>"}], "completion": "<completion 1>"}
{"messages": [{"role": "system", "content": "<input prompt>"},{"role": "user", "content": "<user input 2>"}], "completion": "<completion 2>"}

Prompts and completions with metrics

Datasets with prompts, completions and metric feedback enable all of the above, but also allow you to train models on the uploaded metrics. Each key-value pair in feedbacks represents feedback_key: feedback_value. Adaptive Engine registers the new feedback keys in your dataset if they haven’t been logged before, configuring them according to their data type.

{
  "messages": [
    {"role": "system", "content": "<input prompt>"},
    {"role": "user", "content": "<user input>"}
  ],
  "completion": "<completion>",
  "feedbacks": {
      "foo": true,
      "bar": 0.5,
      "baz": 100
  }
}

You can additionally add a labels array with key-value pairs to label all the interactions in the dataset.

A valid jsonl file with 2 samples looks like this:

{"messages": [{"role": "system", "content": "<input prompt>"},{"role": "user", "content": "<user input 1>"}], "completion": "<completion 1>",  "feedbacks": {"foo": true, "bar": 0.5, "baz": 100}}
{"messages": [{"role": "system", "content": "<input prompt>"},{"role": "user", "content": "<user input 2>"}], "completion": "<completion 2>",  "feedbacks": {"foo": false, "bar": 0.7, "baz": 50}}

Prompts and preferences between two completions

Datasets with prompts and preferences between two completions enable all of the above, but also allow you to train models on the uploaded preference sets. Adaptive Engine registers the new feedback key in your dataset if it hasn’t been logged before.

{
  "messages": [
    {"role": "system", "content": "<input prompt>"},
    {"role": "user", "content": "<user input>"}
  ],
  "preferred_completion": "<preferred_completion>",
  "other_completion": "<other_completion>",
  "feedback_key": "<feedback_key>"
}

You can additionally add a labels array with key-value pairs to label all the interactions in the dataset.

A valid jsonl file with 2 samples looks like this:

{"messages": [{"role": "system", "content": "<input prompt>"},{"role": "user", "content": "<user input 1>"}], "preferred_completion": "<preferred_completion 1>", "other_completion": "<other_completion 1>", "feedback_key": "<feedback_key>"}
{"messages": [{"role": "system", "content": "<input prompt>"},{"role": "user", "content": "<user input 2>"}], "preferred_completion": "<preferred_completion 2>", "other_completion": "<other_completion 2>", "feedback_key": "<feedback_key>"}

Adding metadata to records

You can add a metadata object to the records of any of the dataset types above. Metadata is particularly useful when you train on a reward from an external feedback endpoint; your server implementation can use each sample’s metadata to compute its reward.

The following example record stores the result of a ground-truth SQL query as correct_result in its metadata. With this data, your reward server can execute the generated SQL query against a real database and give the model the maximum reward if the result matches the correct result.

{
  "messages": [
    {"role": "system", "content": "You are a nice data analyst that writes accurate SQL queries. Here is the schema for the DB's you have available: [...DB SCHEMAS...]"},
    {"role": "user", "content": "How many users do we have?"}
  ],
  "completion": "SELECT COUNT(*) FROM users;"
  "metadata": {
    "correct_result":"{(1000,)}"
  }
}
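
The scoring step described above can be sketched as follows. This covers only the comparison logic, not the request or response format of a feedback endpoint, which this page does not specify. It assumes that correct_result is a Python literal string as in the record above, and it uses an in-memory SQLite database as a stand-in for the real one; `sql_reward` is a hypothetical helper.

```python
import ast
import sqlite3


def sql_reward(connection, completion: str, metadata: dict) -> float:
    """Return 1.0 if the generated query yields the expected rows, else 0.0."""
    # "{(1000,)}" style strings parse into a set of row tuples.
    expected = ast.literal_eval(metadata["correct_result"])
    try:
        rows = connection.execute(completion).fetchall()
    except sqlite3.Error:
        return 0.0  # invalid or failing SQL earns no reward
    return 1.0 if set(rows) == expected else 0.0


# Stand-in database with three users (assumption for illustration only).
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
connection.executemany("INSERT INTO users (id) VALUES (?)", [(1,), (2,), (3,)])

metadata = {"correct_result": "{(3,)}"}
reward = sql_reward(connection, "SELECT COUNT(*) FROM users;", metadata)
```

In a real setup, generated queries should run with read-only credentials, since the model's output is untrusted input to the database.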

Example: Upload HuggingFace dataset to Adaptive

In this example, we download a customer support dataset from the HuggingFace Datasets Hub, save it as a file with the appropriate schema as detailed above, and upload it to Adaptive Engine via the SDK.

import datasets
import json
from adaptive_sdk import Adaptive

# Create a use case and make it the default for subsequent SDK calls.
new_use_case = "customer_support"
adaptive = Adaptive(base_url="ADAPTIVE_URL", api_key="ADAPTIVE_API_KEY")
adaptive.use_cases.create(key=new_use_case)
adaptive.set_default_use_case(use_case=new_use_case)

# Download the training split of the customer support dataset.
dataset = datasets.load_dataset("bitext/Bitext-customer-support-llm-chatbot-training-dataset")["train"]

# Convert each sample to the "prompts and completions" schema.
new_file_name = "customer-support-data.jsonl"
adaptive_data = []
system_prompt = "You are a good customer support bot."
for sample in dataset:
    adaptive_data.append(
        {
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": sample["instruction"]},
            ],
            "completion": sample["response"],
        }
    )

# Write one JSON object per line. UTF-8 is set explicitly because
# ensure_ascii=False may emit non-ASCII characters.
with open(new_file_name, "w", encoding="utf-8") as f:
    for new_sample in adaptive_data:
        f.write(json.dumps(new_sample, ensure_ascii=False) + "\n")

# Upload the file, using the file name without its extension as the dataset key.
adaptive.datasets.upload(
    file_path=new_file_name,
    dataset_key=new_file_name.replace(".jsonl", ""),
)

It’s that easy!
