Upload datasets
Upload datasets to Adaptive Engine for training and evaluation
You can upload datasets to Adaptive Engine via the Datasets UI or the SDK, and use them for training and evaluation. Datasets can consist of prompts only, or of prompts with completions. They can also include metric or preference feedback, which lets you bring previously captured feedback signals to Adaptive.
Datasets must be uploaded as JSON Lines files (.jsonl). See the SDK Reference to learn how to upload datasets via the SDK.
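Since uploads must be JSON Lines, a quick local check before uploading can catch malformed files. The sketch below is a minimal example, not part of the Adaptive SDK: the helper name check_jsonl is hypothetical, and it assumes every line should be a single JSON object and that blank lines are not wanted. It does not check any Adaptive-specific schema.

```python
import json


def check_jsonl(text):
    """Return a list of (line_number, message) problems found in JSON Lines text.

    Assumption (not from the Adaptive docs): each line must be one JSON object,
    and blank lines inside the file are reported as problems.
    """
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "expected a JSON object"))
    return problems


# Placeholder content; the keys here are illustrative, not an Adaptive schema.
sample = '{"a": 1}\n[1, 2]\n'
for number, message in check_jsonl(sample):
    print(f"line {number}: {message}")
```

An empty result means every line parsed as a JSON object; it says nothing about whether the fields match one of the dataset schemas.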
Below you will find the supported dataset types and schemas.
Dataset types
Prompts only
Prompt-only datasets let you both train and evaluate models with AI feedback.
If used for evaluation, Adaptive will run batch inference on the whole dataset with the evaluated models before judging the new completions.
Each line in the input jsonl file must have the following schema:
You can additionally add a labels array with key-value pairs to label all the interactions in the dataset.
A valid jsonl file with 2 samples looks like this:
Prompts and completions
Datasets with prompts and completions enable all of the above, and also allow you to evaluate the uploaded completions. When evaluating, you can choose either to evaluate the uploaded completions or to replay the prompts only. In the latter case, Adaptive ignores the uploaded completions and instead runs batch inference with the evaluated models on the whole dataset before judging the new completions.
Each line in the input jsonl file must have the following schema:
You can additionally add a labels array with key-value pairs to label all the interactions in the dataset.
A valid jsonl file with 2 samples looks like this:
Prompts and completions with metrics
Datasets with prompts, completions and metric feedback enable all of the above, but also allow you to train models on the uploaded metrics.
Each key-value pair in feedbacks represents feedback_key: feedback_value.
Adaptive Engine registers the new feedback keys in your dataset if they haven’t been logged before, configuring them according to their data type.
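Because new feedback keys are configured according to their data type, it can help to see which keys a file contains and what value types they carry before uploading. The following is a rough local sketch, not Adaptive's own logic: the helper summarize_feedback_keys is hypothetical, it only reads the feedbacks object of each record, and the sample keys (helpful, score) are made up for illustration. Other record fields are omitted from the sample for brevity.

```python
import json
from collections import defaultdict


def summarize_feedback_keys(lines):
    """Map each feedback key to the set of Python type names seen for its values.

    Only the "feedbacks" object of each record is inspected; records without
    one are ignored. Blank lines are skipped.
    """
    seen = defaultdict(set)
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        for key, value in record.get("feedbacks", {}).items():
            seen[key].add(type(value).__name__)
    return dict(seen)


# Hypothetical feedback keys, for illustration only.
sample_lines = [
    '{"feedbacks": {"helpful": true, "score": 0.8}}',
    '{"feedbacks": {"helpful": false, "score": 1}}',
]
for key, types in summarize_feedback_keys(sample_lines).items():
    print(key, sorted(types))
```

A key that shows more than one type (here score appears as both float and int) is worth reviewing by hand, since how mixed types are handled on upload is not stated here.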
You can additionally add a labels array with key-value pairs to label all the interactions in the dataset.
A valid jsonl file with 2 samples looks like this:
Prompts and preferences between two completions
Datasets with prompts and preferences between two completions enable all of the above, but also allow you to train models on the uploaded preference sets. Adaptive Engine registers the new feedback key in your dataset if it hasn’t been logged before.
You can additionally add a labels array with key-value pairs to label all the interactions in the dataset.
A valid jsonl file with 2 samples looks like this:
Example: Upload a Hugging Face dataset to Adaptive
In this example, we download a customer support dataset from the Hugging Face Hub, save it as a JSON Lines file with the appropriate schema as detailed above, and upload it to Adaptive Engine via the SDK.
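The conversion step can be sketched as follows. This is a minimal illustration under stated assumptions, not the original example: rows_to_jsonl is a hypothetical helper, the source column names (question, answer) and the output keys are placeholders that must be replaced with the dataset's real columns and the keys of the schema you are targeting, and the upload call itself is left to the SDK Reference. The rows can be any iterable of dictionaries, which is what iterating over a Hugging Face dataset split yields.

```python
import json


def rows_to_jsonl(rows, path, column_map):
    """Write rows to `path` as JSON Lines, renaming columns on the way.

    column_map maps a source column name to the key used in the output record.
    Returns the number of records written.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as handle:
        for row in rows:
            record = {target: row[source] for source, target in column_map.items()}
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


# Stand-in rows; in practice these would come from the downloaded dataset.
rows = [
    {"question": "How do I reset my password?", "answer": "Use the reset link."},
    {"question": "Where is my invoice?", "answer": "Under Billing."},
]
# "output_key_a" / "output_key_b" are placeholders, not Adaptive field names.
written = rows_to_jsonl(
    rows,
    "dataset.jsonl",
    {"question": "output_key_a", "answer": "output_key_b"},
)
print(f"wrote {written} records")
```

Passing the column mapping as an argument keeps the helper independent of any one dataset type, so the same function can produce files for each of the schemas described above.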
It’s that easy!