Datasets are uploaded as JSON Lines files (.jsonl). See the SDK Reference to learn how to upload datasets via the SDK.
Below you will find the supported dataset types and schemas.
Dataset types
Prompts only
Datasets with prompts only allow you to both train and evaluate models with AI feedback. If used for evaluation, Adaptive will run batch inference on the whole dataset with the evaluated models before judging the new completions. Each line in the input .jsonl file must have the following schema:
labels: array with key-value pairs to label all the interactions in the dataset.
A valid .jsonl file with 2 samples would look as follows:
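(Illustrative sketch only: the messages, role, and content keys below are assumed chat-style field names, not confirmed by this page; see the schema above and the SDK Reference for the exact keys. labels is the field described above.)

```json
{"messages": [{"role": "user", "content": "What is the capital of France?"}], "labels": {"topic": "geography", "source": "faq"}}
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Summarize the water cycle in one sentence."}], "labels": {"topic": "science", "source": "faq"}}
```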
Prompts and completions
Datasets with prompts and completions enable all of the above, and additionally let you evaluate the uploaded completions. You can choose either to evaluate the uploaded completions or to replay the prompts only. In the latter case, Adaptive ignores the uploaded completions and instead runs batch inference with the evaluated models on the whole dataset before judging the new completions. Each line in the input .jsonl file must have the following schema:
labels: array with key-value pairs to label all the interactions in the dataset.
A valid .jsonl file with 2 samples would look as follows:
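(Illustrative sketch only: the completion key, like messages above, is an assumed field name; consult the schema above and the SDK Reference for the exact keys.)

```json
{"messages": [{"role": "user", "content": "What is the capital of France?"}], "completion": "The capital of France is Paris.", "labels": {"topic": "geography", "source": "faq"}}
{"messages": [{"role": "user", "content": "Translate 'good morning' into Spanish."}], "completion": "Buenos días.", "labels": {"topic": "translation", "source": "faq"}}
```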
Prompts and completions with metrics
Datasets with prompts, completions, and metric feedback enable all of the above, but also allow you to train models on the uploaded metrics. Each key-value pair in feedbacks represents feedback_key: feedback_value.
Adaptive Engine registers the new feedback keys in your dataset if they haven’t been logged before, configuring them according to their data type.
labels: array with key-value pairs to label all the interactions in the dataset.
A valid .jsonl file with 2 samples would look as follows:
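(Illustrative sketch under the same assumptions as above: feedbacks holds the feedback_key: feedback_value pairs described in this section, while the specific keys thumbs_up and accuracy, and the messages and completion field names, are placeholders.)

```json
{"messages": [{"role": "user", "content": "What is the capital of France?"}], "completion": "The capital of France is Paris.", "feedbacks": {"thumbs_up": true, "accuracy": 1.0}, "labels": {"topic": "geography"}}
{"messages": [{"role": "user", "content": "What is 17 * 3?"}], "completion": "17 * 3 = 41", "feedbacks": {"thumbs_up": false, "accuracy": 0.0}, "labels": {"topic": "arithmetic"}}
```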
Prompts and preferences between two completions
Datasets with prompts and preferences between two completions enable all of the above, but also allow you to train models on the uploaded preference sets. Adaptive Engine registers the new feedback key in your dataset if it hasn’t been logged before.
labels: array with key-value pairs to label all the interactions in the dataset.
A valid .jsonl file with 2 samples would look as follows:
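(Illustrative sketch only: the chosen_completion, rejected_completion, and feedback_key names below are placeholders for the preference fields, not confirmed by this page; see the schema above and the SDK Reference for the exact keys.)

```json
{"messages": [{"role": "user", "content": "What is the capital of France?"}], "chosen_completion": "The capital of France is Paris.", "rejected_completion": "France does not have a capital.", "feedback_key": "helpfulness", "labels": {"topic": "geography"}}
{"messages": [{"role": "user", "content": "Explain photosynthesis briefly."}], "chosen_completion": "Plants convert sunlight, water, and CO2 into glucose and oxygen.", "rejected_completion": "Photosynthesis is how animals digest food.", "feedback_key": "helpfulness", "labels": {"topic": "science"}}
```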
Adding metadata to records
You can add a metadata object to the records of any of the dataset types above.
Metadata is particularly useful when you train on a reward from an external feedback endpoint;
your server implementation can use each sample’s metadata to compute its reward.
The following is an example of a record whose metadata contains correct_result, the result of the ground truth SQL query.
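(As before, the messages field name is an assumption; metadata and correct_result follow the description above, and the row values are purely illustrative.)

```json
{"messages": [{"role": "user", "content": "List the names of customers who placed an order in 2023."}], "metadata": {"correct_result": [["Alice"], ["Bob"]]}, "labels": {"task": "text-to-sql"}}
```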
With this data, your reward server can execute the generated SQL query against a real database and maximally reward the model if the result matches the correct result.
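As a minimal sketch of that idea, assuming a local SQLite database and a hypothetical sql_reward helper (this is not the actual external feedback endpoint interface):

```python
import sqlite3

def sql_reward(generated_sql: str, metadata: dict, db_path: str = "analytics.db") -> float:
    """Illustrative only: return 1.0 if the generated query reproduces the
    ground-truth rows stored under metadata["correct_result"], else 0.0.
    The function name, signature, and database path are placeholders."""
    try:
        with sqlite3.connect(db_path) as conn:
            rows = conn.execute(generated_sql).fetchall()
    except sqlite3.Error:
        return 0.0  # unexecutable or invalid SQL earns no reward
    # Compare against the expected rows carried in the record's metadata.
    expected = [tuple(r) for r in metadata["correct_result"]]
    return 1.0 if rows == expected else 0.0
```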