# Datasets

> Upload datasets for training and evaluation

Datasets contain examples for training and evaluating models. Upload them as JSONL files, with one JSON object per line.

<Tabs>
  <Tab title="SDK" icon="code">
    ## Upload a dataset

    ```python theme={null}
    adaptive.datasets.upload(
        file_path="training-data.jsonl",
        dataset_key="customer-support-v1",
    )
    ```

    | Parameter     | Type | Required | Description                              |
    | ------------- | ---- | -------- | ---------------------------------------- |
    | `file_path`   | str  | Yes      | Path to JSONL file                       |
    | `dataset_key` | str  | Yes      | Unique identifier                        |
    | `name`        | str  | No       | Display name (defaults to `dataset_key`) |
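
    Because the upload expects a JSONL file, a quick local check can catch malformed lines before you call `upload`. The following is a minimal sketch, not part of the SDK: `count_jsonl_records` is a hypothetical helper name, and skipping blank lines is an assumption, since this page does not say how the platform treats them.

    ```python
    import json


    def count_jsonl_records(file_path: str) -> int:
        """Return the number of records, raising ValueError on the first bad line."""
        count = 0
        with open(file_path, encoding="utf-8") as f:
            for line_no, line in enumerate(f, start=1):
                if not line.strip():
                    continue  # assumption: blank lines are skipped, not rejected
                try:
                    record = json.loads(line)
                except json.JSONDecodeError as exc:
                    raise ValueError(f"line {line_no}: invalid JSON ({exc.msg})") from exc
                if not isinstance(record, dict):
                    raise ValueError(f"line {line_no}: expected a JSON object")
                count += 1
        return count
    ```

    If the check passes, pass the same path as `file_path` to `adaptive.datasets.upload`.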

    ## Dataset formats

    Each line in your JSONL file must follow one of these schemas:

    **Prompts and completions** (most common):

    ```json theme={null}
    {"messages": [{"role": "user", "content": "Hello"}], "completion": "Hi there!"}
    ```
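
    One way to produce a file in this shape is to serialize each record with `json.dumps` and write it on its own line. This is a sketch under stated assumptions: `write_prompt_completion_jsonl` is a hypothetical helper, not an SDK function, and it covers only single-turn user prompts.

    ```python
    import json


    def write_prompt_completion_jsonl(pairs, file_path):
        """Write (prompt, completion) pairs as one JSON object per line."""
        with open(file_path, "w", encoding="utf-8") as f:
            for prompt, completion in pairs:
                record = {
                    "messages": [{"role": "user", "content": prompt}],
                    "completion": completion,
                }
                # json.dumps escapes newlines inside strings, so each record
                # always stays on a single line.
                f.write(json.dumps(record, ensure_ascii=False) + "\n")
    ```

    For example, `write_prompt_completion_jsonl([("Hello", "Hi there!")], "training-data.jsonl")` writes a one-line file equivalent to the record above.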

    **Prompts only** (for evaluation with generated completions):

    ```json theme={null}
    {"messages": [{"role": "user", "content": "Hello"}]}
    ```

    **With feedback metrics** (for training on ratings):

    ```json theme={null}
    {"messages": [...], "completion": "...", "feedbacks": {"quality": 0.8, "helpful": true}}
    ```

    **With preferences** (for RLHF/DPO training):

    ```json theme={null}
    {"messages": [...], "preferred_completion": "Good answer", "other_completion": "Bad answer", "feedback_key": "quality"}
    ```

    **With images (multimodal)**:

    ```json theme={null}
    {"messages": [{"role": "user", "content": [{"type": "text", "text": "Describe this image"}, {"type": "image", "url": "data:image/jpeg;base64,/9j/4AAQ..."}]}], "completion": "A photo of a cat sitting on a desk."}
    ```

    To interleave text and images, set `content` to a list of parts. Each image must be a base64 data URI (JPEG, PNG, WebP, or GIF, up to 10 MB each); external URLs are not supported.
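
    A data URI of this kind can be built from raw image bytes with the standard library. The sketch below is illustrative only: `image_part` and `multimodal_record` are hypothetical helpers, and it assumes the 10 MB limit applies to the raw image bytes and means 10 MiB, which this page does not specify.

    ```python
    import base64

    # Assumption: the limit is measured on raw bytes and "10 MB" means 10 MiB.
    MAX_IMAGE_BYTES = 10 * 1024 * 1024
    SUPPORTED_MIME_TYPES = {"image/jpeg", "image/png", "image/webp", "image/gif"}


    def image_part(image_bytes: bytes, mime_type: str) -> dict:
        """Encode raw image bytes as a dataset image part with a base64 data URI."""
        if mime_type not in SUPPORTED_MIME_TYPES:
            raise ValueError(f"unsupported image type: {mime_type}")
        if len(image_bytes) > MAX_IMAGE_BYTES:
            raise ValueError("image exceeds the 10 MB limit")
        encoded = base64.b64encode(image_bytes).decode("ascii")
        return {"type": "image", "url": f"data:{mime_type};base64,{encoded}"}


    def multimodal_record(text: str, image_bytes: bytes, mime_type: str, completion: str) -> dict:
        """Build one dataset record whose user message mixes text and an image."""
        content = [{"type": "text", "text": text}, image_part(image_bytes, mime_type)]
        return {"messages": [{"role": "user", "content": content}], "completion": completion}
    ```

    Serialize the returned record with `json.dumps` and write it as a single line of the JSONL file.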

    <Accordion title="Dataset vs chat image format">
      Datasets and the chat API use different image schemas:

      ```python theme={null}
      # Dataset format
      {"type": "image", "url": "data:image/jpeg;base64,..."}

      # Chat API format (see Models page)
      {"type": "image_url", "image_url": "data:image/jpeg;base64,..."}
      ```
    </Accordion>
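
    If you reuse the same images in both places, converting between the two schemas is a matter of renaming keys. The helpers below are a hypothetical sketch based only on the two shapes shown in the accordion above; they are not SDK functions.

    ```python
    def dataset_image_to_chat(part: dict) -> dict:
        """Convert a dataset image part to the chat API image shape."""
        if part.get("type") != "image":
            raise ValueError("expected a dataset image part")
        return {"type": "image_url", "image_url": part["url"]}


    def chat_image_to_dataset(part: dict) -> dict:
        """Convert a chat API image part to the dataset image shape."""
        if part.get("type") != "image_url":
            raise ValueError("expected a chat API image part")
        return {"type": "image", "url": part["image_url"]}
    ```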

    Add optional `labels` or `metadata` fields to any format for filtering or custom graders.
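
    The record schemas above can be told apart by which keys are present. As a rough sketch, the hypothetical `record_format` helper below classifies a parsed record; the returned labels are made up for illustration, and extra fields such as `labels` or `metadata` are simply ignored.

    ```python
    def record_format(record: dict) -> str:
        """Return an illustrative label for the schema a record follows."""
        if "messages" not in record:
            raise ValueError("record has no 'messages' field")
        if "preferred_completion" in record and "other_completion" in record:
            return "preference"
        if "completion" in record and "feedbacks" in record:
            return "feedback"
        if "completion" in record:
            return "prompt_completion"
        return "prompt_only"
    ```

    Image records fall under `prompt_completion` or `prompt_only` here, since they differ only in the shape of `content`.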

    See [SDK Reference](/v0.14/reference/sdk) for all dataset methods.
  </Tab>

  <Tab title="UI" icon="mouse-pointer">
    ## Upload a dataset

    Navigate to your project and open the **Datasets** tab. Click **Upload Dataset** and select a JSONL file.

    The file must contain one JSON object per line. See the SDK tab for supported formats, including multimodal datasets with images.
  </Tab>
</Tabs>
