Adaptive Engine allows you to implement your own reward function for training and evaluating models. You expose this function as an external server that Adaptive Engine can access over HTTP or HTTPS.

Although reward servers can implement any arbitrary reward function, they are especially useful when your reward depends on external systems like databases, simulated environments, sandboxes, or APIs, where feedback comes from execution or interaction.

The same reward server can be used for both training and evaluation on Adaptive Engine.

Reward values can be on any scale, with no need for normalization; the only requirement is consistency.

During a training or evaluation job, batches of samples (prompt messages + completion) are sent to the reward server for scoring. The job waits for the server to respond, retrying with exponential backoff if any network disruptions are detected.

A standard API specification and Python SDK tooling help you easily build and deploy a reward server.

Building a Reward Server

Implementation guide

The Adaptive SDK includes a RewardServer base class built on top of FastAPI that handles all the server-side functionality required to deploy a reward server.

RewardServers allow you to optionally define an input schema that enforces the presence of metadata required for reward scoring. For example, a ground truth computed_result would be needed if we were training a model to solve math problems. When a training job is launched, Adaptive Engine polls your reward server for its required metadata schema and validates that every record in the selected dataset includes this metadata, cancelling the job launch if any record is missing it. This lets you decouple the reward scoring logic from the job you’re running, and guarantee data validation by design.
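
For illustration, the metadata model for that math example could be as small as the following Pydantic class (the MathMetadata name and field type are hypothetical):

from pydantic import BaseModel

# Hypothetical metadata model: every dataset record must carry the
# ground truth computed_result for its math problem
class MathMetadata(BaseModel):
    computed_result: float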

To create your own reward server, you need to:

  1. Define a metadata class (or use the EmptyMetadata class if your reward function doesn’t require metadata)
  2. Implement a subclass of RewardServer with:
    • An implementation of the score method, which computes the reward for a single sample
    • An implementation of the info method, which provides server metadata back to Adaptive Engine

Let’s take a look at the function signatures for the score and info methods:

from typing import Any, Generic, TypeVar
from pydantic import BaseModel

# Type variable bound to the Pydantic metadata model your server requires
MetadataModel = TypeVar("MetadataModel", bound=BaseModel)

class Turn(BaseModel):
    role: str
    content: str

class ValidatedRequest(BaseModel, Generic[MetadataModel]):
    turns: list[Turn]
    metadata: MetadataModel

class Response(BaseModel):
    reward: float
    metadata: dict[str, Any]

class ServerInfo(BaseModel):
    version: str
    name: str
    description: str

# The following are implemented as methods on your RewardServer subclass:
async def score(self, request: ValidatedRequest[MetadataModel]) -> Response:
    # your reward scoring implementation goes here
    computed_reward = 0.0
    return Response(
        reward=computed_reward,
        metadata={}
    )

async def info(self) -> ServerInfo:
    return ServerInfo(
        version="1.0",
        name="My Reward Server",
        description="A simple reward server that gives a reward based on the model's completion"
    )

Here’s what we can derive from the above:

  • score takes a ValidatedRequest object, which represents a single sample to score. turns is the full conversation history (including the model’s completion), and metadata is the metadata object you defined.
  • score returns a Response object, which contains the reward attributed to the record and an optional metadata dictionary, which you can use to return any information you want from the server, including detailed feedback or error messages.
  • info returns a ServerInfo object, which contains the server version, name and description. Adaptive Engine uses this information to register your server and display it in the UI.

That’s it! We can now get into some real examples.

Example servers

Hello World

Let’s create a toy reward server that rewards samples based on whether a “scary letter”, specific to each sample, appears in the model’s completion.

scary_letter_reward_server.py
from adaptive_sdk.external import RewardServer, ValidatedRequest, Response, ServerInfo
from pydantic import BaseModel, Field

# Define the metadata class
class ScaryLetterMetadata(BaseModel):
    scary_letter: str = Field(min_length=1, max_length=1)

# Implement the reward server
class ScaryLetterRewardServer(RewardServer[ScaryLetterMetadata]):
    def __init__(self, port: int = 8000, blocking: bool = True, **kwargs):
        super().__init__(port, ScaryLetterMetadata, blocking, **kwargs)

    async def score(self, request: ValidatedRequest[ScaryLetterMetadata]) -> Response:
        # Get the last turn from the conversation, corresponding to
        # the model's completions
        last_completion = request.turns[-1].content

        # Count occurrences of the scary letter
        num_scary_letters = last_completion.count(request.metadata.scary_letter)
        
        # Return a reward of 1.0 if no scary letters are found, 0.0 otherwise
        return Response(
            reward=1.0 if num_scary_letters == 0 else 0.0,
            metadata={
                "feedback": (
                    "There were no scary letters!"
                    if num_scary_letters == 0
                    else f"There were {num_scary_letters} scary letters!"
                )
            },
        )

    async def info(self) -> ServerInfo:
        return ServerInfo(
            version="1.0", 
            name="Scary Letter Detector", 
            description="Rewards completions that avoid a specific letter."
        )

# Start the server
if __name__ == "__main__":
    server = ScaryLetterRewardServer(port=50056)

To run this server:

python scary_letter_reward_server.py

Hello World (no metadata)

If your reward function doesn’t need any additional metadata, you can use the EmptyMetadata class:

no_metadata_reward_server.py
from adaptive_sdk.external import RewardServer, EmptyMetadata, ValidatedRequest, Response, ServerInfo

class LengthRewardServer(RewardServer[EmptyMetadata]):
    def __init__(self, port: int = 8000, blocking: bool = True, **kwargs):
        super().__init__(port, EmptyMetadata, blocking, **kwargs)

    async def score(self, request: ValidatedRequest[EmptyMetadata]) -> Response:
        # Simply reward long completions, up to 100 characters
        completion = request.turns[-1].content
        return Response(
            reward=min(len(completion) / 100, 1.0),
            metadata={}
        )

    async def info(self) -> ServerInfo:
        return ServerInfo(
            version="1.0", 
            name="Length Rewarder", 
            description="Rewards longer responses (up to 100 characters)"
        )

if __name__ == "__main__":
    server = LengthRewardServer(port=50056)

SQL Query Execution

This more realistic example creates a reward server for a Text-to-SQL task. The reward server:

  1. Takes the correct ground truth result for the prompt as metadata
  2. Executes the model’s generated SQL query against a SQLite database
  3. Compares the results and assigns a reward of:
    • 1.0 if the results match
    • 0.0 if they don’t
    • -1.0 if the model’s response is not valid SQL or the query throws an execution error

sql_reward_server.py
import sqlite3
import os
import pandas as pd
from adaptive_sdk.external import RewardServer, ValidatedRequest, Response, ServerInfo
from pydantic import BaseModel, Field
from typing import List, Dict, Any

class SQLMetadata(BaseModel):
    ground_truth_results: List[Dict[str, Any]] = Field(
        description="The expected query results as a list of dictionaries"
    )
    db_path: str = Field(
        description="Path to a SQLite database file"
    )

class SQLRewardServer(RewardServer[SQLMetadata]):
    def __init__(self, db_base_path: str, port: int = 8000, blocking: bool = True, **kwargs):
        self.db_base_path = db_base_path
        super().__init__(port, SQLMetadata, blocking, **kwargs)
    
    async def score(self, request: ValidatedRequest[SQLMetadata]) -> Response:
        # Extract the SQL query from the model's completion
        sql_query = request.turns[-1].content
        
        # Representative check: we don't validate full SQL syntax, only
        # that the query starts with SELECT
        if not sql_query.strip().lower().startswith("select"):
            return Response(
                reward=-1.0,
                metadata={"status": "invalid_query"}
            )
        
        # Connect to the database
        try:
            conn = sqlite3.connect(
                os.path.join(self.db_base_path, request.metadata.db_path)
            )
            
            # Execute the query
            df_actual = pd.read_sql_query(sql_query, conn)
            actual_results = df_actual.to_dict(orient='records')
            
            # Compare with expected results
            success_or_fail = self._results_match(actual_results, request.metadata.ground_truth_results)
            return Response(
                reward=float(success_or_fail),
                metadata={
                    "status": "success" if success_or_fail else "wrong_result", 
                    "actual_results": actual_results,
                    "ground_truth_results": request.metadata.ground_truth_results
                }
            )
                
        except Exception as e:
            # Query caused an error
            return Response(
                reward=-1.0,
                metadata={
                    "status": "error", 
                    "message": f"Query execution error: {str(e)}"
                }
            )
        finally:
            if 'conn' in locals():
                conn.close()
    
    def _results_match(self, actual_results, expected_results):
        """Compare query results, handle potential ordering differences"""
        if len(actual_results) != len(expected_results):
            return False
            
        # Convert to sets of frozensets for order-independent comparison
        actual_set = set(frozenset(d.items()) for d in actual_results)
        expected_set = set(frozenset(d.items()) for d in expected_results)
        
        return actual_set == expected_set
    
    async def info(self) -> ServerInfo:
        return ServerInfo(
            version="1.0", 
            name="SQL Query Execution Success", 
            description="Evaluates SQL queries by comparing execution results with ground truth"
        )

if __name__ == "__main__":
    server = SQLRewardServer(db_base_path="/path/to/dbs/", port=50056)
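
For reference, every record in a dataset used with this server must carry metadata that matches the SQLMetadata schema. An illustrative record’s metadata (values are made up) could look like:

{
    "ground_truth_results": [{"country": "Sweden", "capital": "Stockholm"}],
    "db_path": "geography.sqlite"
}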

Test your reward server locally

You can use the SDK’s RewardClient to test your reward server locally before deploying it for production use in Adaptive Engine. This allows you to easily build test cases and verify that your server returns the correct rewards for different inputs.

As an example, after running the scary_letter_reward_server.py example server above, you can test it with the following client:

local_reward_server_test.py
from adaptive_sdk.external import RewardClient, Request, Turn

reward_client = RewardClient(url="http://localhost:50056")
response = reward_client.score(
    Request(
        turns=[
            Turn(role="system", content="You are a helpful assistant that answers questions about geography."),
            Turn(role="user", content="What is the name of the sea that touches Stockholm?"),
            Turn(role="assistant", content="Baltic Sea."),
        ],
        metadata={
            "scary_letter": "d",
        },
    )
)

assert response.reward == 1.0

You can also use the batch_score method to score multiple requests at once.
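
Here is a minimal sketch, assuming batch_score accepts a list of Request objects and returns the corresponding list of Response objects (check the SDK reference for the exact signature):

local_batch_score_test.py
from adaptive_sdk.external import RewardClient, Request, Turn

reward_client = RewardClient(url="http://localhost:50056")

# Build a small batch; each request mirrors the single-request example above
requests = [
    Request(
        turns=[
            Turn(role="user", content="Name a sea near Stockholm."),
            Turn(role="assistant", content=completion),
        ],
        metadata={"scary_letter": "d"},
    )
    for completion in ["Baltic Sea.", "The Sound, also called Öresund."]
]

responses = reward_client.batch_score(requests)
for response in responses:
    print(response.reward, response.metadata)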

Deploy a reward server

Your reward server needs to be accessible to Adaptive Engine over HTTP or HTTPS. You have several deployment options:

  1. Local development/testing: For initial development and testing, you can use a tool like ngrok to expose your local server and verify integration and access (see the example command after this list).
  2. Bare metal deployment: Deploy and run your server directly on a bare metal host.
  3. Containerization: Package your server as a Docker container and deploy it to a cloud container service or an orchestrator like Kubernetes.
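
For example, to expose a server running locally on port 50056 with ngrok:

ngrok http 50056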

The easiest setup for production use is to containerize your server and colocate it with your Adaptive Engine deployment. Whether you deploy Adaptive Engine on a single node with Docker Compose or on Kubernetes, this approach will facilitate server discovery and access.

Let’s take the example of our SQL Query Execution reward server. The directory structure of the project is as follows (note you can choose to include the SQLite databases in the container, or mount a volume from a host machine/NFS share):

sql-query-execution-reward-server/
├── __init__.py
├── db_data/
│   ├── geography.sqlite
│   └── weather.sqlite
├── Dockerfile
├── requirements.txt
└── sql_reward_server.py

Your requirements.txt file should include the following dependencies:

requirements.txt
adaptive-sdk
pandas
pydantic

And a simple Dockerfile for the reward server container could look like the following:

Dockerfile
FROM python:3.10-slim
# curl is needed for the Docker Compose healthcheck shown below
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["python", "sql_reward_server.py"]

You can then build the container:

docker build -t sql-reward-server:1.0 .
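
To smoke-test the image locally, you can run it and query the /info endpoint:

docker run --rm -p 50056:50056 sql-reward-server:1.0
curl http://localhost:50056/info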

Docker Compose

If deploying Adaptive Engine on a single node with Docker Compose, you can add the following to the docker-compose.yml file:

docker-compose.yml
[... other services ...]
  sql-reward-server:
    image: sql-reward-server:1.0
    restart: on-failure
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:50056/info || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 20s

Other services in the same Docker network will then be able to access the reward server at http://sql-reward-server:50056.

Two details worth mentioning:

  • You can add a ports section to expose the server on a specific host port, in case you deploy the server separately from Adaptive Engine (or even on a different node)
  • You can add a volumes section to mount a volume from a host machine or NFS share into the container, in case you don’t want to include the SQLite databases in the image (see the example snippet after this list)
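
For example, a variant of the service definition with both sections (the host port mapping and host path are illustrative):

  sql-reward-server:
    image: sql-reward-server:1.0
    ports:
      - "50056:50056"                 # expose the server on a host port
    volumes:
      - /srv/reward-dbs:/app/db_data  # mount databases instead of baking them in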

Kubernetes

If deploying Adaptive Engine on Kubernetes, you can easily add a deployment and service for your SQL reward server by applying the following manifest:

sql-reward-server.yml
apiVersion: v1
kind: Service
metadata:
  name: sql-reward-server
  labels:
    app: sql-reward-server
spec:
  ports:
  - port: 50056
    targetPort: 50056
    protocol: TCP
    name: http
  selector:
    app: sql-reward-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sql-reward-server
  labels:
    app: sql-reward-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sql-reward-server
  template:
    metadata:
      labels:
        app: sql-reward-server
    spec:
      containers:
      - name: sql-reward-server
        image: sql-reward-server:1.0
        ports:
        - containerPort: 50056
        resources:
          limits:
            cpu: "1"
            memory: "1Gi"
          requests:
            cpu: "500m"
            memory: "512Mi"
        livenessProbe:
          httpGet:
            path: /info
            port: 50056
          initialDelaySeconds: 30
          periodSeconds: 30
        readinessProbe:
          httpGet:
            path: /info
            port: 50056
          initialDelaySeconds: 15
          periodSeconds: 10

Other workloads running in the same Kubernetes cluster will then be able to access the reward server at http://sql-reward-server:50056 if running in the same namespace, or http://sql-reward-server.<reward-server-namespace>.svc.cluster.local:50056 if running in a different namespace.

In the manifest above, we refer to the container image as sql-reward-server:1.0. This only works for a local cluster setup and development purposes. For production use, you should use a registry that is accessible to your Kubernetes cluster, such as AWS Elastic Container Registry (ECR), Google Container Registry (GCR), or a private Docker registry.
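
For example, to tag and push the image to such a registry (the registry URL is a placeholder):

docker tag sql-reward-server:1.0 registry.example.com/sql-reward-server:1.0
docker push registry.example.com/sql-reward-server:1.0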

Connect a reward server to Adaptive Engine

The Adaptive SDK provides methods to test that the Adaptive control plane can access your reward server, as well as to register it with Adaptive Engine. After registering your reward server, you can use it in any training or evaluation job by specifying its key.

To test that the Adaptive control plane can access your reward server:

test_reward_server_integration.py
from adaptive_sdk import Adaptive

adaptive = Adaptive(base_url="https://api.adaptive.xyz", api_key="your-api-key")

# Test that the Adaptive control plane can access your reward server
result = adaptive.reward_servers.test(url="http://sql-reward-server:50056")

if hasattr(result, "error"):
    print(f"Error connecting to reward server:\n {result.error}")
else:
    print("Successfully connected to reward server")
    print(f"Name: {result.name}")
    print(f"Description: {result.description}")
    print(f"Version: {result.version}")

Once you’ve tested that your reward server is working, you can register it with Adaptive Engine. You can also list available reward servers, and remove reward servers that are no longer needed.

admin_reward_servers.py
import random
from pprint import pprint

from adaptive_sdk import Adaptive

adaptive = Adaptive(base_url="https://api.adaptive.xyz", api_key="your-api-key")

# Register new server
sql_server = adaptive.reward_servers.add(
    url="http://sql-reward-server:50056",
    key="sql-reward-server",
)

# List all servers, check the required metadata schema for one of them
servers = adaptive.reward_servers.list()
random_server = random.choice(servers)
print("Metadata schema for a random server:")
pprint(random_server.metadata_schema)

# Remove a server
_ = adaptive.reward_servers.remove(key=random_server.key)

Conclusion

External reward servers are a powerful way to fine-tune models with your own custom rewards. Now that you’ve learned how to build and deploy a reward server, check out how to use it for RL training.