Adaptive Engine allows you to create your own reward function implementation to train and evaluate models. You can expose this function as an external server, which Adaptive must be able to access via HTTP/HTTPS.
Although reward servers support any arbitrary reward, they are especially useful when your reward depends on external systems like databases, simulated environments, sandboxes, or APIs, where feedback from execution or interaction is available.
Diagram: a reward server used for both training and evaluation on Adaptive Engine.
Reward values can be on any absolute scale, with no need for normalization; the only requirement is that they are consistent across samples.
During a training or evaluation job, batches of samples (prompt messages + completion) are sent to the reward server for scoring. The job waits for the server to respond, retrying with exponential backoff if any network disruptions are detected.
A standard API specification and Python SDK tooling help you easily build and deploy a reward server.
Building a Reward Server
Implementation guide
The Adaptive SDK includes a RewardServer base class built on top of FastAPI that handles all the server-side functionality required to deploy a reward server.
RewardServers allow you to optionally define an input schema that enforces the existence of metadata required for reward scoring. For example, a ground truth computed_result would be needed if we were training a model to solve math problems. When a training job is launched, Adaptive Engine polls your reward server for its required metadata schema and validates that every record in the selected dataset includes this metadata, cancelling the job launch if any record is missing it. This allows you to decouple the reward scoring logic from the job that you’re running, and guarantee data validation by design.
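As a sketch (the field names here are illustrative, not the exact Adaptive dataset format), a record for the math example above might look like this, with the reward server declaring computed_result as required metadata:
record = {
    "turns": [
        {"role": "user", "content": "What is 17 * 24?"},
        {"role": "assistant", "content": "17 * 24 = 408"},  # the completion to be scored
    ],
    # Required by the reward server's metadata schema; the job launch is
    # cancelled if any selected record is missing it
    "metadata": {"computed_result": 408},
}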
To create your own reward server, you need to:
- Define a metadata class (or use the EmptyMetadata class if your reward function doesn’t require metadata)
- Implement a subclass of RewardServer with:
  - An implementation of the score method, which computes the reward for a single sample
  - An implementation of the info method, which provides server metadata back to Adaptive Engine
Let’s take a look at the function signatures for the score and info methods:
from typing import Any, Generic, TypeVar

from pydantic import BaseModel

# Schema definitions as exposed by adaptive_sdk.external, shown here for reference
MetadataModel = TypeVar("MetadataModel", bound=BaseModel)

class Turn(BaseModel):
    role: str
    content: str

class ValidatedRequest(BaseModel, Generic[MetadataModel]):
    turns: list[Turn]
    metadata: MetadataModel

class Response(BaseModel):
    reward: float
    metadata: dict[str, Any]

class ServerInfo(BaseModel):
    version: str
    name: str
    description: str

async def score(self, request: ValidatedRequest[MetadataModel]) -> Response:
    # Your reward scoring implementation goes here
    computed_reward = 0.0
    return Response(
        reward=computed_reward,
        metadata={},
    )

async def info(self) -> ServerInfo:
    return ServerInfo(
        version="1.0",
        name="My Reward Server",
        description="A simple reward server that gives a reward based on the model's completion",
    )
Here’s what we can derive from the above:
- score takes a ValidatedRequest object, which represents one of the records in your dataset. turns is the full conversation history, and metadata is the metadata object you defined.
- score returns a Response object, which contains the reward attributed to the record and an optional metadata dictionary, which you can use to return any information you want from the server, including detailed feedback or error messages.
- info returns a ServerInfo object, which contains the server version, name and description. Adaptive Engine uses this information to register your server and display it in the UI.
That’s it! We can now get into some real examples.
Example servers
Hello World
Let’s create a toy reward server that rewards samples based on whether a sample-specific “scary letter” appears in the model’s completion.
scary_letter_reward_server.py
from adaptive_sdk.external import RewardServer, ValidatedRequest, Response, ServerInfo
from pydantic import BaseModel, Field

# Define the metadata class
class ScaryLetterMetadata(BaseModel):
    scary_letter: str = Field(min_length=1, max_length=1)

# Implement the reward server
class ScaryLetterRewardServer(RewardServer[ScaryLetterMetadata]):
    def __init__(self, port: int = 8000, blocking: bool = True, **kwargs):
        super().__init__(port, ScaryLetterMetadata, blocking, **kwargs)

    async def score(self, request: ValidatedRequest[ScaryLetterMetadata]) -> Response:
        # Get the last turn from the conversation, corresponding to
        # the model's completion
        last_completion = request.turns[-1].content

        # Count occurrences of the scary letter
        num_scary_letters = last_completion.count(request.metadata.scary_letter)

        # Return a reward of 1.0 if no scary letters are found, 0.0 otherwise
        return Response(
            reward=0.0 if request.metadata.scary_letter in last_completion else 1.0,
            metadata={
                "feedback": (
                    "There were no scary letters!"
                    if num_scary_letters == 0
                    else f"There were {num_scary_letters} scary letters!"
                )
            },
        )

    async def info(self) -> ServerInfo:
        return ServerInfo(
            version="1.0",
            name="Scary Letter Detector",
            description="Rewards completions that avoid a specific letter.",
        )

# Start the server
if __name__ == "__main__":
    server = ScaryLetterRewardServer(port=50056)
To run this server:
python scary_letter_reward_server.py
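With the server running, you can quickly confirm it is reachable by querying its info endpoint (the same /info path used by the health checks later in this guide):
curl http://localhost:50056/info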
If your reward function doesn’t need any additional metadata, you can use the EmptyMetadata class:
no_metadata_reward_server.py
from adaptive_sdk.external import RewardServer, EmptyMetadata, ValidatedRequest, Response, ServerInfo

class LengthRewardServer(RewardServer[EmptyMetadata]):
    def __init__(self, port: int = 8000, blocking: bool = True, **kwargs):
        super().__init__(port, EmptyMetadata, blocking, **kwargs)

    async def score(self, request: ValidatedRequest[EmptyMetadata]) -> Response:
        # Simply reward long completions, up to 100 characters
        completion = request.turns[-1].content
        return Response(
            reward=min(len(completion) / 100, 1.0),
            metadata={},
        )

    async def info(self) -> ServerInfo:
        return ServerInfo(
            version="1.0",
            name="Length Rewarder",
            description="Rewards longer responses (up to 100 characters)",
        )

if __name__ == "__main__":
    server = LengthRewardServer(port=50056)
SQL Query Execution
This more realistic example creates a reward server for a Text-to-SQL task. The reward server:
- Takes the correct ground truth result for the prompt as metadata
- Executes the model’s generated SQL query against a SQLite database
- Compares the results and assigns a reward of:
  - 1.0 if the results match
  - 0.0 if they don’t
  - -1.0 if the query is invalid, throws an execution error, or the model’s response is not a valid SQL query
sql_reward_server.py
import os
import sqlite3
from typing import Any, Dict, List

import pandas as pd
from adaptive_sdk.external import RewardServer, ValidatedRequest, Response, ServerInfo
from pydantic import BaseModel, Field

class SQLMetadata(BaseModel):
    ground_truth_results: List[Dict[str, Any]] = Field(
        description="The expected query results as a list of dictionaries"
    )
    db_path: str = Field(
        description="Path to a SQLite database file"
    )

class SQLRewardServer(RewardServer[SQLMetadata]):
    def __init__(self, db_base_path: str, port=8000, blocking=True, **kwargs):
        self.db_base_path = db_base_path
        super().__init__(port, SQLMetadata, blocking, **kwargs)

    async def score(self, request: ValidatedRequest[SQLMetadata]) -> Response:
        # Extract the SQL query from the model's completion
        sql_query = request.turns[-1].content

        # Representative check only: we don't validate full syntax,
        # just that the query starts with SELECT
        if not sql_query.strip().lower().startswith("select"):
            return Response(
                reward=-1.0,
                metadata={"status": "invalid_query"},
            )

        # Connect to the database
        try:
            conn = sqlite3.connect(
                os.path.join(self.db_base_path, request.metadata.db_path)
            )
            # Execute the query
            df_actual = pd.read_sql_query(sql_query, conn)
            actual_results = df_actual.to_dict(orient="records")

            # Compare with expected results
            success_or_fail = self._results_match(actual_results, request.metadata.ground_truth_results)
            return Response(
                reward=float(success_or_fail),
                metadata={
                    "status": "success" if success_or_fail else "wrong_result",
                    "actual_results": actual_results,
                    "ground_truth_results": request.metadata.ground_truth_results,
                },
            )
        except Exception as e:
            # Query caused an error
            return Response(
                reward=-1.0,
                metadata={
                    "status": "error",
                    "message": f"Query execution error: {str(e)}",
                },
            )
        finally:
            if "conn" in locals():
                conn.close()

    def _results_match(self, actual_results, expected_results):
        """Compare query results, handling potential ordering differences"""
        if len(actual_results) != len(expected_results):
            return False
        # Convert to sets of frozensets for order-independent comparison
        actual_set = set(frozenset(d.items()) for d in actual_results)
        expected_set = set(frozenset(d.items()) for d in expected_results)
        return actual_set == expected_set

    async def info(self) -> ServerInfo:
        return ServerInfo(
            version="1.0",
            name="SQL Query Execution Success",
            description="Evaluates SQL queries by comparing execution results with ground truth",
        )

if __name__ == "__main__":
    server = SQLRewardServer(db_base_path="/path/to/dbs/", port=50056)
Test your reward server locally
You can use the SDK’s RewardClient to test your reward server locally before deploying it for production use in Adaptive Engine. This allows you to easily build test cases and verify that your server returns the correct rewards for different inputs.
As an example, after running the scary_letter_reward_server.py example server above, you can test it with the following client:
local_reward_server_test.py
from adaptive_sdk.external import RewardClient, Request, Turn

reward_client = RewardClient(url="http://localhost:50056")

response = reward_client.score(
    Request(
        turns=[
            Turn(role="system", content="You are a helpful assistant that answers questions about geography."),
            Turn(role="user", content="What is the name of the sea that touches Stockholm?"),
            Turn(role="assistant", content="Baltic Sea."),
        ],
        metadata={
            "scary_letter": "d",
        },
    )
)
assert response.reward == 1.0
You can also use the batch_score method to score multiple requests at once.
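A minimal sketch of batched scoring, assuming batch_score accepts a list of Request objects and returns one Response per request (check the SDK reference for the exact signature):
requests = [
    Request(
        turns=[Turn(role="user", content=prompt), Turn(role="assistant", content=completion)],
        metadata={"scary_letter": "d"},
    )
    for prompt, completion in [
        ("Name a big cat.", "Lion."),
        ("Name a sea animal.", "Dolphin."),
    ]
]
responses = reward_client.batch_score(requests)  # assumed signature
for r in responses:
    print(r.reward, r.metadata)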
Deploy a reward server
Your reward server needs to be accessible to Adaptive Engine over HTTP or HTTPS. You have several deployment options:
- Local development/testing: For initial development and testing, you can use a tool like ngrok to expose your local server and test integration and access.
- Bare metal deployment: Deploy and run your server on bare metal
- Containerization: Package your server as a Docker container and deploy to a cloud container service or orchestrator like Kubernetes
The easiest setup for production use is to containerize your server and colocate it with your Adaptive Engine deployment. Whether you deploy Adaptive Engine on a single node with Docker Compose or on Kubernetes, this approach will facilitate server discovery and access.
Let’s take the example of our SQL Query Execution reward server. The directory structure of the project is as follows (note you can choose to include the SQLite databases in the container, or mount a volume from a host machine/NFS share):
sql-query-execution-reward-server/
├── __init__.py
├── db_data/
│   ├── geography.sqlite
│   └── weather.sqlite
├── Dockerfile
├── requirements.txt
└── sql_reward_server.py
Your requirements.txt file should include the following dependencies:
adaptive-sdk
pandas
pydantic
And a simple Dockerfile for the reward server container could look like the following:
FROM python:3.10-slim
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
CMD ["python", "sql_reward_server.py"]
You can then build the container:
docker build -t sql-reward-server:1.0 .
Docker Compose
If deploying Adaptive Engine on a single node with Docker Compose, you can add the following to the docker-compose.yml file:
services:
  [... other services ...]
  sql-reward-server:
    image: sql-reward-server:1.0
    restart: on-failure
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:50056/info || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 20s
Other services in the same Docker network will then be able to access the reward server at http://sql-reward-server:50056.
Two details worth mentioning, sketched in the snippet below:
- You can add a ports section to expose the server on a specific host port, in case you deploy the server separately from Adaptive Engine (or even on a different node)
- You can add a volumes section to mount a volume from a host machine/NFS share into the container, in case you don’t want to include the SQLite databases in the container
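For example (the host port mapping and volume paths below are placeholders to adapt to your setup):
  sql-reward-server:
    image: sql-reward-server:1.0
    ports:
      - "50056:50056"  # expose on the host if Adaptive Engine runs elsewhere
    volumes:
      - /mnt/nfs/sqlite-dbs:/app/db_data  # mount databases instead of baking them into the image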
Kubernetes
If deploying Adaptive Engine on Kubernetes, you can easily add a deployment and service for your SQL reward server by applying the following manifest:
apiVersion: v1
kind: Service
metadata:
  name: sql-reward-server
  labels:
    app: sql-reward-server
spec:
  ports:
    - port: 50056
      targetPort: 50056
      protocol: TCP
      name: http
  selector:
    app: sql-reward-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sql-reward-server
  labels:
    app: sql-reward-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sql-reward-server
  template:
    metadata:
      labels:
        app: sql-reward-server
    spec:
      containers:
        - name: sql-reward-server
          image: sql-reward-server:1.0
          ports:
            - containerPort: 50056
          resources:
            limits:
              cpu: "1"
              memory: "1Gi"
            requests:
              cpu: "500m"
              memory: "512Mi"
          livenessProbe:
            httpGet:
              path: /info
              port: 50056
            initialDelaySeconds: 30
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /info
              port: 50056
            initialDelaySeconds: 15
            periodSeconds: 10
Other workloads running in the same Kubernetes cluster will then be able to access the reward server at http://sql-reward-server:50056 if running in the same namespace, or http://sql-reward-server.<reward-server-namespace>.svc.cluster.local:50056 if running in a different namespace.
In the manifest above, we refer to the container image as sql-reward-server:1.0. This only works for a local cluster setup and development purposes. For production use, you should push the image to a registry that is accessible to your Kubernetes cluster, such as AWS Elastic Container Registry (ECR), Google Container Registry (GCR), or a private Docker registry.
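For example, tagging and pushing to a hypothetical ECR repository (replace the registry URL with your own):
docker tag sql-reward-server:1.0 123456789012.dkr.ecr.us-east-1.amazonaws.com/sql-reward-server:1.0
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/sql-reward-server:1.0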
Connect a reward server to Adaptive Engine
The Adaptive SDK provides methods to test that the Adaptive control plane can access your reward server, as well as to register it with Adaptive Engine. After registering your reward server, you can use it in any training or evaluation job by specifying its key.
To test that the Adaptive control plane can access your reward server:
test_reward_server_integration.py
from adaptive_sdk import Adaptive

adaptive = Adaptive(base_url="https://api.adaptive.xyz", api_key="your-api-key")

# Test that the Adaptive control plane can access your reward server
result = adaptive.reward_servers.test(url="http://sql-reward-server:50056")

if hasattr(result, "error"):
    print(f"Error connecting to reward server:\n {result.error}")
else:
    print("Successfully connected to reward server")
    print(f"Name: {result.name}")
    print(f"Description: {result.description}")
    print(f"Version: {result.version}")
Once you’ve tested that your reward server is working, you can register it with Adaptive Engine. You can also list available reward servers, and remove reward servers that are no longer needed.
import random
from pprint import pprint
# Register new server
sql_server = adaptive.reward_servers.add(
url="http://sql-reward-server:50056",
key="sql-reward-server",
)
# List all servers, check the required metadata schema for one of them
servers = adaptive.reward_servers.list()
random_server = random.choice(servers)
print("Metadata schema for a random server:")
pprint(random_server.metadata_schema)
# Remove a server
_ = adaptive.reward_servers.remove(key=random_server.key)
Conclusion
External reward servers are a powerful way to fine-tune models with your own custom rewards. Now that you’ve learned how to build and deploy a reward server, check out how to use it for RL training.