Learn how to write custom reward functions for training and evaluation
A reward server used for both training and evaluation on Adaptive Engine.
RewardServer
base class built on top of FastAPI that handles all the server-side functionality required to deploy a reward server.
RewardServer
s allow you to optionally define an input schema that enforces the existence of metadata required for reward scoring. For example, a ground truth computed_result
would be needed if we were training a model to solve math problems. When a training job is launched, Adaptive Engine polls your reward server for its required metadata schema, and validates that every record in the selected dataset includes this metadata, cancelling the job launch if they don’t. This allows you to decouple the reward scoring logic from the job that you’re running, and guarantee data validation by design.
To create your own reward server, you need to:
EmptyMetadata
class if your reward function doesn’t require metadata)RewardServer
with:
score
method, which computes the reward for a single sampleinfo
method, which provides server metadata back to Adaptive Enginescore
and info
methods:
score
takes a ValidatedRequest
object, which represents one of the records in your dataset. turns
is the full conversation history, and metadata
is the metadata object you defined.score
returns a Response
object, which contains the reward
attributed to the record and an optional metadata
dictionary, which you can use to return any information you want from the server, including detailed feedback or error messages.info
returns a ServerInfo
object, which contains the server version, name and description. Adaptive Engine uses this information to register your server and display it in the UI.EmptyMetadata
class:
1.0
if the results match0.0
if they don’t-1.0
if the query is invalid, throws an execution error, or the model’s response is not a valid SQL queryRewardClient
to test your reward server locally before deploying it for production use in Adaptive Engine. This allows you to easily build test cases and verify that your server returns the correct rewards for different inputs.
As an example, after running the scary-letter-reward-server.py
example server above, you can test it with the following client:
batch_score
method to score multiple requests at once.
http://sql-reward-server:50056
.
2 details that worth mentioning:
ports
section to expose the server on a specific host port, in case you deploy the server separately from Adaptive Engine (or even a different node)volumes
section to mount a volume from a host machine/NFS share to the container, in case you don’t want to include the SQLite databases in the containerhttp://sql-reward-server:50056
if running in the same namespace, or http://sql-reward-server.<reward-server-namespace>.svc.cluster.local:50056
if running in a different namespace.
sql-reward-server:1.0
. This would only work for a local cluster setup and development purposes. For production use, you should use a registry that is accessible to your Kubernetes cluster, such AWS Elastic Container Registry (ECR), Google Container Registry (GCR), or a private Docker registry.