Adaptive Engine can be easily configured and deployed on Kubernetes. This guide will walk you through the deployment process.

Requirements

  • A Kubernetes cluster.
  • Helm v3 or higher.
  • NVIDIA Device Plugin pre-installed (required for GPU resource discovery, see installation guide).
  • (Optional, if using external secrets) External Secrets Operator pre-installed (see installation guide).
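
If the NVIDIA Device Plugin is installed correctly, each GPU node advertises an nvidia.com/gpu resource. A quick way to confirm this (output varies by cluster):

# each GPU node should report a non-zero nvidia.com/gpu capacity
kubectl describe nodes | grep -i "nvidia.com/gpu"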

Adaptive Helm Chart

To install the Adaptive Helm chart on your Kubernetes cluster, follow these steps:

Verify tool setup and cluster access

# check if helm is installed
helm version
# make sure `kubectl` is configured correctly and you can access the cluster
kubectl get pods

Get the Helm Chart

Add the Adaptive helm repo and update it:

helm repo add adaptive https://raw.githubusercontent.com/adaptive-ml/adaptive-helm-chart/main/charts
helm repo update adaptive

Get the default values.yaml configuration file:

helm show values adaptive/adaptive > values.yaml

Modify values.yaml

Edit the values.yaml file to customize the Helm chart for your environment. Here are the relevant values you should modify:

Container registry information

Add the details of the Adaptive container registry you have subscribed to and been granted access to.

containerRegistry: <aws_account_id>.dkr.ecr.<region>.amazonaws.com
harmony:
  image:
    repository: adaptive-repository # Adaptive Repository you have been granted access to
    tag: harmony:latest # Harmony image tag

controlPlane:
  image:
    repository: adaptive-repository # Adaptive Repository you have been granted access to
    tag: control-plane:latest # Control plane image tag
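
If you want to sanity-check registry access from your workstation (assuming the AWS CLI and Docker are available locally and your IAM principal has pull permissions on the repository), you can log in to the registry:

aws ecr get-login-password --region <region> | \
    docker login --username AWS --password-stdin <aws_account_id>.dkr.ecr.<region>.amazonaws.com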

Resource limits

Adjust the resource limits based on your cluster’s capabilities and workload/model requirements. harmony.gpusPerReplica should be equal to, or a divisor of, the number of GPUs available on each node where Adaptive Harmony will be deployed. For example:

harmony:
    replicaCount: 1
    # Should be equal to, or a divisor of the # of GPUs on each node
    gpusPerReplica: 8
    resources:
        limits:
            cpu: 8
            memory: 64Gi
        requests:
            cpu: 8
            memory: 60Gi
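
For instance, on nodes with 8 GPUs you could instead run two Harmony replicas with 4 GPUs each (a divisor of 8), assuming the same chart keys as above:

harmony:
    replicaCount: 2
    gpusPerReplica: 4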

Configuration secrets

Add values for the required configuration secrets:

secrets:
  # S3 bucket for model registry
  modelRegistryUrl: "s3://bucket-name/model_registry"
  # Can use the same bucket as above, optionally with a different prefix
  sharedDirectoryUrl: "s3://bucket-name/shared"

  # Postgres database connection string
  dbUrl: "postgres://username:password@db_adress:5432/db_name"
  # Secret used to sign cookies. Must be the same on all servers of a cluster and >= 64 chars
  cookiesSecret: "change-me-secret-db40431e-c2fd-48a6-acd6-854232c2ed94-01dd4d01-dr7b-4315" # Must be >= 64 chars

  auth:
    oidc:
      providers:
        # Name of your OpenID provider, displayed in the UI
        - name: "Google"
          # Key of your provider, the callback url will be '<rootUrl>/api/v1/auth/login/<key>/callback'
          key: "google"
          issuer_url: "https://accounts.google.com" # openid connect issuer url
          client_id: "replace_client_id" # client id
          client_secret: "replace_client_secret" # client_secret, optional
          scopes: ["email", "profile"] # scopes required for auth, requires email and profile
          # true if your provider supports pkce (recommended)
          pkce: true
          # if true, user account will be created if it does not exist
          allow_sign_up: true
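
The cookiesSecret can be any sufficiently long random string. One way to generate a value of at least 64 characters (assuming openssl is available) is:

# prints a 96-character hex string
openssl rand -hex 48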

If you do not want to create Kubernetes secrets from values.yaml and prefer to integrate secrets stored in an external/cloud secrets manager, see Using external secrets.

Install the Helm chart

Deploy the Adaptive Helm chart:

helm install adaptive \
    adaptive/adaptive \
    --values ./values.yaml
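
Once the release is installed, you can check its status and confirm that the pods start up:

helm status adaptive
kubectl get pods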

Using external secrets

The Adaptive Helm chart supports integration with external secret stores through External Secrets Operator. The chart implements an example where secrets are hosted on AWS Secrets Manager.

To use secrets stored in an external/cloud secrets manager, you first need to install External Secrets Operator:

helm repo add external-secrets https://charts.external-secrets.io

helm install external-secrets \
    external-secrets/external-secrets \
    -n external-secrets \
    --create-namespace
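
You can confirm the operator is running before continuing:

kubectl get pods -n external-secrets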

Then, download the alternative values_external_secret.yaml file:

wget https://raw.githubusercontent.com/adaptive-ml/adaptive-helm-chart/main/charts/adaptive/values_external_secret.yaml

Customize the new values file, adding the details of your external secret’s name and properties. You can replace AWS Secrets Manager with your secrets manager of choice; refer to the External Secrets Operator documentation for the full list of supported providers.
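
For reference, External Secrets Operator reads from AWS Secrets Manager through a SecretStore resource along these lines (a generic, operator-level sketch rather than the exact manifest rendered by the chart; the store name, region, and service account are placeholders):

apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets-manager
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1
      auth:
        jwt:
          serviceAccountRef:
            name: adaptive-external-secrets # hypothetical service account with access to Secrets Manager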

Finally, deploy the Adaptive Helm chart using the new values file:

helm install adaptive \
    adaptive/adaptive \
    --values ./values_external_secret.yaml

Considerations for deployment on shared clusters

When deploying Adaptive Engine in a shared cluster where other workloads are running, there are a few best practices you can implement to enforce resource isolation:

Deploy Adaptive in a separate namespace

When installing the Adaptive Helm chart, you can do so in a separate namespace by passing the --namespace option. Example:

helm install adaptive \
  adaptive/adaptive \
  --values ./values.yaml \
  --namespace adaptive-engine

You can also pass the --create-namespace flag if the namespace does not exist yet.

Use Node Selectors to schedule Adaptive on specific GPU nodes

You can use the harmony.nodeSelector value in values.yaml to schedule Adaptive Harmony only on a specific node group. For example, if you are deploying Adaptive on an Amazon EKS cluster, you might add:

harmony:
  nodeSelector: 
    eks.amazonaws.com/nodegroup: p5-h100
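
To see which nodes carry that label (and therefore where Harmony can be scheduled), you can list nodes with the label shown as a column:

kubectl get nodes -L eks.amazonaws.com/nodegroup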

Dedicated GPU node tenancy

While the Adaptive control plane can run on any node with available CPU and memory resources, it is recommended that Harmony requests and takes ownership of all of the GPUs available on each GPU-enabled node. Even if you have already restricted Adaptive Harmony to a designated GPU node group using the instructions in the step above, you might want to guarantee that no other workloads can be scheduled on those nodes.

To dedicate a set of GPU nodes for Adaptive Harmony, you can use a combination of:

  1. Adding a taint to the GPU nodes
  2. Adding a corresponding toleration to Harmony in the values.yaml of the Adaptive Helm Chart

To add a taint to a node, you can first run kubectl get nodes -o name to see all the existing node names, and then taint them as exemplified below (replacing node_name):

kubectl taint nodes node_name dedicated=adaptive-engine:NoSchedule
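
You can verify the taint was applied with:

kubectl describe node node_name | grep -i taints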

You can then add a matching toleration to Harmony in the values.yaml file (harmony.tolerations) which will allow it to be scheduled on the tainted nodes:

harmony:
  tolerations:
  - key: dedicated
    operator: Equal
    value: adaptive-engine
    effect: NoSchedule

You can find more about taints and tolerations in the official Kubernetes documentation.

Advanced configuration

DB SSL/TLS configuration

See the official PostgreSQL documentation on SSL support for additional details.

Basic setting

If your PostgreSQL database supports TLS, you can enforce encrypted connections by adding the parameter sslmode=require to your PostgreSQL connection string in your Helm values.yaml file:

  dbUrl: "postgres://<user>:<password>@<host>/<db>?sslmode=require"

sslmode=require encrypts the connection but does not verify the server’s identity.

Server certificate verification

If you want the application to be able to verify the server certificate, you need to set the sslmode to verify-ca or verify-full.

  • verify-ca will verify the server certificate
  • verify-full will verify the server certificate and also that the server host name matches the name stored in the server certificate

For most secure environments, verify-full is recommended.

To allow server verification, you will need to provide the application with a root certificate. You can do this by following these steps:

  1. Download the db server certificate (for AWS RDS, refer to this page), for instance rds-ca-rsa2048-g1.pem

  2. Upload the .pem file to your Kubernetes cluster. As this is public information, it can be uploaded to a ConfigMap:

kubectl create configmap -n <namespace> db-ca --from-file=rds-ca-rsa2048-g1.pem

  3. Mount this file in the control plane container by editing the Helm values.yaml file:
...

volumes:
  - name: db-ca
    configMap:
      name: db-ca

volumeMounts:
  - name: db-ca
    mountPath: /mnt/db-ca/
    readOnly: true

  4. Refer to this certificate in the Postgres connection URL using the sslrootcert parameter:

  dbUrl: "postgres://<user>:<password>@<host>/<db>?sslmode=verify-full&sslrootcert=/mnt/db-ca/rds-ca-rsa2048-g1.pem"