# Kubernetes

> Deploy Adaptive Engine on Kubernetes with Helm

This guide walks you through configuring and deploying Adaptive Engine on Kubernetes with the Adaptive Helm chart.

## Requirements

* A Kubernetes cluster.
* Helm version v3 or higher.
* NVIDIA Device Plugin pre-installed (required for GPU resource discovery, see [installation guide](https://github.com/NVIDIA/k8s-device-plugin?tab=readme-ov-file#deployment-via-helm)).
* (Optional, if using external secrets) External Secrets Operator pre-installed (see [installation guide](https://external-secrets.io/latest/introduction/getting-started/#option-1-install-from-chart-repository)).

## Adaptive Helm Chart

To install the [Adaptive Helm chart](https://github.com/adaptive-ml/adaptive-helm-chart/) on your Kubernetes cluster, follow these steps:

### Verify tool setup and cluster access

```bash theme={null}
# check if helm is installed
helm version
# make sure `kubectl` is configured correctly and you can access the cluster
kubectl get pods
```
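Because the requirements call for Helm v3 or higher, you may want to check the version explicitly rather than read it by eye. The following is a minimal sketch: the helper names `helm_major_version` and `is_supported_helm` are made up for this example, and it assumes `helm version --short` prints a string of the form `v3.x.y+g<commit>`, as Helm 3 does.

```bash
# Extract the major version from a Helm version string such as "v3.14.2+gc309b6f".
helm_major_version() {
  local version="${1#v}"            # drop the leading "v", if any
  printf '%s\n' "${version%%.*}"    # keep everything before the first "."
}

# Succeed only if the given version string is Helm 3 or newer.
is_supported_helm() {
  local major
  major="$(helm_major_version "$1")"
  case "$major" in
    ''|*[!0-9]*) return 1 ;;        # not a plain number, so treat it as unsupported
  esac
  [ "$major" -ge 3 ]
}

# Run the check only when helm is actually on the PATH.
if command -v helm >/dev/null 2>&1; then
  installed="$(helm version --short)"
  if is_supported_helm "$installed"; then
    echo "Helm ${installed} meets the v3+ requirement"
  else
    echo "Helm ${installed} is not supported; v3 or higher is required" >&2
  fi
fi
```

Any output that does not start with a version number is rejected, so an unexpected format fails the check instead of passing silently.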

### Get the Helm Chart

Add the Adaptive Helm repo and update it:

```bash theme={null}
helm repo add adaptive https://raw.githubusercontent.com/adaptive-ml/adaptive-helm-chart/main/charts
helm repo update adaptive
```

Get the default `values.yaml` configuration file:

```bash theme={null}
helm show values adaptive/adaptive > values.yaml
```

### Modify values.yaml

Edit the `values.yaml` file to customize the Helm chart for your environment. These are the values you should modify:

#### Container registry information

Add the details of the Adaptive container registry that you are subscribed to and have been granted access to.

```yaml theme={null}
containerRegistry: <aws_account_id>.dkr.ecr.<region>.amazonaws.com
harmony:
  image:
    repository: adaptive-repository # Adaptive Repository you have been granted access to
    tag: harmony:latest # Harmony image tag

controlPlane:
  image:
    repository: adaptive-repository # Adaptive Repository you have been granted access to
    tag: control-plane:latest # Control plane image tag
```

#### Resource limits

Adjust the resource limits based on your cluster’s capabilities and your workload and model requirements. `harmony.gpusPerReplica` should be equal to, or a divisor of,
the number of GPUs available on each node where Adaptive Harmony will be deployed. For example:

```yaml theme={null}
harmony:
  replicaCount: 1
  # Should be equal to, or a divisor of, the number of GPUs on each node
  gpusPerReplica: 8
  resources:
    limits:
      cpu: 8
      memory: 64Gi
    requests:
      cpu: 8
      memory: 60Gi
```
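The comment in the example says `gpusPerReplica` must equal, or evenly divide, the number of GPUs on each node. As a rough illustration of that rule, the sketch below lists the values that satisfy it for a given node size. The function names are hypothetical helpers, not part of the chart, and the `kubectl` command in the comment is one common way to read per-node GPU counts, assuming the NVIDIA Device Plugin exposes them as `nvidia.com/gpu`.

```bash
# One way to read the allocatable GPU count per node (needs cluster access):
#   kubectl get nodes -o custom-columns='NODE:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'

# Succeed if gpus_per_replica is a positive divisor of gpus_per_node.
is_valid_gpus_per_replica() {
  local gpus_per_node="$1" gpus_per_replica="$2"
  [ "$gpus_per_replica" -gt 0 ] && [ $(( gpus_per_node % gpus_per_replica )) -eq 0 ]
}

# Print every valid gpusPerReplica value for a node with the given GPU count.
valid_gpus_per_replica() {
  local gpus_per_node="$1" candidate=1 result=""
  while [ "$candidate" -le "$gpus_per_node" ]; do
    if is_valid_gpus_per_replica "$gpus_per_node" "$candidate"; then
      result="${result:+$result }$candidate"
    fi
    candidate=$(( candidate + 1 ))
  done
  printf '%s\n' "$result"
}

valid_gpus_per_replica 8   # prints: 1 2 4 8
```

On an 8-GPU node, for instance, a value of 3 would leave GPUs that no replica can use, which is why only divisors are accepted.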

#### Configuration secrets

Add values for the required configuration secrets:

```yaml theme={null}
secrets:
  # S3 bucket for model registry
  modelRegistryUrl: "s3://bucket-name/model_registry"
  # Can use the same bucket as above with a different prefix
  sharedDirectoryUrl: "s3://bucket-name/shared"

  # Postgres database connection string
  dbUrl: "postgres://username:password@db_address:5432/db_name"
  # Secret used to sign cookies. Must be the same on all servers of a cluster and >= 64 chars
  cookiesSecret: "change-me-secret-db40431e-c2fd-48a6-acd6-854232c2ed94-01dd4d01-dr7b-4315" # Must be >= 64 chars

  auth:
    oidc:
      providers:
        # Name of your OpenID provider, as displayed in the UI
        - name: "Google"
          # Key of your provider; the callback URL will be '<rootUrl>/api/v1/auth/login/<key>/callback'
          key: "google"
          issuer_url: "https://accounts.google.com" # OpenID Connect issuer URL
          client_id: "replace_client_id" # client ID
          client_secret: "replace_client_secret" # client secret, optional
          scopes: ["email", "profile"] # scopes requested during auth; email and profile are required
          # true if your provider supports pkce (recommended)
          pkce: true
          # if true, user account will be created if it does not exist
          allow_sign_up: true
```
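Two of these values are easy to get wrong by hand: `cookiesSecret` must be at least 64 characters, and the callback URL registered with your OpenID provider must follow the `<rootUrl>/api/v1/auth/login/<key>/callback` pattern. The sketch below shows one possible way to produce both. The function names are hypothetical, `https://adaptive.example.com` is a placeholder root URL, and reading from `/dev/urandom` is an assumption about your workstation rather than an Adaptive requirement.

```bash
# Generate a random hex string; 32 random bytes become 64 hex characters.
generate_cookies_secret() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Build the OIDC callback URL from the deployment's root URL and a provider key.
oidc_callback_url() {
  local root_url="${1%/}" key="$2"   # strip one trailing slash from the root URL
  printf '%s/api/v1/auth/login/%s/callback\n' "$root_url" "$key"
}

COOKIES_SECRET="$(generate_cookies_secret)"
echo "cookiesSecret length: ${#COOKIES_SECRET}"

# Placeholder root URL; replace it with the address of your own deployment.
oidc_callback_url "https://adaptive.example.com" "google"
```

Generate the secret once and reuse it, since it must be the same on all servers of a cluster.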

If you do not want to create Kubernetes secrets from `values.yaml` and prefer to integrate secrets stored in an external/cloud secrets manager, see [Using external secrets](#external-secrets).

### Install the Helm chart

Deploy the Adaptive Helm chart:

```bash theme={null}
helm install adaptive \
    adaptive/adaptive \
    --values ./values.yaml
```

### <span id="external-secrets">Using external secrets</span>

The Adaptive Helm chart supports integration with external secret stores through [External Secrets Operator](https://external-secrets.io/latest/).
The chart implements an example where secrets are hosted on AWS Secrets Manager.

To use secrets stored in an external/cloud secrets manager, you first need to install External Secrets Operator:

```bash theme={null}
helm repo add external-secrets https://charts.external-secrets.io

helm install external-secrets \
    external-secrets/external-secrets \
    -n external-secrets \
    --create-namespace
```

Then, download the alternative `values_external_secret.yaml` file:

```bash theme={null}
wget https://raw.githubusercontent.com/adaptive-ml/adaptive-helm-chart/main/charts/adaptive/values_external_secret.yaml
```

Customize the new values file, adding the details of your external secret's name and properties.
You can replace AWS Secrets Manager with your secrets manager of choice; please check out the [documentation](https://external-secrets.io/latest/provider/aws-secrets-manager/) on this topic.

Finally, deploy the Adaptive Helm chart using the new values file:

```bash theme={null}
helm install adaptive \
    adaptive/adaptive \
    --values ./values_external_secret.yaml
```

## Considerations for deployment on shared clusters

When deploying Adaptive Engine in a shared cluster where other workloads are running, there are a few
best practices you can implement to enforce resource isolation:

### Deploy Adaptive in a separate namespace

When installing the Adaptive Helm chart, you can do so in a separate namespace by passing the `--namespace` option. Example:

```bash theme={null}
helm install adaptive \
  adaptive/adaptive \
  --values ./values.yaml \
  --namespace adaptive-engine
```

You can also pass the `--create-namespace` flag if the namespace does not exist yet.

### Use Node Selectors to schedule Adaptive on specific GPU nodes

You can use the `harmony.nodeSelector` value in `values.yaml` to schedule Adaptive Harmony only on a specific node group.
For example, if you are deploying Adaptive on an Amazon EKS cluster, you might add:

```yaml theme={null}
harmony:
  nodeSelector:
    eks.amazonaws.com/nodegroup: p5-h100
```

### Dedicated GPU node tenancy

Although the Adaptive control plane can run on any node with available CPU and memory resources,
we recommend that Harmony request and take ownership of all the GPUs on each GPU-enabled node.
Even if you have already restricted Adaptive Harmony to a designated GPU node group with a node selector,
you may also want to guarantee that no other workloads can be scheduled on those nodes.

To dedicate a set of GPU nodes for Adaptive Harmony, you can use a combination of:

1. Adding a taint to the GPU nodes
2. Adding a corresponding toleration to Harmony in the `values.yaml` of the Adaptive Helm Chart

To add a taint to a node, first run `kubectl get nodes -o name` to list the existing node names, and then
taint each one as shown below (replacing `node_name`):

```bash theme={null}
kubectl taint nodes node_name dedicated=adaptive-engine:NoSchedule
```

You can then add a matching toleration to Harmony in the `values.yaml` file (`harmony.tolerations`)
which will allow it to be scheduled on the tainted nodes:

```yaml theme={null}
harmony:
  tolerations:
  - key: dedicated
    operator: Equal
    value: adaptive-engine
    effect: NoSchedule
```
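The toleration only takes effect if its `key`, `value`, and `effect` match the taint exactly. To make that correspondence explicit, here is a small sketch that splits a taint written as `key=value:effect` and prints the matching toleration entry. `taint_to_toleration` is a hypothetical helper written for this example, and it only handles taints that carry a value.

```bash
# Split a taint of the form key=value:effect and print the matching
# toleration as a YAML list item.
taint_to_toleration() {
  local taint="$1"
  local key="${taint%%=*}"     # text before the first "="
  local rest="${taint#*=}"     # text after the first "="
  local value="${rest%%:*}"    # text before the ":"
  local effect="${rest##*:}"   # text after the last ":"
  cat <<EOF
- key: ${key}
  operator: Equal
  value: ${value}
  effect: ${effect}
EOF
}

taint_to_toleration "dedicated=adaptive-engine:NoSchedule"
```

The printed entry can be pasted under `harmony.tolerations`. To confirm which taints a node currently carries, `kubectl describe node node_name` lists them under `Taints:`.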

You can find more about taints and tolerations in the official Kubernetes [documentation](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/).

## Advanced configuration

### Database SSL/TLS configuration

Adaptive Engine supports secure TLS connections between the database and control plane.

#### Basic setting

If your PostgreSQL database supports TLS, you can enforce encrypted connections by adding the parameter `sslmode=require` to your PostgreSQL connection string `dbUrl` in the Helm chart's `values.yaml` file:

```yaml theme={null}
  dbUrl: "postgres://<user>:<password>@<host>/<db>?sslmode=require"
```

Although `sslmode=require` encrypts the database connection, it does not verify the server’s identity.

#### Server certificate verification

For the application to verify the server certificate, you must set `sslmode` to `verify-ca` or `verify-full`.

* `verify-ca` verifies that the server certificate is signed by a trusted certificate authority (CA)
* `verify-full` performs the same check and also verifies that the server host name matches the name stored in the server certificate

`verify-full` is the recommended option for maximum security.

You will need to provide the application with a root certificate to make server certificate verification possible. You can do so by following these steps:

1. Download the root (CA) certificate for your database server, for instance `rds-ca-rsa2048-g1.pem` (if you're using AWS RDS, refer to [this page](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.SSL.html))

2. Upload the PEM file to your Kubernetes cluster. Because the certificate is public, non-sensitive information, it can be uploaded as a ConfigMap:

```bash theme={null}
kubectl create configmap -n <namespace> db-ca --from-file=rds-ca-rsa2048-g1.pem
```

3. Mount the file as a volume to the control plane deployment by editing `values.yaml`:

```yaml theme={null}
...

volumes:
  - name: db-ca
    configMap:
      name: db-ca

volumeMounts:
  - name: db-ca
    mountPath: /mnt/db-ca/
    readOnly: true
```

4. Use the `sslrootcert` parameter to refer to the certificate in the PostgreSQL connection URL, specifying `mountPath + filename`:

```yaml theme={null}
  dbUrl: "postgres://<user>:<password>@<host>/<db>?sslmode=verify-full&sslrootcert=/mnt/db-ca/rds-ca-rsa2048-g1.pem"
```
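The connection string in step 4 combines the mount path from step 3, the file name from step 2, and the chosen `sslmode`. As an illustration of how those pieces fit together, the sketch below assembles the URL from its parts. `cert_path` and `build_db_url` are hypothetical helpers, and the sketch does not percent-encode credentials, so a password containing characters such as `@` or `/` would need to be encoded first.

```bash
# Join a mount path and a file name with exactly one "/" between them.
cert_path() {
  local mount_path="${1%/}" filename="$2"
  printf '%s/%s\n' "$mount_path" "$filename"
}

# Assemble a PostgreSQL connection URL with TLS parameters.
# Arguments: user password host database sslmode [sslrootcert]
# Note: values are inserted as-is, without percent-encoding.
build_db_url() {
  local url="postgres://$1:$2@$3/$4?sslmode=$5"
  if [ -n "${6:-}" ]; then
    url="${url}&sslrootcert=$6"
  fi
  printf '%s\n' "$url"
}

root_cert="$(cert_path /mnt/db-ca/ rds-ca-rsa2048-g1.pem)"
build_db_url "<user>" "<password>" "<host>" "<db>" verify-full "$root_cert"
```

With the placeholders above, this prints the same `dbUrl` value shown in step 4. Omitting the last argument yields a URL suitable for the `sslmode=require` setting.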

Refer to the [official documentation for SSL support in PostgreSQL](https://www.postgresql.org/docs/current/libpq-ssl.html) for more information.
