Deploy Adaptive Engine on Kubernetes with Helm
Adaptive Engine can be easily configured and deployed on Kubernetes. This guide will walk you through the deployment process.
Requirements
- A Kubernetes cluster.
- Helm version v3 or higher.
- NVIDIA Device Plugin pre-installed (required for GPU resource discovery, see installation guide).
- (Optional, if using external secrets) External Secrets Operator pre-installed (see installation guide).
Adaptive Helm Chart
To install the Adaptive Helm chart on your Kubernetes cluster, follow these steps:
Verify tool setup and cluster access
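For example, you can confirm both with standard commands:

```bash
# Confirm Helm v3 or later is installed
helm version
# Confirm kubectl can reach the cluster and list its nodes
kubectl get nodes
```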
Get the Helm Chart
Add the Adaptive Helm repo and update it:
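A sketch of the commands, with a placeholder repository URL (substitute the address provided with your subscription):

```bash
# The URL below is a placeholder; use the repository address provided by Adaptive
helm repo add adaptive https://charts.adaptive.example.com
helm repo update
```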
Get the default values.yaml configuration file:
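Assuming the chart is named adaptive/adaptive-engine (check `helm search repo adaptive` for the actual name):

```bash
helm show values adaptive/adaptive-engine > values.yaml
```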
Modify values.yaml
Edit the values.yaml file to customize the Helm chart for your environment. Here are the relevant values you should modify:
Container registry information
Add the details for the Adaptive container registry to which you are subscribed and have been granted access.
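A hypothetical sketch; the exact key names are defined in the chart's default values.yaml:

```yaml
# Hypothetical keys and placeholder credentials
imageCredentials:
  registry: registry.adaptive.example.com
  username: <your-username>
  password: <your-access-token>
```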
Resource limits
Adjust the resource limits based on your cluster's capabilities and workload/model requirements. harmony.gpusPerNode should match the available GPU resources for each node in the cluster where Adaptive Harmony will be deployed. For example:
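A minimal sketch for a node group with 8 GPUs per node (the value is illustrative):

```yaml
harmony:
  # Should equal the number of GPUs exposed by the NVIDIA Device Plugin
  # on each node where Harmony will run
  gpusPerNode: 8
```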
Configuration secrets
Add values for the required configuration secrets:
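The exact secret keys are listed in the default values.yaml; a hypothetical sketch:

```yaml
# Hypothetical structure: consult the default values.yaml for the
# secrets the chart actually expects
secrets:
  # PostgreSQL connection string used by the control plane
  postgresUrl: postgresql://adaptive:<password>@db.example.com:5432/adaptive
```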
If you do not want to create Kubernetes secrets from values.yaml and prefer to integrate secrets stored in an external/cloud secrets manager, see Using external secrets.
Install the Helm chart
Deploy the Adaptive Helm chart:
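Assuming the release name adaptive and the chart name adaptive/adaptive-engine:

```bash
helm install adaptive adaptive/adaptive-engine -f values.yaml
```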
Using external secrets
The Adaptive Helm chart supports integration with external secret stores through External Secrets Operator. The chart implements an example where secrets are hosted on AWS Secrets Manager.
To use secrets stored in an external/cloud secrets manager, you first need to install External Secrets Operator:
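The standard installation from the External Secrets Operator Helm chart:

```bash
helm repo add external-secrets https://charts.external-secrets.io
helm repo update
helm install external-secrets external-secrets/external-secrets \
  --namespace external-secrets --create-namespace
```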
Then, download the alternative values_external_secret.yaml file:
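A sketch with a placeholder URL; fetch the file from the location referenced in your onboarding materials:

```bash
# Placeholder URL
curl -O https://charts.adaptive.example.com/values_external_secret.yaml
```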
Customize the new values file, adding the details of your external secret's name and properties. You can replace AWS Secrets Manager with your secrets manager of choice; see the External Secrets Operator documentation for supported providers.
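A hypothetical sketch of what this section might look like; the real key names are defined in values_external_secret.yaml:

```yaml
# Hypothetical keys: adapt to the schema in values_external_secret.yaml
externalSecret:
  secretStore:
    provider: aws          # AWS Secrets Manager, per the chart's example
    region: us-east-1
  remoteRef:
    name: adaptive/config  # secret name in your secrets manager
    property: postgres-url # property that holds the value
```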
Finally, deploy the Adaptive Helm chart using the new values file:
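Assuming the same release and chart names as above:

```bash
helm install adaptive adaptive/adaptive-engine -f values_external_secret.yaml
```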
Considerations for deployment on shared clusters
When deploying Adaptive Engine in a shared cluster where other workloads are running, there are a few best practices you can implement to enforce resource isolation:
Deploy Adaptive in a separate namespace
When installing the Adaptive Helm chart, you can do so in a separate namespace by passing the --namespace option. Example:
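For instance, installing into a namespace called adaptive (release and chart names assumed as above):

```bash
helm install adaptive adaptive/adaptive-engine \
  -f values.yaml --namespace adaptive
```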
You can also pass the --create-namespace flag if the namespace does not exist yet.
Use Node Selectors to schedule Adaptive on specific GPU nodes
You can use the harmony.nodeSelector value in values.yaml to schedule Adaptive Harmony only on a specific node group.
For example, if you are deploying Adaptive on an Amazon EKS cluster, you might add:
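A sketch using the node-group label that EKS applies to managed node groups (the group name gpu-nodes is illustrative):

```yaml
harmony:
  nodeSelector:
    eks.amazonaws.com/nodegroup: gpu-nodes
```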
Dedicated GPU node tenancy
Although the Adaptive control plane can run on any node with available CPU and memory resources, it is recommended that Harmony be scheduled to request and take ownership of all the GPUs available on each GPU-enabled node. Even if you have already ensured that Adaptive Harmony is only scheduled on a designated GPU node group using the instructions in the step above, you may want to guarantee that no other workloads can be scheduled on those nodes.
To dedicate a set of GPU nodes for Adaptive Harmony, you can use a combination of:
- Adding a taint to the GPU nodes
- Adding a corresponding toleration to Harmony in the values.yaml of the Adaptive Helm chart
To add a taint to a node, you can first run kubectl get nodes -o name to see all the existing node names, and then taint them as exemplified below (replacing node_name):
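For example, with an illustrative dedicated=adaptive taint key/value:

```bash
# List all node names
kubectl get nodes -o name
# Taint a GPU node so only pods tolerating the taint can be scheduled on it
kubectl taint nodes <node_name> dedicated=adaptive:NoSchedule
```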
You can then add a matching toleration to Harmony in the values.yaml file (harmony.tolerations), which will allow it to be scheduled on the tainted nodes:
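Matching the example taint above:

```yaml
harmony:
  tolerations:
    # Must match the taint applied to the GPU nodes
    - key: dedicated
      operator: Equal
      value: adaptive
      effect: NoSchedule
```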
You can find more about taints and tolerations in the official Kubernetes documentation.
Advanced configuration
DB SSL/TLS configuration
See also the official PostgreSQL documentation on SSL support.
Basic setting
If your PostgreSQL database supports TLS, you can enforce encrypted connections by adding the parameter sslmode=require to your PostgreSQL connection string in your Helm values.yaml file:
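A sketch, assuming a values key of postgresUrl (use whatever key your values.yaml defines for the connection string):

```yaml
postgresUrl: postgresql://adaptive:<password>@db.example.com:5432/adaptive?sslmode=require
```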
sslmode=require encrypts the connection but does not verify the server's identity.
Server certificate verification
If you want the application to be able to verify the server certificate, you need to set the sslmode to verify-ca or verify-full.
- verify-ca will verify the server certificate
- verify-full will verify the server certificate and also that the server host name matches the name stored in the server certificate
For most secure environments, verify-full is recommended.
To allow server verification, you will need to provide the application with a root certificate. You can do this by following these steps:
- Download the database server certificate (for AWS RDS, refer to this page), for instance rds-ca-rsa2048-g1.pem
- Upload the .pem file to your Kubernetes cluster. As this is public information, it can be uploaded to a ConfigMap:
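For example (the ConfigMap name db-root-cert is illustrative):

```bash
kubectl create configmap db-root-cert --from-file=rds-ca-rsa2048-g1.pem
```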
- Mount this file in the control plane container by editing the Helm values.yaml file:
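A hypothetical sketch; check the chart's values.yaml for the actual keys used to mount extra volumes into the control plane:

```yaml
# Hypothetical keys: consult the default values.yaml
controlPlane:
  extraVolumes:
    - name: db-root-cert
      configMap:
        name: db-root-cert
  extraVolumeMounts:
    - name: db-root-cert
      mountPath: /etc/ssl/db
      readOnly: true
```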
- Refer to this certificate in the PostgreSQL connection URL via the sslrootcert parameter:
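Continuing the example above, with the certificate mounted at /etc/ssl/db:

```yaml
postgresUrl: postgresql://adaptive:<password>@db.example.com:5432/adaptive?sslmode=verify-full&sslrootcert=/etc/ssl/db/rds-ca-rsa2048-g1.pem
```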