# Self-hosting on Amazon EKS

> Deploy Adaptive Engine on Amazon EKS with EC2 Capacity Blocks

This guide covers the AWS-specific infrastructure and Helm configuration needed to deploy Adaptive Engine on Amazon EKS with [EC2 Capacity Blocks](https://aws.amazon.com/ec2/capacityblocks/pricing/). It complements the general [self-hosting guide](/v0.13/deploy/self-hosting) and references the [Adaptive Helm chart](https://github.com/adaptive-ml/adaptive-helm-chart/tree/main) as the source of truth for all Helm values.

This page assumes P-family instance types (p5, p5e, p5en, p6-b200, p6-b300, p6e-b200, p6e-b300).

## What are EC2 Capacity Blocks?

EC2 Capacity Blocks ("capacity blocks") are region-specific, instance-specific reservations that last between 1 and 180 days. A reservation can start immediately (subject to availability) or at a future date, and spans one or more EC2 instances. See the [capacity blocks documentation](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-capacity-blocks.html) for precise reservation mechanics and pricing. Capacity blocks are a GPU procurement mechanism specific to AWS.

<Warning>
  Capacity blocks are **distinct** from On-Demand Capacity Reservations (ODCR), Reserved Instances (RI), and Savings Plans (SP).
</Warning>

Adaptive ML recommends capacity blocks with Adaptive Engine for the following reasons:

1. **GPU availability** — P-family instances have practically no on-demand capacity. Capacity blocks provide reserved GPU instances with predictable lead times.
2. **Cost** — Capacity blocks are priced below on-demand rates, and in some cases below annual reservations.
3. **Elasticity** — An Adaptive Engine cluster can vary its GPU count on a daily basis as capacity blocks are created or expire. This allows organizations
   to grow capacity with workload demand and reduce cost when GPUs are not needed.

<Info>
  Capacity blocks are not required to run Adaptive Engine on AWS. For low-volume inference with small models, g6 and g6e instances can be used,
  and generally have sufficient on-demand capacity.
</Info>

## Architecture overview

A production deployment on AWS uses the following services:

| Component         | AWS service                  |
| ----------------- | ---------------------------- |
| Kubernetes        | Amazon EKS                   |
| GPU VMs           | EC2 Capacity Blocks          |
| Postgres database | Amazon RDS (PostgreSQL)      |
| Redis datastore   | Amazon ElastiCache for Redis |
| Model registry    | Amazon S3                    |
| Secrets           | AWS Secrets Manager          |
| Logging           | Amazon CloudWatch            |
| DNS/TLS           | ACM + your DNS provider      |
| Ingress           | AWS Load Balancer Controller |

Refer to the [Adaptive Engine architecture](/v0.13/deploy/architecture) section to see how these services interact.

## Deployment checklist

Here are the steps to deploy an Adaptive Engine cluster on Amazon EKS with EC2 Capacity Blocks. Subsequent sections provide details and code snippets.

1. **Provision AWS infrastructure** — Create VPC, EKS cluster, RDS, ElastiCache, and S3 bucket.
2. **Install cluster dependencies** — Deploy the NVIDIA GPU Operator, AWS Load Balancer Controller, External Secrets Operator, and optionally the CloudWatch agent.
3. **Populate Secrets Manager** — Store database connection details, S3 bucket paths, Redis authentication token, OIDC provider configuration, and cookies secret. Deploy the `ClusterSecretStore` and `ExternalSecret` resources.
4. **Configure Helm values and deploy the control plane** — Customize `values.yaml` and install the Helm chart with `replicaCount: 0` for Harmony (no GPUs yet). Log in at your domain and go through the OIDC authentication flow to verify that the control plane is online.
5. **Purchase capacity blocks** — Reserve GPU instances via the EC2 console or CLI. Tag them for Karpenter discovery.
6. **Configure Karpenter node pools** — Deploy `EC2NodeClass` and `NodePool` resources targeting your capacity block.
7. **Add compute pools and upgrade** — Set Harmony compute pools in `values.yaml` with node selectors and tolerations pointing to your capacity block nodes. Run `helm upgrade`.
8. **Verify GPUs** — Confirm Harmony pods are scheduled on capacity block nodes. Open the control plane UI and check that your GPUs appear in the *compute pools* section.

## Prerequisites

Before starting, complete the general [self-hosting prerequisites](/v0.13/deploy/self-hosting#prerequisites) and ensure you have:

* An AWS account with GPU quota for your target instance types
* The [External Secrets Operator](https://external-secrets.io/) deployed in the cluster

Choose one of the following EKS cluster configurations:

<Tabs>
  <Tab title="EKS with Karpenter" icon="gears">
    * An EKS cluster (1.28+) with [Karpenter](https://karpenter.sh/) installed
    * The [AWS Load Balancer Controller](https://docs.aws.amazon.com/eks/latest/userguide/aws-load-balancer-controller.html) deployed in the cluster
    * The [NVIDIA GPU Operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/amazon-eks.html) deployed in the cluster

    This option gives you full control over node provisioning, instance selection, and load balancer configuration.
  </Tab>

  <Tab title="EKS Auto Mode" icon="wand-magic-sparkles">
    * An EKS cluster (1.28+) with [Auto Mode](https://docs.aws.amazon.com/eks/latest/userguide/automode.html) enabled

    EKS Auto Mode manages node provisioning (replacing Karpenter) and load balancing (replacing the AWS Load Balancer Controller) automatically. Use `NodePool` and `NodeClass` resources from the EKS Auto Mode API (`eks.amazonaws.com/v1`) instead of the Karpenter API (`karpenter.sh/v1` / `karpenter.k8s.aws/v1`).
  </Tab>
</Tabs>

## Placeholders

Code snippets in this guide use the following placeholders. Replace them with your values.

| Placeholder           | Value                                                                                                            |
| --------------------- | ---------------------------------------------------------------------------------------------------------------- |
| `REGION`              | AWS region (e.g., `us-east-1`)                                                                                   |
| `ACCOUNT_ID`          | Your 12-digit AWS account ID                                                                                     |
| `MY_DEPLOYMENT`       | A name for your deployment, used as a Secrets Manager path prefix and capacity block tag (e.g., `prod-adaptive`) |
| `MY_CLUSTER`          | Your EKS cluster name                                                                                            |
| `MY_HOSTNAME`         | Your Adaptive Engine domain (e.g., `adaptive.example.com`)                                                       |
| `MY_EKS_NODE_ROLE`    | IAM role name for EKS worker nodes                                                                               |
| `MY_BUCKET`           | Your S3 bucket name                                                                                              |
| `ACM_CERTIFICATE_ARN` | ARN of your ACM TLS certificate                                                                                  |

## Helm configuration for EKS

Start from the base [values.yaml](https://github.com/adaptive-ml/adaptive-helm-chart/blob/main/charts/adaptive/values.yaml) and apply the overrides below. Deploy the control plane first without GPU compute pools to validate your infrastructure (OIDC, secrets, database, Redis) before purchasing capacity blocks.

### Container registry (Amazon ECR)

```yaml theme={null}
containerRegistry: ACCOUNT_ID.dkr.ecr.REGION.amazonaws.com

harmony:
  image:
    repository: adaptive/harmony
    tag: "vX.Y.Z"
    pullPolicy: Always

controlPlane:
  image:
    repository: adaptive/control-plane
    tag: "vX.Y.Z"
    pullPolicy: Always
```

Ensure the IAM role behind the EKS node instance profile allows `ecr:GetDownloadUrlForLayer` and `ecr:BatchGetImage` on the repositories, plus `ecr:GetAuthorizationToken`, which cannot be scoped to a resource and must be granted on `*`.
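As a rough sketch, the ECR permissions could be written to a policy document like the one below. The repository pattern `adaptive/*` is an assumption derived from the image repositories used in this guide, and the file name `ecr-pull-policy.json` is purely illustrative.

```bash
# Write a minimal ECR pull policy to a local file (file name is illustrative).
# Replace REGION and ACCOUNT_ID before attaching the policy to the node role.
POLICY_FILE="${POLICY_FILE:-ecr-pull-policy.json}"

cat > "$POLICY_FILE" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ecr:GetAuthorizationToken",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage"
      ],
      "Resource": "arn:aws:ecr:REGION:ACCOUNT_ID:repository/adaptive/*"
    }
  ]
}
EOF

echo "Wrote $POLICY_FILE"
```

The resulting file can then be attached to the node role, for example with `aws iam put-role-policy --role-name MY_EKS_NODE_ROLE --policy-name ecr-pull --policy-document file://ecr-pull-policy.json` (the policy name is also an assumption).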

### Control plane

```yaml theme={null}
controlPlane:
  replicaCount: 3
  rootUrl: "https://MY_HOSTNAME"
  resources:
    requests:
      cpu: 2
      memory: 8Gi
    limits:
      memory: 8Gi
  podDisruptionBudget:
    enabled: true
    minAvailable: 1
  extraEnvVars:
    NO_COLOR: "1"
```

### Initial deployment (control plane only)

Deploy without compute pools to verify the infrastructure works end to end:

```bash theme={null}
helm install adaptive oci://ghcr.io/adaptive-ml/adaptive \
  --values ./values.yaml \
  --namespace adaptive \
  --create-namespace
```

Once the control plane pods are running, confirm:

* You can log in via your OIDC provider
* Secrets are synced (check the Kubernetes secrets in the `adaptive` namespace)
* The control plane connects to RDS and Redis (check pod logs for connection errors)

After validation, proceed to [purchase capacity blocks](#capacity-block-reservation) and [add compute pools](#compute-pools-targeting-capacity-blocks).
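The checks above can be sketched with a few `kubectl` calls. This is a minimal example that assumes the `adaptive` namespace from the `helm install` command; the function names are hypothetical helpers, not part of the chart.

```bash
# Namespace used by the helm install command in this guide.
NAMESPACE="${NAMESPACE:-adaptive}"

check_control_plane() {
  # Control plane pods should reach Running/Ready.
  kubectl get pods -n "$NAMESPACE"
  # Each ExternalSecret should report a synced status.
  kubectl get externalsecrets -n "$NAMESPACE"
  # The Kubernetes secrets created by the operator should exist.
  kubectl get secrets -n "$NAMESPACE"
}

scan_logs_for_errors() {
  # $1: pod name taken from the `kubectl get pods` output.
  # Prints recent log lines mentioning errors; prints nothing if none are found.
  kubectl logs -n "$NAMESPACE" "$1" --tail=200 | grep -i "error" || true
}
```

Run `check_control_plane` first, then `scan_logs_for_errors POD_NAME` against one of the control plane pods to look for database or Redis connection failures.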

## AWS Secrets Manager

Store all sensitive configuration (database credentials, S3 paths, Redis authentication token, OIDC providers, and the cookies secret) in AWS Secrets Manager. The [External Secrets Operator](https://external-secrets.io/) syncs these values into Kubernetes secrets.

### IAM policy

The External Secrets Operator needs an IAM role with permission to read from Secrets Manager. Create the role using [EKS Pod Identity](https://docs.aws.amazon.com/eks/latest/userguide/pod-identities.html) and attach the following policy:

```json theme={null}
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue",
        "secretsmanager:DescribeSecret"
      ],
      "Resource": [
        "arn:aws:secretsmanager:REGION:ACCOUNT_ID:secret:rds!db-*",
        "arn:aws:secretsmanager:REGION:ACCOUNT_ID:secret:MY_DEPLOYMENT/rds/connection-*",
        "arn:aws:secretsmanager:REGION:ACCOUNT_ID:secret:MY_DEPLOYMENT/s3/storage-*",
        "arn:aws:secretsmanager:REGION:ACCOUNT_ID:secret:MY_DEPLOYMENT/redis/auth-token-*",
        "arn:aws:secretsmanager:REGION:ACCOUNT_ID:secret:MY_DEPLOYMENT/oidc_secret-*",
        "arn:aws:secretsmanager:REGION:ACCOUNT_ID:secret:MY_DEPLOYMENT/cookies-secret-*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:ListSecrets"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:BatchGetSecretValue"
      ],
      "Resource": [
        "arn:aws:secretsmanager:REGION:ACCOUNT_ID:secret:rds!db-*",
        "arn:aws:secretsmanager:REGION:ACCOUNT_ID:secret:MY_DEPLOYMENT/rds/connection-*",
        "arn:aws:secretsmanager:REGION:ACCOUNT_ID:secret:MY_DEPLOYMENT/s3/storage-*",
        "arn:aws:secretsmanager:REGION:ACCOUNT_ID:secret:MY_DEPLOYMENT/redis/auth-token-*",
        "arn:aws:secretsmanager:REGION:ACCOUNT_ID:secret:MY_DEPLOYMENT/oidc_secret-*",
        "arn:aws:secretsmanager:REGION:ACCOUNT_ID:secret:MY_DEPLOYMENT/cookies-secret-*"
      ]
    }
  ]
}
```

Then associate the role with the `external-secrets` service account in the `external-secrets` namespace:

```bash theme={null}
aws eks create-pod-identity-association \
  --cluster-name MY_CLUSTER \
  --namespace external-secrets \
  --service-account external-secrets \
  --role-arn arn:aws:iam::ACCOUNT_ID:role/MY_CLUSTER-external-secrets-role
```

### Secret layout

Organize secrets under a deployment prefix in Secrets Manager:

| Secret path                      | Format | Contents                                                                                            |
| -------------------------------- | ------ | --------------------------------------------------------------------------------------------------- |
| `rds!db-<UUID>`                  | JSON   | `username`, `password` — RDS-managed, supports automatic rotation                                   |
| `MY_DEPLOYMENT/rds/connection`   | JSON   | `endpoint`, `database_name`                                                                         |
| `MY_DEPLOYMENT/s3/storage`       | JSON   | `model_registry` (e.g., `s3://my-bucket/model_registry`), `workdir` (e.g., `s3://my-bucket/shared`) |
| `MY_DEPLOYMENT/redis/auth-token` | JSON   | `url` (e.g., `redis://:AUTH_TOKEN@ELASTICACHE_ENDPOINT:6379`)                                       |
| `MY_DEPLOYMENT/oidc_secret`      | String | JSON array of OIDC provider configurations (see [Authentication](#authentication-cognito))          |
| `MY_DEPLOYMENT/cookies-secret`   | String | Random string, 64+ characters, used to sign session cookies                                         |
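One possible way to seed the secrets that are not managed by RDS is sketched below. The helper names are hypothetical, and the sketch covers only the cookies secret and the Redis URL; the remaining entries follow the same pattern with the JSON keys listed in the table.

```bash
# Generate a 64-character hex string for the cookies secret.
generate_cookies_secret() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Create the cookies secret and the Redis secret in Secrets Manager.
# $1: deployment prefix (MY_DEPLOYMENT)
# $2: Redis URL, e.g. redis://:AUTH_TOKEN@ELASTICACHE_ENDPOINT:6379
create_deployment_secrets() {
  aws secretsmanager create-secret \
    --name "$1/cookies-secret" \
    --secret-string "$(generate_cookies_secret)"
  aws secretsmanager create-secret \
    --name "$1/redis/auth-token" \
    --secret-string "{\"url\": \"$2\"}"
}
```

For example, `create_deployment_secrets MY_DEPLOYMENT 'redis://:AUTH_TOKEN@ELASTICACHE_ENDPOINT:6379'` creates both entries under the deployment prefix.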

### ClusterSecretStore

Create a `ClusterSecretStore` so ExternalSecret resources in any namespace can pull from Secrets Manager:

```yaml theme={null}
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: aws-secrets-manager
spec:
  provider:
    aws:
      service: SecretsManager
      region: REGION
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets
            namespace: external-secrets
```

The `serviceAccountRef` must point to a service account bound to the IAM role created in the [IAM policy](#iam-policy) section above.

### Control plane ExternalSecret

This ExternalSecret assembles the control plane Kubernetes secret from multiple Secrets Manager entries. A `refreshInterval` of `3m` ensures rotated database passwords are picked up within minutes.

```yaml theme={null}
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: adaptive-controlplane
  namespace: adaptive
spec:
  refreshInterval: 3m
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: adaptive-controlplane
  data:
    - secretKey: dbUsername
      remoteRef:
        key: rds!db-XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        property: username
    - secretKey: dbPassword
      remoteRef:
        key: rds!db-XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
        property: password
    - secretKey: dbHost
      remoteRef:
        key: MY_DEPLOYMENT/rds/connection
        property: endpoint
    - secretKey: dbName
      remoteRef:
        key: MY_DEPLOYMENT/rds/connection
        property: database_name
    - secretKey: cookiesSecret
      remoteRef:
        key: MY_DEPLOYMENT/cookies-secret
    - secretKey: oidcProviders
      remoteRef:
        key: MY_DEPLOYMENT/oidc_secret
```

### Harmony ExternalSecret

S3 paths and Redis credentials change less frequently and refresh every hour:

```yaml theme={null}
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: adaptive-harmony
  namespace: adaptive
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: adaptive-harmony
  data:
    - secretKey: modelRegistryUrl
      remoteRef:
        key: MY_DEPLOYMENT/s3/storage
        property: model_registry
    - secretKey: sharedDirectoryUrl
      remoteRef:
        key: MY_DEPLOYMENT/s3/storage
        property: workdir
```

### Redis ExternalSecret

```yaml theme={null}
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: adaptive-redis
  namespace: adaptive
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: adaptive-redis
  data:
    - secretKey: redisUrl
      remoteRef:
        key: MY_DEPLOYMENT/redis/auth-token
        property: url
```

### RDS password rotation

The `rds!db-<UUID>` secret is managed by RDS and supports [automatic password rotation](https://docs.aws.amazon.com/secretsmanager/latest/userguide/rotate-db-creds.html). When rotation is enabled:

1. RDS rotates the database password on a schedule you configure (e.g., every 30 days).
2. RDS writes the new credentials to the `rds!db-<UUID>` secret in Secrets Manager.
3. The External Secrets Operator polls every 3 minutes (`refreshInterval: 3m`) and updates the Kubernetes secret.
4. Adaptive Engine picks up the new credentials on the next database connection attempt.

To ensure pods restart automatically when secrets change, deploy [Stakater Reloader](https://github.com/stakater/Reloader) and add the following annotation to the control plane deployment in your `values.yaml`:

```yaml theme={null}
controlPlane:
  annotations:
    reloader.stakater.com/auto: "true"
```

Reloader watches for changes to secrets referenced by the deployment and triggers a rolling restart when they update.

Enable rotation in the RDS console or via AWS CLI:

```bash theme={null}
aws secretsmanager rotate-secret \
  --secret-id "rds!db-XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX" \
  --rotation-rules AutomaticallyAfterDays=30
```

<Warning>
  Use `refreshInterval: 3m` (or shorter) for database credentials. A longer interval risks the application using stale credentials after a rotation event.
</Warning>
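To check that rotation and syncing behave as described, something like the following could be used. It is a sketch only: the function names are hypothetical, and it assumes the `adaptive-controlplane` ExternalSecret defined earlier in this guide.

```bash
# Show the rotation settings of the RDS-managed secret.
# $1: secret ID, e.g. rds!db-XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
show_rotation_status() {
  aws secretsmanager describe-secret \
    --secret-id "$1" \
    --query '{Enabled: RotationEnabled, Rules: RotationRules, Last: LastRotatedDate, Next: NextRotationDate}' \
    --output json
}

# Show when the External Secrets Operator last refreshed an ExternalSecret.
# $1: ExternalSecret name, e.g. adaptive-controlplane
show_last_sync() {
  kubectl get externalsecret "$1" -n adaptive \
    -o jsonpath='{.status.refreshTime}{"\n"}'
}
```

After a rotation, the refresh time reported by `show_last_sync adaptive-controlplane` should be later than the `Last` rotation date within a few minutes.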

### Updating secrets

To update a secret value (e.g., changing the S3 bucket or OIDC configuration):

```bash theme={null}
# Update a JSON secret
aws secretsmanager put-secret-value \
  --secret-id "MY_DEPLOYMENT/s3/storage" \
  --secret-string '{"model_registry": "s3://new-bucket/model_registry", "workdir": "s3://new-bucket/shared"}'

# Update a plain string secret (e.g., OIDC providers)
aws secretsmanager put-secret-value \
  --secret-id "MY_DEPLOYMENT/oidc_secret" \
  --secret-string '[{"name": "Cognito", "key": "cognito", "issuer_url": "https://cognito-idp.REGION.amazonaws.com/USER_POOL_ID", "client_id": "CLIENT_ID", "client_secret": "CLIENT_SECRET", "scopes": ["email", "profile", "openid"], "pkce": true, "allow_sign_up": true}]'
```

The External Secrets Operator syncs changes on the next refresh cycle. To [force an immediate sync](https://external-secrets.io/latest/introduction/faq/#can-i-manually-trigger-a-secret-refresh), annotate the ExternalSecret:

```bash theme={null}
kubectl annotate externalsecret EXTERNAL_SECRET_NAME \
  force-sync=$(date +%s) --overwrite -n adaptive
```

## Database (RDS)

Disable the in-chart PostgreSQL. Database credentials are sourced from Secrets Manager via the [control plane ExternalSecret](#control-plane-externalsecret). Do not place credentials in `values.yaml`.

**Recommended RDS settings:**

| Setting                  | Value                                                                             |
| ------------------------ | --------------------------------------------------------------------------------- |
| Engine                   | PostgreSQL 17+                                                                    |
| Instance class           | db.m8g.2xlarge (or larger for high-throughput workloads)                          |
| Multi-AZ                 | Enabled for production                                                            |
| Storage                  | gp3 with encryption enabled                                                       |
| Backup retention         | 30 days                                                                           |
| Cross-region replication | Recommended for disaster recovery                                                 |
| Password rotation        | Enabled via Secrets Manager (see [RDS password rotation](#rds-password-rotation)) |

See [database TLS configuration](/v0.13/deploy/self-hosting#database-tls) for certificate verification setup.

## Cache (Amazon ElastiCache)

Disable the in-chart Redis. The ElastiCache endpoint and authentication token come from Secrets Manager via the [Redis ExternalSecret](#redis-externalsecret).

```yaml theme={null}
redis:
  enabled: false
```

**Recommended ElastiCache settings:**

| Setting               | Value                              |
| --------------------- | ---------------------------------- |
| Engine                | Redis 7                            |
| Node type             | cache.r7g.large (or larger)        |
| Encryption in transit | Enabled                            |
| Authentication token  | Enabled, stored in Secrets Manager |

## Object storage (Amazon S3)

S3 bucket paths are sourced from Secrets Manager via the [Harmony ExternalSecret](#harmony-externalsecret). Grant the Harmony and control plane service accounts access to the bucket via [IAM Roles for Service Accounts (IRSA)](https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html) or [EKS Pod Identity](https://docs.aws.amazon.com/eks/latest/userguide/pod-identities.html).

The pods require the following S3 permissions on the bucket and its contents. Multipart upload actions are needed because Harmony uses multipart uploads for large model files.

```json theme={null}
{
  "Effect": "Allow",
  "Action": [
    "s3:AbortMultipartUpload",
    "s3:CompleteMultipartUpload",
    "s3:CreateMultipartUpload",
    "s3:DeleteObject",
    "s3:GetBucketLocation",
    "s3:GetObject",
    "s3:GetObjectTagging",
    "s3:ListBucket",
    "s3:ListBucketMultipartUploads",
    "s3:ListMultipartUploadParts",
    "s3:PutObject",
    "s3:PutObjectTagging",
    "s3:UploadPart"
  ],
  "Resource": [
    "arn:aws:s3:::MY_BUCKET",
    "arn:aws:s3:::MY_BUCKET/*"
  ]
}
```

## Logging (Amazon CloudWatch)

Forward logs from the control plane and Harmony pods to CloudWatch using the [CloudWatch Observability add-on](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Container-Insights-setup-EKS-addon.html) or a standalone [CloudWatch agent](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/install-CloudWatch-Observability-EKS-addon.html) deployed as a DaemonSet.

The Adaptive Helm chart includes a built-in OpenTelemetry Collector (enabled by default). Configure it to export logs to your CloudWatch agent:

```yaml theme={null}
otelCollector:
  enabled: true
  exporters:
    otlphttp/cw:
      endpoint: "CLOUDWATCH_OTLP_ENDPOINT"
  pipelines:
    logs:
      exporters:
        - otlphttp/cw
```

Replace `CLOUDWATCH_OTLP_ENDPOINT` with your CloudWatch agent's OTLP endpoint (e.g., `http://cloudwatch-agent.amazon-cloudwatch.svc.cluster.local:4318`).

Set a log retention policy that meets your compliance requirements (e.g., 400 days).
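A retention policy can be applied per log group with the AWS CLI. The sketch below uses a hypothetical helper; the log group name depends on how your CloudWatch agent is configured, so the one shown in the usage line is an assumption.

```bash
# Apply a retention period to a CloudWatch log group.
# $1: log group name
# $2: retention in days (CloudWatch accepts only specific values; 400 is one of them)
set_log_retention() {
  aws logs put-retention-policy \
    --log-group-name "$1" \
    --retention-in-days "$2"
}
```

For example: `set_log_retention /aws/containerinsights/MY_CLUSTER/application 400`.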

## Ingress (ALB)

Configure the [AWS Load Balancer Controller](https://kubernetes-sigs.github.io/aws-load-balancer-controller/latest/) ingress:

```yaml theme={null}
ingress:
  enabled: true
  className: alb
  hostname: MY_HOSTNAME
  tls:
    enabled: true
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS": 443}]'
    alb.ingress.kubernetes.io/certificate-arn: ACM_CERTIFICATE_ARN
    alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-TLS13-1-2-2021-06
```

<Accordion title="Restrict ingress to specific IP ranges">
  If you front the ALB with a CDN or VPN, restrict inbound traffic using security group prefix lists:

  ```yaml theme={null}
  ingress:
    annotations:
      alb.ingress.kubernetes.io/security-group-prefix-lists: "pl-XXXXXXXXXXXXXXXX,pl-YYYYYYYYYYYYYYYY"
  ```
</Accordion>

## Authentication (Amazon Cognito)

If you use Amazon Cognito as your OIDC provider, store the provider configuration in Secrets Manager at `MY_DEPLOYMENT/oidc_secret` as a JSON array:

```json theme={null}
[
  {
    "name": "Cognito",
    "key": "cognito",
    "issuer_url": "https://cognito-idp.REGION.amazonaws.com/USER_POOL_ID",
    "client_id": "COGNITO_CLIENT_ID",
    "client_secret": "COGNITO_CLIENT_SECRET",
    "scopes": ["email", "profile", "openid"],
    "pkce": true,
    "allow_sign_up": true
  }
]
```

Amazon Cognito is optional. You can use any other OIDC provider, such as Microsoft Entra ID, Okta, Google, or Keycloak.

Configure auth settings in `values.yaml`:

```yaml theme={null}
auth:
  default_role: read-only
  session:
    secure: true
    expiration_seconds: 86400
  admins:
    - "admin@example.com"
```

<Warning>
  If `allow_sign_up` is `true`, any user in your Cognito user pool can access Adaptive Engine. Set it to `false` and create users via the SDK to restrict access.
</Warning>

## AMI requirements

EKS nodes that run GPU workloads must use an AMI with the NVIDIA drivers pre-installed. Use one of:

* **EKS-optimized accelerated AMI** — the default for GPU nodes provisioned by EKS Auto Mode or Karpenter. Includes NVIDIA drivers and the `nvidia-container-toolkit`. This is the recommended option.
* **Custom AMI** — required only if you need a specific driver version or kernel configuration. Build from the [EKS-optimized AMI](https://docs.aws.amazon.com/eks/latest/userguide/eks-optimized-ami.html) and ensure CUDA 12.8+ with driver 570.172.08+.

<Warning>
  If you use Karpenter, set the AMI family in your `EC2NodeClass` to `AL2023` or `Bottlerocket`. Karpenter then selects the correct accelerated AMI automatically. Do not override the AMI ID unless you maintain a custom image pipeline.
</Warning>

## Capacity block reservation

### Book a capacity block

Purchase capacity blocks through the [AWS Console](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-capacity-blocks.html) or CLI. Each reservation returns a **Capacity Reservation ID** (`cr-XXXXXXXXXXXXXXXXX`).

Capacity block availability varies by region, instance type, and duration. There may be no offerings for your target configuration at any given time — check availability before attempting a purchase.

**Search available offerings:**

```bash theme={null}
aws ec2 describe-capacity-block-offerings \
  --instance-type p5.48xlarge \
  --instance-count 1 \
  --capacity-duration-hours 720
```

This returns a list of offerings with prices, start times, and a `CapacityBlockOfferingId` for each. Adjust `--capacity-duration-hours` (24–4320) and `--instance-count` to find available slots. If the response is empty, try a different duration or instance type.
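When searching repeatedly, it can help to think in days and let a small helper convert to hours. The following is a sketch with hypothetical function names; it enforces the 24 to 4320 hour range quoted above and trims the response to the fields needed for a purchase decision.

```bash
# Convert a reservation length in days to a --capacity-duration-hours value,
# rejecting anything outside the 24-4320 hour range.
days_to_duration_hours() {
  local hours
  hours=$(( $1 * 24 ))
  if [ "$hours" -lt 24 ] || [ "$hours" -gt 4320 ]; then
    echo "duration out of range: $1 days" >&2
    return 1
  fi
  echo "$hours"
}

# List offerings for a given instance type, count, and duration.
# $1: instance type, $2: instance count, $3: duration in days
search_offerings() {
  local hours
  hours=$(days_to_duration_hours "$3") || return 1
  aws ec2 describe-capacity-block-offerings \
    --instance-type "$1" \
    --instance-count "$2" \
    --capacity-duration-hours "$hours" \
    --query 'CapacityBlockOfferings[].[CapacityBlockOfferingId,StartDate,EndDate,UpfrontFee]' \
    --output table
}
```

For example, `search_offerings p5.48xlarge 1 30` issues the same 720-hour search as the command above.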

**Purchase an offering:**

```bash theme={null}
aws ec2 purchase-capacity-block \
  --instance-platform Linux/UNIX \
  --capacity-block-offering-id OFFERING_ID \
  --tag-specifications 'ResourceType=capacity-reservation,Tags=[{Key=deployment,Value=MY_DEPLOYMENT}]'
```

Replace `OFFERING_ID` with a `CapacityBlockOfferingId` from the search results. Tag reservations with a consistent key (e.g., `deployment: MY_DEPLOYMENT`) so Karpenter can discover them automatically.
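To confirm that a purchase is tagged the way Karpenter expects, the reservations carrying the `deployment` tag can be listed. This is a minimal sketch; the function name is hypothetical.

```bash
# List capacity reservations that carry the deployment tag.
# $1: value of the `deployment` tag (MY_DEPLOYMENT)
list_tagged_reservations() {
  aws ec2 describe-capacity-reservations \
    --filters "Name=tag:deployment,Values=$1" \
    --query 'CapacityReservations[].[CapacityReservationId,InstanceType,State,StartDate,EndDate]' \
    --output table
}
```

Running `list_tagged_reservations MY_DEPLOYMENT` should show the new `cr-` ID along with its state and start and end dates.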

### Karpenter integration

Create a dedicated `EC2NodeClass` and `NodePool` that target your capacity block reservations.

**EC2NodeClass** — defines the launch template for capacity block nodes:

```yaml theme={null}
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: gpu-capacity-block
spec:
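  role: MY_EKS_NODE_ROLE
  # Required by the karpenter.k8s.aws/v1 API; the alias selects the EKS-optimized AL2023 AMI.
  amiSelectorTerms:
    - alias: al2023@latest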
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: MY_CLUSTER
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: MY_CLUSTER
  capacityReservationSelectorTerms:
    - tags:
        deployment: MY_DEPLOYMENT
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 200Gi
        volumeType: gp3
        iops: 3000
        throughput: 125
        encrypted: true
```

**NodePool** — schedules only onto reserved capacity:

```yaml theme={null}
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu-capacity-block
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: gpu-capacity-block
      taints:
        - key: nvidia.com/gpu
          effect: NoSchedule
        - key: adaptive.com/gpu-capacity-block
          effect: NoSchedule
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["reserved"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      terminationGracePeriod: 24h
  disruption:
    budgets:
      - nodes: "0"
    consolidationPolicy: WhenEmpty
    consolidateAfter: 1h
```

<Accordion title="On-demand fallback node pool">
  For workloads that can overflow to on-demand instances when capacity block nodes are full, create a second node pool. The example references an `EC2NodeClass` named `gpu-default`, which must be defined separately:

  ```yaml theme={null}
  apiVersion: karpenter.sh/v1
  kind: NodePool
  metadata:
    name: gpu-on-demand
  spec:
    template:
      spec:
        nodeClassRef:
          group: karpenter.k8s.aws
          kind: EC2NodeClass
          name: gpu-default
        taints:
          - key: nvidia.com/gpu
            effect: NoSchedule
          - key: adaptive.com/on-demand
            effect: NoSchedule
        requirements:
          - key: karpenter.sh/capacity-type
            operator: In
            values: ["on-demand"]
          - key: node.kubernetes.io/instance-type
            operator: In
            values: ["p5.48xlarge", "p5e.48xlarge"]
          - key: kubernetes.io/arch
            operator: In
            values: ["amd64"]
    disruption:
      budgets:
        - nodes: "10%"
      consolidationPolicy: WhenEmpty
      consolidateAfter: 30m
  ```
</Accordion>

## Compute pools targeting capacity blocks

Each compute pool maps to a set of GPU replicas. Use `nodeSelector` to pin pools to specific capacity block reservations. Add these to your `values.yaml` and run `helm upgrade`:

```yaml theme={null}
harmony:
  image:
    repository: adaptive/harmony
    tag: "vX.Y.Z"
    pullPolicy: Always
  extraEnvVars:
    HARMONY_SETTING_ALLOW_NCCL_TP: "1"
    NCCL_IGNORE_DISABLED_P2P: "1"
  computePool:
    - name: inference-pool-a
      gpusPerReplica: 2
      replicaCount: 1
      nodeSelector:
        eks.amazonaws.com/capacity-reservation-id: cr-XXXXXXXXXXXXXXXXX
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
        - key: adaptive.com/gpu-capacity-block
          operator: Exists
          effect: NoSchedule
      resources:
        requests:
          cpu: 45
          memory: 425Gi
        limits:
          cpu: 45
          memory: 425Gi

    - name: training-pool
      gpusPerReplica: 8
      replicaCount: 1
      nodeSelector:
        eks.amazonaws.com/capacity-reservation-id: cr-YYYYYYYYYYYYYYYYY
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
        - key: adaptive.com/gpu-capacity-block
          operator: Exists
          effect: NoSchedule
      resources:
        requests:
          cpu: 180
          memory: 1700Gi
        limits:
          cpu: 180
          memory: 1700Gi
```
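After `helm upgrade`, pod placement can be checked roughly as follows. The helper names are hypothetical; the label columns are the ones referenced by the node pool requirements and the `nodeSelector` in this guide, so an empty column indicates that the label is not set on your nodes.

```bash
# List nodes with the capacity-related labels used in this guide.
show_gpu_nodes() {
  kubectl get nodes \
    -L karpenter.sh/capacity-type \
    -L node.kubernetes.io/instance-type \
    -L eks.amazonaws.com/capacity-reservation-id
}

# Show which node each pod in the adaptive namespace was scheduled on.
show_pod_placement() {
  kubectl get pods -n adaptive -o wide
}
```

Harmony pods should appear on nodes whose capacity type is `reserved` and whose reservation ID matches the compute pool's `nodeSelector`.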

<Accordion title="EFA for multi-node communication">
  If your Adaptive Engine cluster has multi-node compute pools, you must enable Elastic Fabric Adapter (EFA):

  * Add EFA device requests to the compute pool's `resources`.
  * Use the `harmony-efa` image variant (provided by Adaptive), which includes the EFA libraries.

  ```yaml theme={null}
  harmony:
    extraEnvVars:
      HARMONY_SETTING_ALLOW_NCCL_TP: "1"
      NCCL_IGNORE_DISABLED_P2P: "1"
      NCCL_PROTO: "LL,LL128,Simple"
    computePool:
      - name: efa-training-pool
        gpusPerReplica: 8
        replicaCount: 1
        image:
          repository: adaptive/harmony-efa
          tag: "vX.Y.Z"
        nodeSelector:
          eks.amazonaws.com/capacity-reservation-id: cr-YYYYYYYYYYYYYYYYY
        tolerations:
          - key: nvidia.com/gpu
            operator: Exists
            effect: NoSchedule
          - key: adaptive.com/gpu-capacity-block
            operator: Exists
            effect: NoSchedule
        resources:
          requests:
            cpu: 180
            memory: 1700Gi
            vpc.amazonaws.com/efa: 32
          limits:
            cpu: 180
            memory: 1700Gi
            vpc.amazonaws.com/efa: 32
  ```
</Accordion>

<Info>
  Reserve roughly 10% of each node's vCPU and memory for the kubelet and system pods. For example, on p5.48xlarge (192 vCPUs, 2048 GiB), request 180 CPU and 1700 GiB per 8-GPU replica.
</Info>
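For replicas that use only part of a node, the whole-node figures can be scaled by the share of GPUs the replica takes. The arithmetic below is a sketch with a hypothetical helper name; it reproduces the 2-GPU values used for `inference-pool-a` in this guide.

```bash
# Scale a whole-node allocation down to a single replica (integer division).
# $1: node-level request (CPU cores or GiB)
# $2: GPUs per node
# $3: GPUs per replica
per_replica_request() {
  echo $(( $1 * $3 / $2 ))
}

# p5.48xlarge figures from this guide: 180 CPU and 1700 GiB across 8 GPUs.
echo "cpu:    $(per_replica_request 180 8 2)"      # prints "cpu:    45"
echo "memory: $(per_replica_request 1700 8 2)Gi"   # prints "memory: 425Gi"
```

Sizing requests this way lets four 2-GPU replicas fit on one node without exceeding the whole-node allocation.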

## Capacity block lifecycle

Capacity blocks have fixed start and end times. Plan for reservation renewals:

* **Extend before rebooking** — Capacity blocks can be extended if capacity allows. Check whether an existing block can be extended before purchasing a new one.
* **Before expiry** — Purchase the next capacity block and apply the same `deployment` tag to it (or update the reservation ID in `nodeSelector`) so that the new reservation is picked up.
* **Zero-downtime transition** — To avoid downtime when a capacity block expires, create a new [compute pool](/v0.13/deploy/architecture#compute-pools) backed by the replacement block and load-balance inference across both pools until the old block expires.

<Warning>
  If a capacity block expires without a replacement, GPU pods enter the `Pending` state. As a safety net, maintain an on-demand fallback node pool and a matching compute pool set to `replicaCount: 0` in Helm that can be scaled up when needed.
</Warning>
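To keep track of upcoming expirations, the remaining lifetime of a reservation can be computed from its end date. The following is a sketch: the function names are hypothetical, and the date parsing assumes GNU `date`.

```bash
# Whole hours between two ISO-8601 timestamps (requires GNU date).
hours_between() {
  local start end
  start=$(date -u -d "$1" +%s) || return 1
  end=$(date -u -d "$2" +%s) || return 1
  echo $(( (end - start) / 3600 ))
}

# Whole hours left before a capacity block ends.
# $1: capacity reservation ID (cr-XXXXXXXXXXXXXXXXX)
hours_until_expiry() {
  local end_date
  end_date=$(aws ec2 describe-capacity-reservations \
    --capacity-reservation-ids "$1" \
    --query 'CapacityReservations[0].EndDate' \
    --output text) || return 1
  hours_between "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$end_date"
}
```

A scheduled job could call `hours_until_expiry cr-XXXXXXXXXXXXXXXXX` and raise an alert when the value drops below the lead time you need to book a replacement.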
