What are EC2 Capacity Blocks?
EC2 Capacity Blocks (“capacity blocks”) are region-specific, instance-specific reservations that last between 1 and 180 days. A reservation can start immediately (subject to availability) or at a future date, and spans one or more EC2 instances. See the capacity blocks documentation for precise reservation mechanics and pricing. Capacity blocks are a GPU procurement mechanism unique to AWS. Adaptive ML recommends capacity blocks with Adaptive Engine for the following reasons:

- GPU availability — P-family instances have practically no on-demand capacity. Capacity blocks provide reserved GPU instances with predictable lead times.
- Cost — Capacity blocks are priced below on-demand rates, and in some cases below annual reservations.
- Elasticity — An Adaptive Engine cluster can vary its GPU count on a daily basis as capacity blocks are created or expire. This allows organizations to grow capacity with workload demand and reduce cost when GPUs are not needed.
Capacity blocks are not mandatory for using Adaptive Engine on AWS. For low-volume inference of small models, g6 and g6e instances can be used and have satisfactory on-demand capacity.
Architecture overview
A production deployment on AWS uses the following services:

| Component | AWS service |
|---|---|
| Kubernetes | Amazon EKS |
| GPU VMs | EC2 Capacity Blocks |
| Postgres database | Amazon RDS (PostgreSQL) |
| Redis datastore | Amazon ElastiCache for Redis |
| Model registry | Amazon S3 |
| Secrets | AWS Secrets Manager |
| Logging | Amazon CloudWatch |
| DNS/TLS | ACM + your DNS provider |
| Ingress | AWS Load Balancer Controller |
Deployment checklist
Here are the steps to deploy an Adaptive Engine cluster on Amazon EKS with EC2 Capacity Blocks. Subsequent sections provide details and code snippets.

1. Provision AWS infrastructure — Create the VPC, EKS cluster, RDS instance, ElastiCache cluster, and S3 bucket.
2. Install cluster dependencies — Deploy the NVIDIA GPU Operator, AWS Load Balancer Controller, External Secrets Operator, and optionally the CloudWatch agent.
3. Populate Secrets Manager — Store database connection details, S3 bucket paths, the Redis authentication token, OIDC provider configuration, and the cookies secret. Deploy the `ClusterSecretStore` and `ExternalSecret` resources.
4. Configure Helm values and deploy the control plane — Customize `values.yaml` and install the Helm chart with `replicaCount: 0` for Harmony (no GPUs yet). Log in to your domain and complete the OIDC authentication flow to verify that the control plane is online.
5. Purchase capacity blocks — Reserve GPU instances via the EC2 console or CLI. Tag them for Karpenter discovery.
6. Configure Karpenter node pools — Deploy `EC2NodeClass` and `NodePool` resources targeting your capacity block.
7. Add compute pools and upgrade — Set Harmony compute pools in `values.yaml` with node selectors and tolerations pointing to your capacity block nodes. Run `helm upgrade`.
8. Verify GPUs — Confirm Harmony pods are scheduled on capacity block nodes. Open the control plane UI and check that your GPUs appear in the compute pools section.
Prerequisites
Before starting, complete the general self-hosting prerequisites and ensure you have:

- An AWS account with GPU quota for your target instance types
- The External Secrets Operator deployed in the cluster
- An EKS cluster (1.28+) with Karpenter installed
- The AWS Load Balancer Controller deployed in the cluster
- The NVIDIA GPU Operator deployed in the cluster
Placeholders
Code snippets in this guide use the following placeholders. Replace them with your values.

| Placeholder | Value |
|---|---|
| `REGION` | AWS region (e.g., `us-east-1`) |
| `ACCOUNT_ID` | Your 12-digit AWS account ID |
| `MY_DEPLOYMENT` | A name for your deployment, used as a Secrets Manager path prefix and capacity block tag (e.g., `prod-adaptive`) |
| `MY_CLUSTER` | Your EKS cluster name |
| `MY_HOSTNAME` | Your Adaptive Engine domain (e.g., `adaptive.example.com`) |
| `MY_EKS_NODE_ROLE` | IAM role name for EKS worker nodes |
| `MY_BUCKET` | Your S3 bucket name |
| `ACM_CERTIFICATE_ARN` | ARN of your ACM TLS certificate |
Helm configuration for EKS
Start from the base `values.yaml` and apply the overrides below. Deploy the control plane first without GPU compute pools to validate your infrastructure (OIDC, secrets, database, Redis) before purchasing capacity blocks.

Container registry (Amazon ECR)

The node IAM role needs `ecr:GetDownloadUrlForLayer`, `ecr:BatchGetImage`, and `ecr:GetAuthorizationToken` permissions on the registry.
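A sketch of an IAM policy granting those permissions (scope the repository ARN more tightly if you know the repository names; `ecr:GetAuthorizationToken` must be granted on `*`):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecr:GetDownloadUrlForLayer",
        "ecr:BatchGetImage"
      ],
      "Resource": "arn:aws:ecr:REGION:ACCOUNT_ID:repository/*"
    },
    {
      "Effect": "Allow",
      "Action": "ecr:GetAuthorizationToken",
      "Resource": "*"
    }
  ]
}
```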
Control plane
Initial deployment (control plane only)
Deploy without compute pools to verify the infrastructure works end to end:

- You can log in via your OIDC provider.
- Secrets are synced (check the Kubernetes secrets in the `adaptive` namespace).
- The control plane connects to RDS and Redis (check pod logs for connection errors).
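A minimal sketch of the Harmony override for this phase (the key names are illustrative; check the chart's values reference for the exact schema):

```yaml
# Initial bring-up: control plane only, no GPU compute pools yet.
harmony:
  replicaCount: 0
```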
AWS Secrets Manager
Store all sensitive configuration (database credentials, S3 paths, Redis authentication token, OIDC providers, and the cookies secret) in AWS Secrets Manager. The External Secrets Operator syncs these values into Kubernetes secrets.

IAM policy

The External Secrets Operator needs an IAM role with permission to read from Secrets Manager. Create the role using EKS Pod Identity, attach the following policy, and associate the role with the `external-secrets` service account in the `external-secrets` namespace:
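A policy sketch covering the deployment prefix and the RDS-managed secret (Secrets Manager appends a random suffix to secret ARNs, hence the wildcards):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue",
        "secretsmanager:DescribeSecret"
      ],
      "Resource": [
        "arn:aws:secretsmanager:REGION:ACCOUNT_ID:secret:MY_DEPLOYMENT/*",
        "arn:aws:secretsmanager:REGION:ACCOUNT_ID:secret:rds!db-*"
      ]
    }
  ]
}
```

The association itself can be created with `aws eks create-pod-identity-association`, passing the cluster name, the `external-secrets` namespace and service account, and the role ARN.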
Secret layout
Organize secrets under a deployment prefix in Secrets Manager:

| Secret path | Format | Contents |
|---|---|---|
| `rds!db-<UUID>` | JSON | `username`, `password` — RDS-managed, supports automatic rotation |
| `MY_DEPLOYMENT/rds/connection` | JSON | `endpoint`, `database_name` |
| `MY_DEPLOYMENT/s3/storage` | JSON | `model_registry` (e.g., `s3://my-bucket/model_registry`), `workdir` (e.g., `s3://my-bucket/shared`) |
| `MY_DEPLOYMENT/redis/auth-token` | JSON | `url` (e.g., `redis://:AUTH_TOKEN@ELASTICACHE_ENDPOINT:6379`) |
| `MY_DEPLOYMENT/oidc_secret` | String | JSON array of OIDC provider configurations (see Authentication) |
| `MY_DEPLOYMENT/cookies-secret` | String | Random string, 64+ characters, used to sign session cookies |
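For example, the S3 storage entry can be created with the AWS CLI (repeat for the other entries, adjusting the name and payload):

```shell
aws secretsmanager create-secret \
  --region REGION \
  --name MY_DEPLOYMENT/s3/storage \
  --secret-string '{"model_registry":"s3://MY_BUCKET/model_registry","workdir":"s3://MY_BUCKET/shared"}'
```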
ClusterSecretStore
Create a `ClusterSecretStore` so `ExternalSecret` resources in any namespace can pull from Secrets Manager:
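A sketch of the resource, assuming the External Secrets Operator runs with the `external-secrets` service account in the `external-secrets` namespace:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: aws-secrets-manager
spec:
  provider:
    aws:
      service: SecretsManager
      region: REGION
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets
            namespace: external-secrets
```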
The `serviceAccountRef` must point to a service account bound to the IAM role created in the IAM policy section above.
Control plane ExternalSecret
This `ExternalSecret` assembles the control plane Kubernetes secret from multiple Secrets Manager entries. A `refreshInterval` of `3m` ensures rotated database passwords are picked up within minutes.
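A sketch of the shape such a resource takes (the `secretKey` names are illustrative — use whatever keys the control plane deployment expects; `rds!db-<UUID>` is the placeholder from the secret layout table):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: control-plane
  namespace: adaptive
spec:
  refreshInterval: 3m
  secretStoreRef:
    kind: ClusterSecretStore
    name: aws-secrets-manager
  target:
    name: control-plane-secrets   # Kubernetes secret assembled from the entries below
  data:
    - secretKey: DB_USERNAME
      remoteRef:
        key: rds!db-<UUID>
        property: username
    - secretKey: DB_PASSWORD
      remoteRef:
        key: rds!db-<UUID>
        property: password
    - secretKey: DB_ENDPOINT
      remoteRef:
        key: MY_DEPLOYMENT/rds/connection
        property: endpoint
    - secretKey: COOKIES_SECRET
      remoteRef:
        key: MY_DEPLOYMENT/cookies-secret
```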
Harmony ExternalSecret
S3 paths and Redis credentials change less frequently and refresh every hour:

Redis ExternalSecret
RDS password rotation
The `rds!db-<UUID>` secret is managed by RDS and supports automatic password rotation. When rotation is enabled:

- RDS rotates the database password on a schedule you configure (e.g., every 30 days).
- RDS writes the new credentials to the `rds!db-<UUID>` secret in Secrets Manager.
- The External Secrets Operator polls every 3 minutes (`refreshInterval: 3m`) and updates the Kubernetes secret.
- Adaptive Engine picks up the new credentials on the next database connection attempt.
`values.yaml`:
Updating secrets
To update a secret value (e.g., changing the S3 bucket or OIDC configuration):

Database (RDS)
Disable the in-chart PostgreSQL. Database credentials are sourced from Secrets Manager via the control plane ExternalSecret. Do not place credentials in `values.yaml`.
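A sketch of the override (the key name is illustrative; many charts bundle PostgreSQL under a `postgresql` sub-chart toggle — check the chart's values reference):

```yaml
# Use the external RDS instance; do not run the bundled database.
postgresql:
  enabled: false
```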
Recommended RDS settings:
| Setting | Value |
|---|---|
| Engine | PostgreSQL 17+ |
| Instance class | db.m8g.2xlarge (or larger for high-throughput workloads) |
| Multi-AZ | Enabled for production |
| Storage | gp3 with encryption enabled |
| Backup retention | 30 days |
| Cross-region replication | Recommended for disaster recovery |
| Password rotation | Enabled via Secrets Manager (see RDS password rotation) |
Cache (Amazon ElastiCache)
Disable the in-chart Redis. The ElastiCache endpoint and authentication token come from Secrets Manager via the Redis ExternalSecret.

| Setting | Value |
|---|---|
| Engine | Redis 7 |
| Node type | cache.r7g.large (or larger) |
| Encryption in transit | Enabled |
| Authentication token | Enabled, stored in Secrets Manager |
Object storage (Amazon S3)
S3 bucket paths are sourced from Secrets Manager via the Harmony ExternalSecret. Grant the Harmony and control plane service accounts access to the bucket via IAM Roles for Service Accounts (IRSA) or EKS Pod Identity. The pods require the following S3 permissions on the bucket and its contents. Multipart upload actions are needed because Harmony uses multipart uploads for large model files.

Logging (Amazon CloudWatch)
Forward logs from the control plane and Harmony pods to CloudWatch using the CloudWatch Observability add-on or a standalone CloudWatch agent deployed as a DaemonSet. The Adaptive Helm chart includes a built-in OpenTelemetry Collector (enabled by default). Configure it to export logs to your CloudWatch agent, replacing `CLOUDWATCH_OTLP_ENDPOINT` with your CloudWatch agent's OTLP endpoint (e.g., `http://cloudwatch-agent.amazon-cloudwatch.svc.cluster.local:4318`).
Set a log retention policy that meets your compliance requirements (e.g., 400 days).
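Retention can be set per log group with the CLI (the log group name below is illustrative — use whatever group the agent writes to):

```shell
aws logs put-retention-policy \
  --region REGION \
  --log-group-name /adaptive/MY_DEPLOYMENT \
  --retention-in-days 400
```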
Ingress (ALB)
Configure the AWS Load Balancer Controller ingress:

Restrict ingress to specific IP ranges
If you front the ALB with a CDN or VPN, restrict inbound traffic using security group prefix lists:
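A sketch of an Ingress using a managed prefix list (the service name and prefix list ID are illustrative; the `security-group-prefix-lists` annotation requires a recent AWS Load Balancer Controller version):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: adaptive
  namespace: adaptive
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/certificate-arn: ACM_CERTIFICATE_ARN
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS": 443}]'
    # Allow inbound traffic only from the CDN/VPN ranges in the prefix list
    alb.ingress.kubernetes.io/security-group-prefix-lists: pl-0123456789abcdef0
spec:
  ingressClassName: alb
  rules:
    - host: MY_HOSTNAME
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: adaptive-control-plane   # illustrative service name
                port:
                  number: 80
```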
Authentication (Amazon Cognito)
If you use Amazon Cognito as your OIDC provider, store the provider configuration in Secrets Manager at `MY_DEPLOYMENT/oidc_secret` as a JSON array:
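A hypothetical sketch of the array — the field names here are illustrative and must be checked against the Authentication documentation; the Cognito issuer URL format is standard:

```json
[
  {
    "name": "cognito",
    "issuer": "https://cognito-idp.REGION.amazonaws.com/USER_POOL_ID",
    "client_id": "COGNITO_CLIENT_ID",
    "client_secret": "COGNITO_CLIENT_SECRET"
  }
]
```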
values.yaml:
AMI requirements
EKS nodes that run GPU workloads must use an AMI with the NVIDIA drivers pre-installed. Use one of:

- EKS-optimized accelerated AMI — the default for GPU node groups managed by EKS Auto or Karpenter. Includes NVIDIA drivers and the `nvidia-container-toolkit`. This is the recommended option.
- Custom AMI — required only if you need a specific driver version or kernel configuration. Build from the EKS-optimized AMI and ensure CUDA 12.8+ with driver 570.172.08+.
Capacity block reservation
Book a capacity block
Purchase capacity blocks through the AWS Console or CLI. Each reservation returns a Capacity Reservation ID (`cr-XXXXXXXXXXXXXXXXX`).
Capacity block availability varies by region, instance type, and duration. There may be no offerings for your target configuration at any given time — check availability before attempting a purchase.
Search available offerings:
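For example, to search for a one-week block of two p5.48xlarge instances (instance type and count are illustrative):

```shell
aws ec2 describe-capacity-block-offerings \
  --region REGION \
  --instance-type p5.48xlarge \
  --instance-count 2 \
  --capacity-duration-hours 168
```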
The response lists a `CapacityBlockOfferingId` for each offering. Adjust `--capacity-duration-hours` (24–4320) and `--instance-count` to find available slots. If the response is empty, try a different duration or instance type.
Purchase an offering:
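A sketch of the purchase call, tagging the reservation for Karpenter discovery:

```shell
aws ec2 purchase-capacity-block \
  --region REGION \
  --capacity-block-offering-id OFFERING_ID \
  --instance-platform "Linux/UNIX" \
  --tag-specifications 'ResourceType=capacity-reservation,Tags=[{Key=deployment,Value=MY_DEPLOYMENT}]'
```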
Replace `OFFERING_ID` with a `CapacityBlockOfferingId` from the search results. Tag reservations with a consistent key (e.g., `deployment: MY_DEPLOYMENT`) so Karpenter can discover them automatically.
Karpenter integration
Create a dedicated `EC2NodeClass` and `NodePool` that target your capacity block reservations.
`EC2NodeClass` — defines the launch template for capacity block nodes:
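A sketch of both resources, assuming a Karpenter version with capacity reservation support (`capacityReservationSelectorTerms` and the `reserved` capacity type are feature-gated in recent releases; names and discovery tags are illustrative):

```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: capacity-block
spec:
  amiSelectorTerms:
    - alias: al2023@latest   # EKS-optimized AMI; verify the accelerated variant is used for GPU instances
  role: MY_EKS_NODE_ROLE
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: MY_CLUSTER
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: MY_CLUSTER
  capacityReservationSelectorTerms:
    - tags:
        deployment: MY_DEPLOYMENT   # discover the tagged capacity blocks
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: capacity-block
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: capacity-block
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["reserved"]   # schedule only onto reserved capacity
      taints:
        - key: nvidia.com/gpu
          effect: NoSchedule
```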
On-demand fallback node pool
For workloads that can overflow to on-demand instances when capacity block nodes are full, create a second node pool:
Compute pools targeting capacity blocks
Each compute pool maps to a set of GPU replicas. Use `nodeSelector` to pin pools to specific capacity block reservations. Add these to your `values.yaml` and run `helm upgrade`:
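A hypothetical sketch of such a pool definition — the key names are illustrative and must be adapted to the chart's actual compute pool schema:

```yaml
harmony:
  computePools:
    - name: capacity-block-pool
      replicaCount: 2
      nodeSelector:
        karpenter.sh/nodepool: capacity-block   # pin to capacity block nodes
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
```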
EFA for multi-node communication
If your Adaptive Engine cluster has multi-node compute pools, you must set up Elastic Fabric Adapter (EFA):

- Add EFA device requests to your pod spec.
- Use the `harmony-efa` image variant (provided by Adaptive), which includes the EFA libraries.
Keep roughly 10% of vCPU and memory for the kubelet and system pods. For example, on p5.48xlarge (192 vCPUs, 2048 GiB), request 180 vCPUs and 1700 GiB per 8-GPU replica.
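The headroom arithmetic can be sketched as a small helper. The default reserves below are hypothetical values chosen to reproduce the p5.48xlarge example; tune them for your instance type:

```python
def gpu_node_requests(total_vcpu: int, total_mem_gib: int,
                      reserve_vcpu: int = 12, reserve_mem_gib: int = 348) -> tuple:
    """Resource requests for one replica that occupies a whole GPU node,
    leaving reserve_vcpu / reserve_mem_gib for the kubelet and system pods."""
    return total_vcpu - reserve_vcpu, total_mem_gib - reserve_mem_gib

# p5.48xlarge: 192 vCPUs, 2048 GiB -> request 180 vCPUs and 1700 GiB
print(gpu_node_requests(192, 2048))  # (180, 1700)
```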
Capacity block lifecycle
Capacity blocks have fixed start and end times. Plan for reservation renewals:

- Extend before rebooking — Capacity blocks can be extended if capacity allows. Check whether an existing block can be extended before purchasing a new one.
- Before expiry — Purchase the next capacity block and update the `deployment` tag (or reservation ID in `nodeSelector`) to include the new reservation.
- Zero-downtime transition — To avoid downtime when a capacity block expires, create a new compute pool backed by the replacement block and load-balance inference across both pools until the old block expires.

