This is the recommended production deploy for any customer with a real AWS footprint. The Helm chart is the same artifact we run our own staging and production on, and EKS + Karpenter is the deploy shape our release pipeline is tuned for.

Prereqs

Before helm install, the following must be in place on the cluster side. If you’ve never set these up on an EKS cluster, budget half a day; each is well-documented upstream.

Cluster

  • EKS 1.30+ (matches what we run internally)
  • OIDC provider associated with the cluster (eksctl utils associate-iam-oidc-provider --cluster <name> --approve) — required for IRSA

Addons + controllers

| Component | Purpose | Install reference |
|---|---|---|
| Karpenter | Node autoscaling | karpenter.sh/docs |
| AWS Load Balancer Controller | ALB ingress | aws-load-balancer-controller |
| EBS CSI Driver | gp3 volumes for graph-engine / compactor / RabbitMQ | EKS addon: aws-ebs-csi-driver |
| External Secrets Operator (recommended) | Sync from AWS Secrets Manager | external-secrets.io |
Karpenter needs a NodePool covering the instance families Nebula will run on. Our staging clusters use the m6i, m7i, and c7i families; production also includes r7i for the graph-engine memory profile. The chart's resource requests in the example values file fit comfortably on m7i.large and up.
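A minimal NodePool sketch covering those families (Karpenter v1 API; the EC2NodeClass name and the on-demand-only capacity type are assumptions to adapt):
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: nebula
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default                     # assumes an EC2NodeClass named "default" exists
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["m6i", "m7i", "c7i"]   # add "r7i" for the production graph-engine profile
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]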

Data stores

  • RDS Postgres 16 in the same VPC as the cluster, with the cluster’s node security group allowed inbound on :5432. Enable rds.extensions = vector in the parameter group so pgvector is available.
  • S3 bucket in the same region as the cluster. Versioning + SSE-S3 (or SSE-KMS) recommended.

See Managed AWS resources for the IAM policy + parameter-group settings.

IAM role for IRSA

Create one IAM role with the cluster’s OIDC provider in its trust policy, scoped to the chart’s ServiceAccount. With helm install nebula …, the ServiceAccount is nebula-sa in the install namespace; with a different release name it is <release>-nebula-sa (confirm with kubectl -n <ns> get sa after install). Attach an inline policy granting:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<your-bucket>",
        "arn:aws:s3:::<your-bucket>/*"
      ]
    }
  ]
}
Reference the role ARN under serviceAccount.annotations.eks.amazonaws.com/role-arn in your values file.
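The trust policy takes the standard IRSA shape. A sketch, assuming a us-east-1 cluster, the nebula namespace, and the default nebula-sa ServiceAccount name:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<account-id>:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/<oidc-id>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-east-1.amazonaws.com/id/<oidc-id>:sub": "system:serviceaccount:nebula:nebula-sa",
          "oidc.eks.us-east-1.amazonaws.com/id/<oidc-id>:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}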

Install

1. Push images to your ECR

The bundle’s images.tar contains every pinned image. The push paths below match the Helm chart’s default image.*.repository values — when image.registry in your values file is set to your ECR URI, the chart pulls exactly the refs you push here. Side-load, retag, push:
tar -xzf nebula-enterprise-<version>.tar.gz
cd nebula-enterprise-<version>/
sha256sum -c checksums.txt
docker load -i images.tar

ECR=<account-id>.dkr.ecr.us-east-1.amazonaws.com
aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS --password-stdin "${ECR}"

# Nebula-built images. The chart's default image.*.repository values are
# `nebula/nebula-runtime` and `nebula/graph-engine`, and the tag defaults
# to .Chart.AppVersion (the bundle version, without the `enterprise-`
# prefix). So `<version>` below is e.g. `0.2.0`, not `enterprise-0.2.0`.
docker tag nebula:enterprise-<version>              "${ECR}/nebula/nebula-runtime:<version>"
docker tag nebula-graph-engine:enterprise-<version> "${ECR}/nebula/graph-engine:<version>"
docker push "${ECR}/nebula/nebula-runtime:<version>"
docker push "${ECR}/nebula/graph-engine:<version>"
Third-party images (Hatchet, pgvector, RabbitMQ, busybox): if your EKS cluster has egress to public registries (Docker Hub / GHCR), you don’t need to push these. The chart pulls them from upstream, and the image.registry prepend is skipped automatically for fully-qualified repos like ghcr.io/... and docker.io/....

For air-gapped EKS (no public-registry egress), mirror them into your ECR and override the matching image.*.repository keys in your values file. Recommended push paths (chosen so ECR repo names stay valid; ECR doesn’t accept ghcr.io as a path segment):
docker tag ghcr.io/hatchet-dev/hatchet/hatchet-engine:v0.79.0 "${ECR}/hatchet-engine:v0.79.0"
docker tag pgvector/pgvector:0.8.0-pg16                       "${ECR}/pgvector/pgvector:0.8.0-pg16"
docker tag rabbitmq:3.13-management                           "${ECR}/rabbitmq:3.13-management"
docker tag busybox:1.37.0                                     "${ECR}/busybox:1.37.0"
docker push "${ECR}/hatchet-engine:v0.79.0"
docker push "${ECR}/pgvector/pgvector:0.8.0-pg16"
docker push "${ECR}/rabbitmq:3.13-management"
docker push "${ECR}/busybox:1.37.0"
Then in your values file, override the matching repositories so the chart renders ${ECR}/... instead of upstream public refs:
image:
  hatchetEngine:
    repository: hatchet-engine
  postgres:
    repository: pgvector/pgvector
  rabbitmq:
    repository: rabbitmq
  busybox:
    repository: busybox

2. Seed secrets in AWS Secrets Manager

If you’re using ESO (recommended), put one JSON blob at the path you’ll reference under secrets.esoAws.awsSecretPath:
{
  "OPENAI_API_KEY": "sk-...",
  "NEBULA_SECRET_KEY": "<random 32 bytes hex>",
  "NEBULA_SERVICE_API_KEY": "<random 32 bytes hex>",
  "NEBULA_WEBHOOK_HMAC_SECRET": "<random 32 bytes hex>",
  "NEBULA_JWT_PRIVATE_KEY_PEM": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----",
  "NEBULA_JWT_KID": "<stable per-deployment value>",
  "NEBULA_INTERNAL_WAKE_TOKEN": "<random 32 bytes hex>",
  "NEBULA_VECTOR_BUILD_HATCHET_TRIGGER_TOKEN": "<random 32 bytes hex>"
}
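One way to generate the random values and seed the entry, assuming the AWS CLI and an illustrative path of prod/nebula/app (match it to your secrets.esoAws.awsSecretPath):
# one value per random NEBULA_* key above
openssl rand -hex 32

# upload the filled-in JSON blob
aws secretsmanager create-secret \
  --name prod/nebula/app \
  --secret-string file://nebula-secrets.json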
Postgres credentials live in separate Secrets Manager entries referenced by postgres.credentialsSecret and hatchetPostgres.credentialsSecret — each must materialize a Kubernetes Secret with username + password keys (those exact lowercase key names — the chart reads them via secretKeyRef.key: username / .key: password).
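A sketch of one way to produce such a Secret with ESO; the store name, target Secret name, and Secrets Manager path below are placeholders:
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: nebula-postgres-credentials
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: aws-secrets-manager
  target:
    name: nebula-postgres-credentials   # point postgres.credentialsSecret at this
  data:
    - secretKey: username               # exact lowercase key the chart expects
      remoteRef:
        key: prod/nebula/postgres
        property: username
    - secretKey: password
      remoteRef:
        key: prod/nebula/postgres
        property: password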

3. Copy + fill the reference values file

The bundle ships a reference values file at helm/examples/eks/values.yaml with every AWS-specific knob pre-wired (Karpenter, gp3, IRSA, ALB, RDS+S3, ESO). Copy it, fill in the <placeholder> markers (account ID, RDS endpoint, IRSA role ARN, ACM cert ARN, S3 bucket name), and save as your-values.yaml.
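As a sketch, the knobs this page references land in the values file roughly like this (abbreviated, with placeholder values; the reference file carries many more settings):
image:
  registry: <account-id>.dkr.ecr.us-east-1.amazonaws.com
serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<account-id>:role/<irsa-role-name>
secrets:
  esoAws:
    awsSecretPath: prod/nebula/app
postgres:
  credentialsSecret: nebula-postgres-credentials
hatchetPostgres:
  credentialsSecret: nebula-hatchet-postgres-credentials
karpenter:
  enabled: true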

4. Install

helm install nebula ./helm/nebula-<version>.tgz \
  -n nebula --create-namespace \
  -f your-values.yaml
Current chart limitation: the Helm chart does not yet render the Compose bundle’s catalog-bootstrap job. For a fresh database, initialize Alembic migrations, pgvector, and the Nebula catalog contract before exposing the API pods. Do not treat helm install alone as a complete greenfield database bootstrap until the chart includes that Job.

5. Verify

kubectl -n nebula get pods
kubectl -n nebula get ingress nebula
# Once the ALB is provisioned:
curl -fsS https://nebula.<your-domain>.com/v1/health

Upgrade

Pull the new bundle, push new images to your ECR, then:
helm upgrade nebula ./helm/nebula-<new-version>.tgz \
  -n nebula \
  -f your-values.yaml
This is a rolling update; the API tier sees no downtime provided the target database has already been migrated for the bundle version.
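To confirm the roll completed, something like the following works; the Deployment name depends on the chart's rendered resources, so nebula-api below is illustrative:
kubectl -n nebula rollout status deploy/nebula-api --timeout=10m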

Sizing reference

The example values file ships with production-shape defaults for a starter deployment. Scale from there based on measured throughput:
| Workload | Starter | When to scale |
|---|---|---|
| API | 2 replicas, 1 CPU / 2-4 GB | HPA on CPU >70% sustained |
| Worker | 2 replicas, 2 CPU / 4-8 GB | HPA on queue depth (Hatchet metric) |
| Graph engine | 2 replicas, 2 CPU / 4-8 GB | Manual; restart-sensitive (WAL replay) |
| Compactor | 1 replica, 1 CPU / 2-4 GB | Single-writer; do not scale horizontally |
| RabbitMQ | 1 replica, 8 GB PVC | Single broker is fine up to ~10k workflows/min |
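The API row maps onto a stock autoscaling/v2 HorizontalPodAutoscaler. A sketch, assuming the chart renders the API Deployment as nebula-api:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nebula-api
  namespace: nebula
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nebula-api             # assumption: match the rendered Deployment name
  minReplicas: 2                 # starter sizing from the table above
  maxReplicas: 8
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # "CPU >70% sustained"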

Karpenter + long-running pods

The example values file sets karpenter.enabled=true, which adds karpenter.sh/do-not-disrupt: "true" to the API, worker, graph-engine, compactor, and Hatchet engine pods. This prevents Karpenter consolidation or drift from killing pods mid-ingest, mid-graph-build, or mid-snapshot. Pods still drain on actual node lifecycle events (rolling update, manual kubectl drain). If you’re on Cluster Autoscaler instead of Karpenter, leave karpenter.enabled=false. CA respects PDBs by default and doesn’t need the annotation.
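Concretely, with karpenter.enabled=true each of those workload pods renders with:
metadata:
  annotations:
    karpenter.sh/do-not-disrupt: "true"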

Troubleshooting

Postgres credentials missing or malformed

Your postgres.credentialsSecret may be missing or may not have the expected keys. The Secret must contain username and password (those exact lowercase key names; the chart reads them via secretKeyRef.key: username / .key: password). If you’re using ESO, check that the ExternalSecret resource synced before the API and worker pods started.
S3 AccessDenied from pods

Either (a) the IRSA role isn’t attached to the ServiceAccount, or (b) the role’s policy doesn’t include the bucket ARN. Check kubectl -n nebula describe sa nebula-sa for the eks.amazonaws.com/role-arn annotation, and trace the IAM policy attached to that role.
Ingress has no ALB address

The AWS Load Balancer Controller takes 30-60s to provision the ALB on first install. Check kubectl -n kube-system logs deploy/aws-load-balancer-controller for any IAM permission errors on the controller’s IRSA role.
pgvector extension not available

RDS doesn’t auto-enable extensions even if shared_preload_libraries includes them: rds.extensions = vector must be in the parameter group, and the database initialization path must run CREATE EXTENSION IF NOT EXISTS vector before the API handles traffic.
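A quick preflight, assuming psql reachability to the RDS endpoint (database and user names are placeholders):
psql "host=<rds-endpoint> dbname=<nebula-db> user=<user>" \
  -c 'CREATE EXTENSION IF NOT EXISTS vector;'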