This guide covers a production Nebula install on AKS using Azure-managed Postgres, Azure Blob Storage (S3-compatible endpoint), and Azure Key Vault secrets via External Secrets Operator.

Prereqs

Before helm install, the following must be in place on the cluster side.

Cluster

  • AKS 1.30+
  • OIDC issuer and Workload Identity enabled on the cluster (az aks update --name <cluster> --resource-group <rg> --enable-oidc-issuer --enable-workload-identity); both are required for Workload Identity federation
  • Cluster nodes must have outbound internet access, or images must be mirrored to Azure Container Registry (ACR) first
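
The cluster requirements above can be checked from a workstation before going further. This is a minimal sketch: the helper name check_aks_identity is hypothetical, and the two queried fields assume a reasonably recent Azure CLI and AKS API version.

```shell
# Hypothetical helper: prints the OIDC issuer and Workload Identity flags
# for a cluster and returns non-zero unless both are "true".
# $1 = cluster name, $2 = resource group.
check_aks_identity() {
  local cluster="$1" rg="$2" oidc wi
  # JSON output renders booleans as lowercase true/false.
  oidc="$(az aks show --name "$cluster" --resource-group "$rg" \
    --query 'oidcIssuerProfile.enabled' -o json)"
  wi="$(az aks show --name "$cluster" --resource-group "$rg" \
    --query 'securityProfile.workloadIdentity.enabled' -o json)"
  echo "oidc-issuer=${oidc} workload-identity=${wi}"
  [ "$oidc" = "true" ] && [ "$wi" = "true" ]
}
```

Run it as check_aks_identity "<cluster>" "<rg>" (keep the quotes so the shell does not treat the angle brackets as redirections).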

Addons + controllers

| Component | Purpose | Install reference |
| --- | --- | --- |
| Cluster Autoscaler (or Karpenter for Azure preview) | Node autoscaling | AKS addon: --enable-cluster-autoscaler |
| nginx Ingress Controller (or AGIC) | HTTP/HTTPS ingress | kubernetes.github.io/ingress-nginx |
| Azure Disk CSI Driver | Premium SSD volumes for graph-engine / compactor / RabbitMQ | AKS built-in: enabled by default on AKS 1.21+ |
| cert-manager | TLS certificate provisioning from Let’s Encrypt | cert-manager.io/docs |
| External Secrets Operator (recommended) | Sync from Azure Key Vault | external-secrets.io |

  • Azure Database for PostgreSQL Flexible Server in the same virtual network as the AKS cluster. Enable the vector extension: in the Azure portal, navigate to Server parameters → azure.extensions → add vector. Private access (VNet-integrated) is strongly recommended.
  • Azure Blob Storage account with a container for graph segments. The chart’s object storage path uses Azure Blob’s S3-compatible API endpoint; see the known limitation below.

Known limitation: the chart’s objectStorage block emits S3-protocol environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, S3_ENDPOINT_URL). Azure Blob exposes an S3-compatible endpoint (Storage account → Settings → S3 compatibility, currently in preview). Enable it and store the HMAC access keys in the Secret referenced by objectStorage.credentialsSecret. If the S3-compat preview is not available in your region or subscription tier, run a MinIO gateway in front of Azure Blob as a bridge.

Workload Identity setup

Workload Identity replaces the legacy aad-pod-identity approach. Steps:
  1. Create a managed identity in the same resource group as the cluster:
    az identity create --name nebula-wi --resource-group <rg>
    
  2. Federate the managed identity with the AKS OIDC issuer for the Nebula service account:
    AKS_OIDC_ISSUER="$(az aks show --name <cluster> --resource-group <rg> \
      --query 'oidcIssuerProfile.issuerUrl' -o tsv)"
    az identity federated-credential create \
      --name nebula-federated \
      --identity-name nebula-wi \
      --resource-group <rg> \
      --issuer "$AKS_OIDC_ISSUER" \
      --subject "system:serviceaccount:nebula:<release>-nebula-sa" \
      --audience api://AzureADTokenExchange
    
    Replace <release> with your helm install release name (e.g. nebula).
  3. Grant the managed identity Storage Blob Data Contributor on the Blob container. If ESO uses the same identity, also grant it Key Vault Secrets User on the Key Vault.
  4. Record the managed identity Client ID — you’ll set it under serviceAccount.annotations in your values file.
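
Steps 3 and 4 can be scripted with the Azure CLI. The sketch below is illustrative, not the bundle's own tooling: the helper name grant_nebula_identity is hypothetical, the scopes are full Azure resource IDs that you supply, and the Key Vault role only takes effect if the vault uses the Azure RBAC permission model.

```shell
# Hypothetical helper: grants the roles from step 3 to the nebula-wi managed
# identity, then prints the client ID needed in step 4.
# $1 = resource group, $2 = resource ID of the Blob container,
# $3 = resource ID of the Key Vault (optional; pass it only if ESO uses this identity).
grant_nebula_identity() {
  local rg="$1" storage_scope="$2" keyvault_scope="${3:-}" principal_id
  principal_id="$(az identity show --name nebula-wi --resource-group "$rg" \
    --query principalId -o tsv)" || return 1
  az role assignment create \
    --assignee-object-id "$principal_id" \
    --assignee-principal-type ServicePrincipal \
    --role "Storage Blob Data Contributor" \
    --scope "$storage_scope" --output none || return 1
  if [ -n "$keyvault_scope" ]; then
    az role assignment create \
      --assignee-object-id "$principal_id" \
      --assignee-principal-type ServicePrincipal \
      --role "Key Vault Secrets User" \
      --scope "$keyvault_scope" --output none || return 1
  fi
  # Step 4: the client ID that goes into the values file.
  az identity show --name nebula-wi --resource-group "$rg" --query clientId -o tsv
}
```

A container scope has the form /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>/blobServices/default/containers/<container>. With Azure Workload Identity the client ID is normally carried by the azure.workload.identity/client-id annotation; check helm/examples/aks/values.yaml for the exact key expected under serviceAccount.annotations.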

Install

1. Push images to your ACR

tar -xzf nebula-enterprise-<version>.tar.gz
cd nebula-enterprise-<version>/
sha256sum -c checksums.txt
docker load -i images.tar

ACR=<your-registry>.azurecr.io
az acr login --name <your-registry>

docker tag nebula:enterprise-<version>              "${ACR}/nebula/nebula-runtime:<version>"
docker tag nebula-graph-engine:enterprise-<version> "${ACR}/nebula/graph-engine:<version>"
docker push "${ACR}/nebula/nebula-runtime:<version>"
docker push "${ACR}/nebula/graph-engine:<version>"
For air-gapped AKS (no public-registry egress), also mirror third-party images to ACR and override image.*.repository in your values file:
docker tag ghcr.io/hatchet-dev/hatchet/hatchet-engine:v0.79.0 "${ACR}/hatchet-engine:v0.79.0"
docker tag pgvector/pgvector:0.8.0-pg16                       "${ACR}/pgvector/pgvector:0.8.0-pg16"
docker tag rabbitmq:3.13-management                           "${ACR}/rabbitmq:3.13-management"
docker tag busybox:1.37.0                                     "${ACR}/busybox:1.37.0"
docker push "${ACR}/hatchet-engine:v0.79.0"
docker push "${ACR}/pgvector/pgvector:0.8.0-pg16"
docker push "${ACR}/rabbitmq:3.13-management"
docker push "${ACR}/busybox:1.37.0"
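
The docker tag commands above only work if the third-party images are already in the local Docker image store. If they are not, each one has to be pulled first on a workstation that can reach both the public registry and ACR. A minimal sketch, with a hypothetical helper name:

```shell
# Hypothetical helper: pull a public image, retag it for ACR, and push it.
# $1 = source image reference, $2 = target reference in your ACR.
mirror_image() {
  local src="$1" dest="$2"
  docker pull "$src" && docker tag "$src" "$dest" && docker push "$dest"
}

# Usage (ACR is the login server variable set in step 1):
#   mirror_image ghcr.io/hatchet-dev/hatchet/hatchet-engine:v0.79.0 "${ACR}/hatchet-engine:v0.79.0"
#   mirror_image pgvector/pgvector:0.8.0-pg16 "${ACR}/pgvector/pgvector:0.8.0-pg16"
#   mirror_image rabbitmq:3.13-management "${ACR}/rabbitmq:3.13-management"
#   mirror_image busybox:1.37.0 "${ACR}/busybox:1.37.0"
```

The explicit source and target pairs keep the same repository paths as the commands above, so the image.*.repository overrides stay unchanged.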

2. Seed secrets in Azure Key Vault

Create a Key Vault and store one secret per Nebula key, or store a JSON blob at a single secret name and use ESO’s dataFrom extraction. Example using individual secrets:
az keyvault secret set --vault-name <kv> --name OPENAI-API-KEY      --value "sk-..."
az keyvault secret set --vault-name <kv> --name NEBULA-SECRET-KEY   --value "$(openssl rand -hex 32)"
az keyvault secret set --vault-name <kv> --name NEBULA-SERVICE-API-KEY --value "$(openssl rand -hex 32)"
az keyvault secret set --vault-name <kv> --name NEBULA-WEBHOOK-HMAC-SECRET --value "$(openssl rand -hex 32)"
az keyvault secret set --vault-name <kv> --name NEBULA-INTERNAL-WAKE-TOKEN --value "$(openssl rand -hex 32)"
az keyvault secret set --vault-name <kv> --name NEBULA-VECTOR-BUILD-HATCHET-TRIGGER-TOKEN --value "$(openssl rand -hex 32)"
Postgres credentials for each database go in separate Key Vault secrets and are materialized into Kubernetes Secrets (with username and password keys) by ESO before install.
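
As an illustration of how ESO can materialize one such Secret, the sketch below writes a SecretStore and an ExternalSecret to a local file. It is an assumption-laden example rather than the bundle's manifest: the resource names, the Key Vault secret names, and the ESO service account are hypothetical, and the apiVersion depends on the ESO release installed in your cluster.

```shell
# Writes a SecretStore and an ExternalSecret to eso-postgres.yaml.
# Replace every <placeholder>; names marked "hypothetical" are illustrative only.
cat > eso-postgres.yaml <<'EOF'
apiVersion: external-secrets.io/v1beta1   # newer ESO releases serve "v1" instead
kind: SecretStore
metadata:
  name: nebula-keyvault                   # hypothetical name
  namespace: nebula
spec:
  provider:
    azurekv:
      authType: WorkloadIdentity
      vaultUrl: "https://<kv>.vault.azure.net"
      serviceAccountRef:
        name: <eso-service-account>       # needs the azure.workload.identity/client-id annotation
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: nebula-postgres                   # hypothetical name
  namespace: nebula
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: SecretStore
    name: nebula-keyvault
  target:
    name: nebula-postgres                 # Kubernetes Secret that ESO creates
  data:
    - secretKey: username
      remoteRef:
        key: NEBULA-POSTGRES-USERNAME     # hypothetical Key Vault secret name
    - secretKey: password
      remoteRef:
        key: NEBULA-POSTGRES-PASSWORD     # hypothetical Key Vault secret name
EOF
# Apply with: kubectl apply -f eso-postgres.yaml
```

Because these Secrets must exist before install, the nebula namespace has to be created ahead of helm install, and the service account referenced by the SecretStore needs its own federated credential on the managed identity.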

3. Copy + fill the reference values file

The bundle ships helm/examples/aks/values.yaml with every AKS-specific knob pre-wired. Copy it, fill in the <placeholder> markers (ACR login server, Flexible Server hostname, Blob storage account, managed identity client ID, Key Vault name, domain), and save as your-values.yaml.

4. Install

helm install nebula ./helm/nebula-<version>.tgz \
  -n nebula --create-namespace \
  -f helm/examples/_common/production-sizing.yaml \
  -f your-values.yaml
_common/production-sizing.yaml is the shared production-shape sizing block (replicas, CPU/memory requests + limits, persistence) used by all three cloud-managed K8s examples (EKS/AKS/GKE). Omit it to keep the chart’s minimal-dev defaults; override per-workload in your-values.yaml to fit your AKS node SKUs.

The chart runs schema migrations and catalog-apply automatically via a per-revision Job (<release>-nebula-migrations-<revision>); API and worker pods gate startup on an init container that polls public.nebula_release_contract for the install’s release row. releaseContract.releaseId and releaseContract.gitSha are stamped into the bundled values by bundle.sh and are consumed automatically.
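
If API or worker pods sit in their init container for a long time, the migrations Job is the first thing to look at. The sketch below waits for it, assuming the Job is still present in the namespace after helm install returns; the helper name is hypothetical, and the Job name follows the <release>-nebula-migrations-<revision> pattern described above.

```shell
# Hypothetical helper: waits for the per-revision migrations Job to complete.
# $1 = release name, $2 = Helm revision number, $3 = timeout (default 10m).
wait_for_migrations() {
  local release="$1" revision="$2" timeout="${3:-10m}"
  local job="${release}-nebula-migrations-${revision}"
  kubectl -n nebula wait --for=condition=complete "job/${job}" --timeout="$timeout"
}
```

For a first install of the release nebula, that is wait_for_migrations nebula 1; helm history nebula -n nebula shows the current revision number after later upgrades.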

5. Verify

az aks get-credentials --name <cluster> --resource-group <rg>
kubectl -n nebula get pods
kubectl -n nebula get ingress nebula
curl -fsS https://nebula.<your-domain>.com/v1/health
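
On a fresh cluster the health check can fail for several minutes while the load balancer, DNS, and TLS certificate settle, so a retry loop is often more useful than a single curl. A minimal sketch with a hypothetical helper name and arbitrary default timings:

```shell
# Hypothetical helper: polls a health URL until curl -f succeeds.
# $1 = URL, $2 = max attempts (default 30), $3 = seconds between attempts (default 10).
wait_for_health() {
  local url="$1" attempts="${2:-30}" delay="${3:-10}" i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -fsS "$url" >/dev/null 2>&1; then
      echo "healthy after ${i} attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not healthy after ${attempts} attempts" >&2
  return 1
}
```

Call it with the same URL as the curl command above, quoted so the shell does not interpret the angle brackets.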

Upgrade

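Pull the new bundle, push new images to your ACR, then:
helm upgrade nebula ./helm/nebula-<new-version>.tgz \
  -n nebula \
  -f helm/examples/_common/production-sizing.yaml \
  -f your-values.yaml
Pass the same values files you used at install. When -f is given, Helm rebuilds the release values from the chart defaults plus the files on the command line, so leaving out production-sizing.yaml would revert the workloads to the minimal-dev sizing.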

Sizing reference

| Workload | Starter | When to scale |
| --- | --- | --- |
| API | 2 replicas, 1 CPU / 2-4 GB | HPA on CPU >70% sustained |
| Worker | 2 replicas, 2 CPU / 4-8 GB | HPA on queue depth (Hatchet metric) |
| Graph engine | 2 replicas, 2 CPU / 4-8 GB | Manual; restart-sensitive (WAL replay) |
| Compactor | 1 replica, 1 CPU / 2-4 GB | Single-writer; do not scale horizontally |
| RabbitMQ | 1 replica, 8 GB PVC | Single-broker is fine up to ~10k workflows/min |

Recommended AKS node SKUs for the starter shape: Standard_D4s_v5 (4 vCPU / 16 GB) for API, worker, and Hatchet; Standard_D8s_v5 (8 vCPU / 32 GB) for graph-engine and compactor.

Troubleshooting

Check that the managed identity’s federated credential subject exactly matches system:serviceaccount:<namespace>:<release>-nebula-sa. The release name prefix is part of the service account name. Confirm with kubectl -n nebula get sa and compare to az identity federated-credential list --identity-name nebula-wi --resource-group <rg>.
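
The comparison described above can be automated. This is a sketch under the document's own naming (identity nebula-wi, service account <release>-nebula-sa); the helper name is hypothetical.

```shell
# Hypothetical helper: checks that the expected subject is among the federated
# credential subjects registered on the nebula-wi managed identity.
# $1 = namespace, $2 = Helm release name, $3 = resource group.
check_federated_subject() {
  local namespace="$1" release="$2" rg="$3"
  local expected="system:serviceaccount:${namespace}:${release}-nebula-sa"
  # -o tsv prints one subject per line; grep -Fx requires an exact line match.
  if az identity federated-credential list \
       --identity-name nebula-wi --resource-group "$rg" \
       --query '[].subject' -o tsv | grep -Fxq "$expected"; then
    echo "match: ${expected}"
  else
    echo "no federated credential with subject ${expected}" >&2
    return 1
  fi
}
```

For the install shown earlier this would be check_federated_subject nebula nebula "<rg>".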
nginx Ingress on AKS provisions a public Azure Load Balancer automatically. The provisioning can take 3-5 minutes on a fresh cluster. Check kubectl -n ingress-nginx get svc ingress-nginx-controller for the external IP assignment. If it stays in Pending, verify that the cluster’s subnet has enough IP space and that the AKS service principal / managed identity has Network Contributor on the virtual network.
The azure.extensions server parameter must include vector before the database is created. If the database already exists without the extension, connect directly and run CREATE EXTENSION IF NOT EXISTS vector;. The extension must be enabled in both the server parameters and the database.
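
Both halves of that fix can be run from a workstation with az and psql installed. A minimal sketch with a hypothetical helper name; note that setting azure.extensions replaces the current allow-list, so include any extensions that are already allow-listed as a comma-separated value.

```shell
# Hypothetical helper: allow-lists pgvector on the Flexible Server, then
# creates the extension in one database.
# $1 = resource group, $2 = server name, $3 = libpq connection URI of the database.
enable_pgvector() {
  local rg="$1" server="$2" conn="$3"
  # Overwrites the allow-list; pass e.g. "vector,pg_trgm" to keep others.
  az postgres flexible-server parameter set \
    --resource-group "$rg" --server-name "$server" \
    --name azure.extensions --value vector --output none &&
  psql "$conn" -v ON_ERROR_STOP=1 -c 'CREATE EXTENSION IF NOT EXISTS vector;'
}
```

Repeat the psql step for each Nebula database, since extensions are created per database.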
Azure Blob’s S3-compatible endpoint requires HMAC keys, not the storage account connection string. Generate HMAC keys under Storage account → Access keys → Enable S3 compatible HMAC. Store the Access Key ID and Secret Access Key in the Kubernetes Secret referenced by objectStorage.credentialsSecret with keys AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (those exact uppercase names; the chart’s nebula.objectStorageEnv helper reads them via secretKeyRef.key).
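
If the Secret is created by hand rather than through ESO, the key names can be set explicitly with kubectl. A sketch only: the helper name and the two HMAC_* environment variables are hypothetical, and the Secret name must match whatever objectStorage.credentialsSecret points to in your values file.

```shell
# Hypothetical helper: creates the object-storage credentials Secret with the
# two key names the chart reads. $1 = Secret name. The HMAC values are taken
# from the environment so they are not typed on the command line.
create_object_storage_secret() {
  local name="$1"
  if [ -z "${HMAC_ACCESS_KEY_ID:-}" ] || [ -z "${HMAC_SECRET_ACCESS_KEY:-}" ]; then
    echo "set HMAC_ACCESS_KEY_ID and HMAC_SECRET_ACCESS_KEY first" >&2
    return 1
  fi
  kubectl -n nebula create secret generic "$name" \
    --from-literal=AWS_ACCESS_KEY_ID="$HMAC_ACCESS_KEY_ID" \
    --from-literal=AWS_SECRET_ACCESS_KEY="$HMAC_SECRET_ACCESS_KEY"
}
```

kubectl -n nebula describe secret <name> then lists the key names without revealing the values, which is enough to confirm the spelling.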