This guide covers a production Nebula install on AKS using Azure-managed Postgres, Azure Blob Storage (S3-compatible endpoint), and Azure Key Vault secrets via External Secrets Operator.
Prereqs
Before `helm install`, the following must be in place on the cluster side.
Cluster
- AKS 1.30+
- OIDC issuer and Workload Identity enabled on the cluster (`az aks update --resource-group <rg> --name <cluster> --enable-oidc-issuer --enable-workload-identity`); required for Workload Identity federation
- Cluster nodes must have outbound internet access, or images must be mirrored to Azure Container Registry (ACR) first
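The cluster-side prerequisite can be scripted as below. This is a minimal sketch: the function names are hypothetical, and the resource group and cluster name are arguments you supply. The readiness check reads the `oidcIssuerProfile.enabled` and `securityProfile.workloadIdentity.enabled` fields that `az aks show` returns for the cluster.

```bash
# Sketch: turn on the OIDC issuer and Workload Identity for an existing AKS cluster.
# Function names are hypothetical; pass your own resource group and cluster name.

enable_workload_identity() {
  local rg="$1" cluster="$2"
  az aks update \
    --resource-group "$rg" \
    --name "$cluster" \
    --enable-oidc-issuer \
    --enable-workload-identity \
    --output none
}

# Succeeds only when both features report "true" on the cluster resource.
workload_identity_ready() {
  local rg="$1" cluster="$2" oidc wi
  oidc=$(az aks show --resource-group "$rg" --name "$cluster" \
    --query "oidcIssuerProfile.enabled" --output tsv) || return 1
  wi=$(az aks show --resource-group "$rg" --name "$cluster" \
    --query "securityProfile.workloadIdentity.enabled" --output tsv) || return 1
  [[ "$oidc" == "true" && "$wi" == "true" ]]
}
```

Usage: `enable_workload_identity my-rg my-aks && workload_identity_ready my-rg my-aks && echo ready`.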
Addons + controllers
| Component | Purpose | Install reference |
|---|---|---|
| Cluster Autoscaler (or Karpenter for Azure, in preview) | Node autoscaling | AKS flag: `--enable-cluster-autoscaler` |
| nginx Ingress Controller (or AGIC) | HTTP/HTTPS ingress | kubernetes.github.io/ingress-nginx |
| Azure Disk CSI Driver | Premium SSD volumes for graph-engine / compactor / RabbitMQ | AKS built-in: enabled by default on AKS 1.21+ |
| cert-manager | TLS certificate provisioning from Let’s Encrypt | cert-manager.io/docs |
| External Secrets Operator (recommended) | Sync from Azure Key Vault | external-secrets.io |
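The Helm-installed rows of the table can be set up roughly as follows. This sketch assumes Helm 3 and the upstream chart repositories. The function name, release names, and namespaces are illustrative choices, not requirements of the Nebula chart. The `ingress-nginx` release and namespace do, however, match the `kubectl -n ingress-nginx get svc ingress-nginx-controller` check under Troubleshooting.

```bash
# Sketch: install the Helm-managed add-ons from their upstream chart repositories.
# Release names and namespaces are illustrative, not required by the Nebula chart.

install_addons() {
  helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx || return 1
  helm repo add jetstack https://charts.jetstack.io || return 1
  helm repo add external-secrets https://charts.external-secrets.io || return 1
  helm repo update || return 1

  # nginx Ingress Controller; on AKS its LoadBalancer Service gets a public Azure Load Balancer.
  helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
    --namespace ingress-nginx --create-namespace || return 1

  # cert-manager plus its CRDs (crds.enabled is the value name in recent chart versions).
  helm upgrade --install cert-manager jetstack/cert-manager \
    --namespace cert-manager --create-namespace \
    --set crds.enabled=true || return 1

  # External Secrets Operator; its chart installs the ESO CRDs by default.
  helm upgrade --install external-secrets external-secrets/external-secrets \
    --namespace external-secrets --create-namespace
}
```

Because every release uses `helm upgrade --install`, re-running `install_addons` on the same cluster is safe.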
Azure-managed resources (recommended)
- Azure Database for PostgreSQL Flexible Server in the same virtual network as the AKS cluster. Enable the `vector` extension: in the Azure portal, navigate to Server parameters → `azure.extensions` → add `vector`. Private access (VNet integration) is strongly recommended.
- Azure Blob Storage account with a container for graph segments. The chart’s object storage path uses Azure Blob’s S3-compatible API endpoint; see the object storage note below.
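The portal step for `vector` has an Azure CLI equivalent. This is a minimal sketch; the function name is hypothetical, and the resource group and server name are arguments you supply. `azure.extensions` holds a single comma-separated list, so setting it replaces the current value. Pass any extensions you already allow-list as the optional third argument.

```bash
# Sketch: allow-list pgvector on a Flexible Server via the azure.extensions parameter.
# The optional third argument keeps other extensions you already rely on (comma-separated).

allow_pgvector() {
  local rg="$1" server="$2" extra="${3:-}"
  local value="vector${extra:+,$extra}"
  az postgres flexible-server parameter set \
    --resource-group "$rg" \
    --server-name "$server" \
    --name azure.extensions \
    --value "$value" \
    --output none
}
```

For example, `allow_pgvector my-rg nebula-pg pg_trgm` sets the list to `vector,pg_trgm`.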
Object storage: the chart’s `objectStorage` block emits S3-protocol environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `S3_ENDPOINT_URL`). Azure Blob exposes an S3-compatible endpoint (Storage account → Settings → S3 compatibility, currently in preview). Enable it and store its HMAC access keys in the Secret referenced by `objectStorage.credentialsSecret`. If the S3-compatible preview is not available in your region or subscription tier, run a MinIO gateway in front of Azure Blob as a bridge.
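This is a minimal sketch for creating that credentials Secret with `kubectl`, assuming you have already generated the HMAC key pair. The function name is hypothetical, and the namespace and Secret name are up to you. The key names are the exact uppercase names the chart reads. If you sync the keys from Key Vault with ESO instead, target the same two key names.

```bash
# Sketch: write the S3-compatible HMAC key pair into a Kubernetes Secret.
# Namespace and Secret name are your choice; the key names must stay exactly as below.

create_object_storage_secret() {
  local ns="$1" secret="$2" access_key_id="$3" secret_access_key="$4"
  # Render the Secret client-side and apply it, so re-running updates it instead of failing.
  kubectl create secret generic "$secret" \
    --namespace "$ns" \
    --from-literal=AWS_ACCESS_KEY_ID="$access_key_id" \
    --from-literal=AWS_SECRET_ACCESS_KEY="$secret_access_key" \
    --dry-run=client --output yaml \
  | kubectl apply --filename -
}
```

Then set `objectStorage.credentialsSecret` in your values file to the Secret name you used.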
Workload Identity setup
Workload Identity replaces the legacy aad-pod-identity approach. Steps:
- Create a managed identity in the same resource group as the cluster.
- Federate the managed identity with the AKS OIDC issuer for the Nebula service account (`system:serviceaccount:<namespace>:<release>-nebula-sa`), replacing `<release>` with your `helm install` release name (e.g. `nebula`).
- Grant the managed identity Storage Blob Data Contributor on the Blob container, and Key Vault Secrets User on the Key Vault if ESO uses the same identity.
- Record the managed identity’s client ID; you’ll set it under `serviceAccount.annotations` in your values file.
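The four steps map to Azure CLI calls roughly as follows. This is a sketch under stated assumptions:

- The function name and the federated credential name (`<release>-nebula-federation`) are hypothetical.
- The service account name follows the `<release>-nebula-sa` pattern from Troubleshooting.
- The two role scopes are full Azure resource IDs you look up yourself. For a container, the ID has the form `/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>/blobServices/default/containers/<container>`.
- The Key Vault Secrets User assignment only takes effect on vaults that use Azure RBAC authorization.

```bash
# Sketch of the Workload Identity steps. The federated credential name is hypothetical;
# the subject follows the chart's <release>-nebula-sa service account naming.

setup_workload_identity() {
  local rg="$1" cluster="$2" identity="$3" namespace="$4" release="$5"
  local container_scope="$6" vault_scope="${7:-}" issuer principal_id

  # Step 1: managed identity in the cluster's resource group.
  az identity create --resource-group "$rg" --name "$identity" --output none || return 1

  # Step 2: federate it with the cluster's OIDC issuer for the Nebula service account.
  issuer=$(az aks show --resource-group "$rg" --name "$cluster" \
    --query "oidcIssuerProfile.issuerUrl" --output tsv) || return 1
  az identity federated-credential create \
    --name "${release}-nebula-federation" \
    --identity-name "$identity" \
    --resource-group "$rg" \
    --issuer "$issuer" \
    --subject "system:serviceaccount:${namespace}:${release}-nebula-sa" \
    --audiences "api://AzureADTokenExchange" \
    --output none || return 1

  # Step 3: data-plane roles; the Key Vault role is skipped when no vault scope is given.
  principal_id=$(az identity show --resource-group "$rg" --name "$identity" \
    --query principalId --output tsv) || return 1
  az role assignment create --assignee-object-id "$principal_id" \
    --assignee-principal-type ServicePrincipal \
    --role "Storage Blob Data Contributor" --scope "$container_scope" --output none || return 1
  if [[ -n "$vault_scope" ]]; then
    az role assignment create --assignee-object-id "$principal_id" \
      --assignee-principal-type ServicePrincipal \
      --role "Key Vault Secrets User" --scope "$vault_scope" --output none || return 1
  fi

  # Step 4: print the client ID for serviceAccount.annotations.
  az identity show --resource-group "$rg" --name "$identity" --query clientId --output tsv
}
```

For example, `setup_workload_identity my-rg my-aks nebula-wi nebula nebula "$CONTAINER_ID" "$VAULT_ID"` prints the client ID. Set it under `serviceAccount.annotations` with the Workload Identity annotation key `azure.workload.identity/client-id`. To skip the Key Vault assignment, leave off the last argument.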
Install
1. Push images to your ACR
Push the Nebula images to your ACR, then set each `image.*.repository` in your values file to its ACR path.
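This sketch shows one way to do the push. It assumes the images are already available to your local Docker daemon (for example via `docker load`, if the bundle ships image archives) and that your registry uses the public-cloud `<name>.azurecr.io` login server. The function name is hypothetical, and no specific Nebula image names are assumed. For each image reference, it keeps the last path segment, re-homes it under your ACR, and prints the new reference.

```bash
# Sketch: retag locally available images under an ACR login server and push them.
# Image references are whatever you pass in; no Nebula image names are assumed.

push_images_to_acr() {
  local acr_name="$1"; shift
  local login_server="${acr_name}.azurecr.io" src name_tag
  az acr login --name "$acr_name" || return 1
  for src in "$@"; do
    # Keep only the last path segment (repo:tag) and place it under the ACR login server.
    name_tag="${src##*/}"
    docker tag "$src" "${login_server}/${name_tag}" || return 1
    docker push "${login_server}/${name_tag}" || return 1
    printf '%s\n' "${login_server}/${name_tag}"
  done
}
```

Use the printed paths, minus the tag, as the `image.*.repository` values.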
2. Seed secrets in Azure Key Vault
Create a Key Vault and store one secret per Nebula key, or store a JSON blob under a single secret name and use ESO’s `dataFrom` extraction. Either way, the Kubernetes Secrets the chart references (for example, credentials with `username` and `password` keys) must be synced by ESO before install.
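This is a sketch of the individual-secrets layout, assuming the vault already exists. Everything Nebula-specific here is hypothetical and must be aligned with the Secret names your values file references:

- the Key Vault secret names (`nebula-db-username`, `nebula-db-password`)
- the `SecretStore` and `ExternalSecret` names
- the target Secret name `nebula-db`

The `SecretStore` authenticates with Workload Identity. The service account it references must therefore already exist, carry the `azure.workload.identity/client-id` annotation, and be covered by a federated credential on the managed identity. Match the `apiVersion` to the ESO CRDs installed in your cluster; older ESO releases only serve `external-secrets.io/v1beta1`.

```bash
# Sketch: seed two Key Vault secrets and have ESO sync them into one Kubernetes Secret.
# All secret, store, and target names are hypothetical placeholders.

seed_and_sync_secrets() {
  local vault="$1" namespace="$2" service_account="$3" db_user="$4" db_password="$5"
  az keyvault secret set --vault-name "$vault" --name nebula-db-username \
    --value "$db_user" --output none || return 1
  az keyvault secret set --vault-name "$vault" --name nebula-db-password \
    --value "$db_password" --output none || return 1

  # SecretStore (Workload Identity auth) plus an ExternalSecret mapping to username/password keys.
  kubectl apply --filename - <<EOF
apiVersion: external-secrets.io/v1
kind: SecretStore
metadata:
  name: azure-keyvault
  namespace: ${namespace}
spec:
  provider:
    azurekv:
      authType: WorkloadIdentity
      vaultUrl: https://${vault}.vault.azure.net
      serviceAccountRef:
        name: ${service_account}
---
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: nebula-db
  namespace: ${namespace}
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: SecretStore
    name: azure-keyvault
  target:
    name: nebula-db
  data:
    - secretKey: username
      remoteRef:
        key: nebula-db-username
    - secretKey: password
      remoteRef:
        key: nebula-db-password
EOF
}
```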
3. Copy + fill the reference values file
The bundle ships `helm/examples/aks/values.yaml` with every AKS-specific setting pre-wired. Copy it, fill in the `<placeholder>` markers (ACR login server, Flexible Server hostname, Blob storage account, managed identity client ID, Key Vault name, domain), and save it as `your-values.yaml`.
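Before installing, it is worth confirming that no placeholder was missed. This is a minimal sketch with hypothetical function names. It copies the example into place without overwriting an existing copy, then lists any remaining angle-bracket markers. The pattern assumes placeholders look like `<acr-login-server>`, so adjust it if your copy uses a different style. It will also flag angle-bracket text inside comments.

```bash
# Sketch: copy the AKS example (never overwriting an existing copy) and report unfilled markers.

prepare_values() {
  local bundle_dir="$1" out="${2:-your-values.yaml}"
  [[ -f "$out" ]] || cp "$bundle_dir/helm/examples/aks/values.yaml" "$out" || return 1
}

check_placeholders() {
  local file="${1:-your-values.yaml}"
  # grep succeeds when it finds a marker, so a match means the file is not ready yet.
  if grep -nE '<[A-Za-z][A-Za-z0-9_.-]*>' "$file"; then
    echo "unfilled placeholders remain in $file" >&2
    return 1
  fi
}
```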
4. Install
`_common/production-sizing.yaml` is the shared production sizing block (replicas, CPU/memory requests and limits, persistence) used by all three managed Kubernetes examples (EKS, AKS, GKE). Omit it to keep the chart’s minimal development defaults. If you include it, override individual workloads in `your-values.yaml` to fit your AKS node SKUs.
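The install command can be sketched as follows. The function name is hypothetical. The chart path (`./helm/nebula` in the usage line) and the location of `_common/production-sizing.yaml` inside the bundle are assumptions, so check your bundle layout. Helm merges `--values` files left to right, with later files winning. That is why the sizing file goes first and `your-values.yaml` last. Because the function uses `helm upgrade --install`, it also serves the Upgrade step.

```bash
# Sketch: install or upgrade Nebula. Later --values files override earlier ones,
# so per-workload settings in your-values.yaml win over the shared sizing file.

nebula_deploy() {
  local release="$1" namespace="$2" chart="$3" values="$4" sizing="${5:-}"
  local args=(upgrade --install "$release" "$chart" --namespace "$namespace" --create-namespace)
  # Leave out the sizing file to keep the chart's minimal development defaults.
  if [[ -n "$sizing" ]]; then
    args+=(--values "$sizing")
  fi
  args+=(--values "$values")
  helm "${args[@]}"
}
```

Usage: `nebula_deploy nebula nebula ./helm/nebula your-values.yaml helm/examples/_common/production-sizing.yaml`. Drop the last argument to skip production sizing.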
The chart runs schema migrations and catalog-apply automatically via a per-revision Job (`<release>-nebula-migrations-<revision>`). API and worker pods gate startup on an init container that polls `public.nebula_release_contract` for the install’s release row. `releaseContract.releaseId` and `releaseContract.gitSha` are stamped into the bundled values by `bundle.sh` and consumed automatically.
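To watch the migrations Job for the current revision, a sketch like this can help. It reads the latest revision number from the first column of `helm history --max 1` and builds the `<release>-nebula-migrations-<revision>` name described above. If the Job does not complete in time, it prints the Job's logs. It assumes the Job still exists when you run it. The function name and default timeout are hypothetical.

```bash
# Sketch: wait for this revision's migrations Job; print its logs if it does not complete.

wait_for_migrations() {
  local namespace="$1" release="$2" timeout="${3:-15m}" revision job
  # helm history prints a header row, then the requested revisions; column 1 is REVISION.
  revision=$(helm history "$release" --namespace "$namespace" --max 1 | awk 'NR == 2 { print $1 }')
  if [[ -z "$revision" ]]; then
    echo "no Helm history for release $release" >&2
    return 1
  fi
  job="${release}-nebula-migrations-${revision}"
  if ! kubectl wait --namespace "$namespace" --for=condition=complete "job/${job}" --timeout="$timeout"; then
    kubectl logs --namespace "$namespace" "job/${job}" --all-containers >&2
    return 1
  fi
}
```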
5. Verify
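Here is a generic verification pass, sketched without assuming any specific workload names. It waits for every Deployment and StatefulSet in the namespace to finish rolling out, then lists pods and ingresses. The function name and timeout default are hypothetical.

```bash
# Sketch: wait for all Deployments and StatefulSets in the namespace, then show pods and ingress.

verify_nebula() {
  local namespace="$1" timeout="${2:-10m}" workload
  for workload in $(kubectl get deployments,statefulsets --namespace "$namespace" --output name); do
    kubectl rollout status "$workload" --namespace "$namespace" --timeout="$timeout" || return 1
  done
  kubectl get pods,ingress --namespace "$namespace"
}
```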
Upgrade
Pull the new bundle, push the new images to your ACR, then run `helm upgrade` with the same release name and values files you used for the install.
Sizing reference
| Workload | Starter | When to scale |
|---|---|---|
| API | 2 replicas, 1 CPU / 2-4 GB | HPA on CPU >70% sustained |
| Worker | 2 replicas, 2 CPU / 4-8 GB | HPA on queue depth (Hatchet metric) |
| Graph engine | 2 replicas, 2 CPU / 4-8 GB | Manual; restart-sensitive (WAL replay) |
| Compactor | 1 replica, 1 CPU / 2-4 GB | Single-writer; do not scale horizontally |
| RabbitMQ | 1 replica, 8 GB PVC | Single-broker is fine up to ~10k workflows/min |
Suggested starting node SKUs: Standard_D4s_v5 (4 vCPU / 16 GB) for API, worker, and Hatchet; Standard_D8s_v5 (8 vCPU / 32 GB) for graph-engine and compactor.
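Those SKUs can be provisioned as autoscaling user node pools. In this sketch, the function name, pool names, min/max counts, and the `workload` node label are hypothetical choices to tune. Getting pods onto the right pool also requires node selectors or affinity in your values, if the chart exposes those settings.

```bash
# Sketch: one general pool (API, worker, Hatchet) and one larger pool (graph-engine, compactor).
# Pool names, counts, and the "workload" label are placeholders.

add_nebula_node_pools() {
  local rg="$1" cluster="$2"
  az aks nodepool add --resource-group "$rg" --cluster-name "$cluster" \
    --name general --node-vm-size Standard_D4s_v5 \
    --enable-cluster-autoscaler --min-count 2 --max-count 6 \
    --labels workload=general || return 1
  az aks nodepool add --resource-group "$rg" --cluster-name "$cluster" \
    --name graph --node-vm-size Standard_D8s_v5 \
    --enable-cluster-autoscaler --min-count 2 --max-count 4 \
    --labels workload=graph
}
```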
Troubleshooting
Workload Identity not bound — pods receive 401 from Azure APIs
Check that the managed identity’s federated credential subject exactly matches `system:serviceaccount:<namespace>:<release>-nebula-sa`. The release name prefix is part of the service account name. Confirm with `kubectl -n nebula get sa` and compare the result to `az identity federated-credential list --identity-name nebula-wi --resource-group <rg>`.
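The comparison can be scripted. This sketch uses hypothetical function names. `expected_subject` builds the subject string from the namespace and release. `check_federation` looks for an exact match among the identity's federated credential subjects. If none matches, it prints the namespace's service accounts.

```bash
# Sketch: compare the expected federated credential subject with what Azure has on record.

expected_subject() {
  local namespace="$1" release="$2"
  printf 'system:serviceaccount:%s:%s-nebula-sa\n' "$namespace" "$release"
}

check_federation() {
  local rg="$1" identity="$2" namespace="$3" release="$4" want
  want=$(expected_subject "$namespace" "$release")
  # -x -F: the whole line must equal the subject, compared as a fixed string.
  if az identity federated-credential list --identity-name "$identity" --resource-group "$rg" \
       --query "[].subject" --output tsv | grep -qxF "$want"; then
    echo "federated credential matches $want"
  else
    echo "no federated credential with subject $want" >&2
    kubectl get serviceaccounts --namespace "$namespace" >&2
    return 1
  fi
}
```

Usage: `check_federation <rg> nebula-wi nebula nebula`.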
Ingress provisioning slow or stuck
On AKS, the nginx Ingress controller’s Service provisions a public Azure Load Balancer automatically. On a fresh cluster this can take 3-5 minutes. Check `kubectl -n ingress-nginx get svc ingress-nginx-controller` for the external IP assignment. If `EXTERNAL-IP` stays `<pending>`, verify two things: the cluster’s subnet has enough free IP addresses, and the AKS cluster identity (service principal or managed identity) has Network Contributor on the virtual network.
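Two quick checks, sketched with hypothetical function names. The first prints the controller Service's external IP, which stays empty while the IP is still pending. The second tests whether the cluster identity holds Network Contributor at the VNet scope, including assignments inherited from the resource group or subscription. For a cluster with a system-assigned managed identity, `az aks show --query identity.principalId -o tsv` returns the principal ID to pass in.

```bash
# Sketch: inspect the ingress controller's external IP and the cluster identity's VNet roles.

ingress_external_ip() {
  kubectl get service ingress-nginx-controller --namespace ingress-nginx \
    --output jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

has_network_contributor() {
  local principal_id="$1" vnet_id="$2"
  az role assignment list --assignee "$principal_id" --scope "$vnet_id" --include-inherited \
    --query "[].roleDefinitionName" --output tsv | grep -qxF "Network Contributor"
}
```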
pgvector missing on first start — API reports 'extension vector does not exist'
The `azure.extensions` server parameter must include `vector` before the extension is created in the database. If the database already exists without the extension, connect directly and run `CREATE EXTENSION IF NOT EXISTS vector;`. The extension must be allow-listed in the server parameter AND created in the database.
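Both halves can be checked from a shell. This sketch uses hypothetical function names. `pgvector_allowed` reads `azure.extensions` and looks for `vector`, ignoring case and spaces. `create_pgvector` runs the `CREATE EXTENSION` statement through `psql` using a libpq connection string you supply. Flexible Server enforces TLS by default, so include `sslmode=require` in that string.

```bash
# Sketch: check the server-side allow-list, then create the extension in the target database.

pgvector_allowed() {
  local rg="$1" server="$2"
  az postgres flexible-server parameter show --resource-group "$rg" --server-name "$server" \
    --name azure.extensions --query value --output tsv \
    | tr -d ' ' | tr '[:upper:]' '[:lower:]' | tr ',' '\n' | grep -qxF vector
}

create_pgvector() {
  local conninfo="$1"   # e.g. "host=<server>.postgres.database.azure.com dbname=<db> user=<user> sslmode=require"
  psql "$conninfo" --set ON_ERROR_STOP=1 --command 'CREATE EXTENSION IF NOT EXISTS vector;'
}
```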
Blob credentials rejected — graph-engine 'InvalidAccessKeyId'
Azure Blob’s S3-compatible endpoint requires HMAC keys, not the storage account connection string. Generate HMAC keys under Storage account → Access keys → Enable S3 compatible HMAC. Store the Access Key ID and Secret Access Key in the Kubernetes Secret referenced by `objectStorage.credentialsSecret`, under the keys `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`. Use those exact uppercase names; the chart’s `nebula.objectStorageEnv` helper reads them via `secretKeyRef.key`.
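To confirm the Secret uses the exact key names, here is a sketch with a hypothetical function name. It lists the Secret's data keys with a `go-template` and reports which of the two required names are missing. Key names are case-sensitive, so `aws_access_key_id` counts as missing.

```bash
# Sketch: verify the object storage Secret carries both exact key names the chart reads.

check_object_storage_secret() {
  local namespace="$1" secret="$2" keys key missing=0
  keys=$(kubectl get secret "$secret" --namespace "$namespace" \
    --output go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}') || return 1
  for key in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
    if ! grep -qxF "$key" <<<"$keys"; then
      echo "missing key: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}
```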