This guide covers a production Nebula install on GKE using Cloud SQL for PostgreSQL, Google Cloud Storage (via HMAC keys or a MinIO bridge), GKE Workload Identity, and external secrets via External Secrets Operator with GCP Secret Manager.
Prereqs
Before helm install, the following must be in place on the cluster side.
Cluster
- GKE 1.30+ (Autopilot or Standard mode both work; Standard gives more control over node pools)
- Workload Identity enabled on the cluster (--workload-pool=<project>.svc.id.goog), required for keyless SA binding to GCP IAM
- OIDC provider is implicit on GKE when Workload Identity is enabled; no separate step needed
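The Workload Identity prerequisite can be checked, and enabled on an existing Standard cluster, roughly as follows. This is a minimal sketch: `my-project`, `nebula-gke`, and `us-central1` are placeholder values, and the function names are illustrative only.

```shell
# Placeholder values (assumptions); replace with your own.
PROJECT_ID="my-project"
CLUSTER_NAME="nebula-gke"
REGION="us-central1"

# The workload pool name always has the form <project>.svc.id.goog.
WORKLOAD_POOL="${PROJECT_ID}.svc.id.goog"

# Print the cluster's configured workload pool (empty output means
# Workload Identity is not enabled).
current_workload_pool() {
  gcloud container clusters describe "$CLUSTER_NAME" \
    --project "$PROJECT_ID" --region "$REGION" \
    --format "value(workloadIdentityConfig.workloadPool)"
}

# Enable Workload Identity on an existing Standard cluster.
enable_workload_identity() {
  gcloud container clusters update "$CLUSTER_NAME" \
    --project "$PROJECT_ID" --region "$REGION" \
    --workload-pool "$WORKLOAD_POOL"
}
```

Run `current_workload_pool` first; call `enable_workload_identity` only if it prints nothing. Autopilot clusters have Workload Identity enabled by default. On Standard clusters, existing node pools also need the GKE metadata server (`--workload-metadata=GKE_METADATA`) before pods can use it.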
Addons + controllers
| Component | Purpose | Install reference |
|---|---|---|
| GKE Cluster Autoscaler | Node autoscaling | GKE built-in: --enable-autoscaling per node pool |
| nginx Ingress Controller (or GCE Ingress) | HTTP/HTTPS ingress | kubernetes.github.io/ingress-nginx |
| cert-manager | TLS from Let’s Encrypt | cert-manager.io/docs |
| External Secrets Operator (recommended) | Sync from GCP Secret Manager | external-secrets.io |
On Autopilot, set resources.requests precisely so Autopilot selects the right machine family.
GCP-managed resources (recommended)
- Cloud SQL for PostgreSQL 16 in the same region as the cluster, with Private IP enabled. Enable the pgvector extension: in the Cloud SQL console, add vector to the cloudsql.enable_pgvector flag (Cloud SQL 15.7+ / 16.3+) or run CREATE EXTENSION IF NOT EXISTS vector after connecting.
- GCS bucket in the same region. Grant the Nebula service account roles/storage.objectAdmin on the bucket.
The chart's objectStorage block uses S3-protocol env vars. GCS exposes an S3-compatible XML API at https://storage.googleapis.com. Create HMAC keys for a service account (Cloud Storage → Settings → Interoperability in the Cloud Console), store them in the Secret named by objectStorage.credentialsSecret, and set objectStorage.forcePathStyle: false for the GCS XML API. Alternatively, run a MinIO gateway in front of GCS.
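As a rough sketch of the HMAC route, the key can be created with gcloud and stored in a Kubernetes Secret under the S3-style key names the chart reads. The Secret name `nebula-gcs-hmac` and the project in the service account email are assumptions for illustration. In production you would more likely sync these values through External Secrets Operator than create the Secret by hand.

```shell
# Placeholder values (assumptions); replace with your own.
GSA_EMAIL="nebula-sa@my-project.iam.gserviceaccount.com"
NAMESPACE="nebula"
HMAC_SECRET_NAME="nebula-gcs-hmac"   # hypothetical Secret name

# Create an HMAC key for the service account. The output includes the
# access ID and the secret; the secret is shown only once.
create_hmac_key() {
  gcloud storage hmac create "$GSA_EMAIL"
}

# Store the HMAC pair in a Kubernetes Secret.
# Usage: create_hmac_secret <access-id> <secret>
create_hmac_secret() {
  if [ "$#" -ne 2 ]; then
    echo "usage: create_hmac_secret <access-id> <secret>" >&2
    return 2
  fi
  kubectl -n "$NAMESPACE" create secret generic "$HMAC_SECRET_NAME" \
    --from-literal=AWS_ACCESS_KEY_ID="$1" \
    --from-literal=AWS_SECRET_ACCESS_KEY="$2"
}
```

Passing the secret as a command-line argument leaves it in shell history, so treat this as a bootstrap convenience rather than a long-term workflow.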
Workload Identity setup
1. Create a GCP service account for Nebula.
2. Bind it to the Kubernetes service account the chart creates, <release>-nebula-sa, where <release> is your helm install release name.
3. Grant the GCP service account access to GCS.
4. If ESO uses the same GCP service account for Secret Manager access, also grant roles/secretmanager.secretAccessor on the secrets.
5. Annotate the Kubernetes service account in your values file.
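The steps above can be sketched with gcloud as follows. The project, bucket, and release values are placeholders. The service account name `nebula-sa`, the `nebula` namespace, and the `<release>-nebula-sa` Kubernetes service account name follow the troubleshooting commands later in this guide. The function names are illustrative, and the values-file key that carries service account annotations is not shown because it depends on the chart; check the reference values file for it.

```shell
# Placeholder values (assumptions); replace with your own.
PROJECT_ID="my-project"
NAMESPACE="nebula"
RELEASE="nebula"              # your helm install release name
BUCKET="my-nebula-bucket"
GSA_NAME="nebula-sa"

GSA_EMAIL="${GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"
KSA_NAME="${RELEASE}-nebula-sa"
WI_MEMBER="serviceAccount:${PROJECT_ID}.svc.id.goog[${NAMESPACE}/${KSA_NAME}]"

# Step 1: create the GCP service account.
create_gsa() {
  gcloud iam service-accounts create "$GSA_NAME" \
    --project "$PROJECT_ID" --display-name "Nebula"
}

# Step 2: let the Kubernetes service account impersonate it.
bind_ksa() {
  gcloud iam service-accounts add-iam-policy-binding "$GSA_EMAIL" \
    --project "$PROJECT_ID" \
    --role roles/iam.workloadIdentityUser \
    --member "$WI_MEMBER"
}

# Step 3: grant object access on the bucket.
grant_bucket() {
  gcloud storage buckets add-iam-policy-binding "gs://${BUCKET}" \
    --member "serviceAccount:${GSA_EMAIL}" \
    --role roles/storage.objectAdmin
}

# Step 4 (only if ESO shares this account). Usage: grant_secret <secret-name>
grant_secret() {
  gcloud secrets add-iam-policy-binding "$1" \
    --project "$PROJECT_ID" \
    --member "serviceAccount:${GSA_EMAIL}" \
    --role roles/secretmanager.secretAccessor
}

# Step 5: print the annotation to place on the service account in your values file.
print_ksa_annotation() {
  printf 'iam.gke.io/gcp-service-account: %s\n' "$GSA_EMAIL"
}
```

The binding in step 2 can be created before the Kubernetes service account exists, because the member string only names it.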
Install
1. Push images to Artifact Registry
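A minimal sketch of the push, assuming a Docker-format Artifact Registry repository already exists. The repository name `nebula`, the region, and the example image reference `nebula/api:1.2.3` are hypothetical; use the image names and tags that ship in your bundle.

```shell
# Placeholder values (assumptions); replace with your own.
PROJECT_ID="my-project"
REGION="us-central1"
REPO="nebula"                 # hypothetical Artifact Registry repository
REGISTRY="${REGION}-docker.pkg.dev/${PROJECT_ID}/${REPO}"

# Teach the local Docker client to authenticate to Artifact Registry.
configure_docker_auth() {
  gcloud auth configure-docker "${REGION}-docker.pkg.dev"
}

# Map a local image reference to its Artifact Registry reference,
# keeping only the final name:tag component.
target_ref() {
  printf '%s/%s\n' "$REGISTRY" "${1##*/}"
}

# Retag and push one image. Usage: push_image <local-image-ref>
push_image() {
  docker tag "$1" "$(target_ref "$1")" && docker push "$(target_ref "$1")"
}
```

For example, `push_image nebula/api:1.2.3` would push to `us-central1-docker.pkg.dev/my-project/nebula/api:1.2.3` under these placeholder values.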
2. Seed secrets in GCP Secret Manager
username and password keys) by ESO.
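As an illustration, a secret whose JSON payload carries username and password keys could be seeded like this. The secret name `nebula-postgres` is hypothetical, and the helper does no JSON escaping, so it assumes values without quotes or backslashes.

```shell
# Placeholder values (assumptions); replace with your own.
PROJECT_ID="my-project"
DB_SECRET_NAME="nebula-postgres"   # hypothetical secret name

# Build a JSON payload with username and password keys.
# No escaping is performed: values must not contain quotes or backslashes.
db_secret_json() {
  printf '{"username":"%s","password":"%s"}' "$1" "$2"
}

# Create the secret in GCP Secret Manager, reading the payload from stdin.
# Usage: create_db_secret <username> <password>
create_db_secret() {
  db_secret_json "$1" "$2" | gcloud secrets create "$DB_SECRET_NAME" \
    --project "$PROJECT_ID" \
    --replication-policy=automatic \
    --data-file=-
}
```

To rotate the value later, `gcloud secrets versions add` with the same `--data-file=-` input adds a new version instead of creating a new secret.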
3. Copy + fill the reference values file
The bundle ships helm/examples/gke/values.yaml with GKE-specific knobs pre-wired (Workload Identity annotation, GCS endpoint, nginx ingress, Secret Manager ESO). Copy it, fill in the <placeholder> markers, and save as your-values.yaml.
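One way to do the copy and then confirm that no markers were missed is sketched below. The helper names are illustrative, and the pattern assumes placeholders look like `<some-name>`, as in this guide.

```shell
# Copy the reference values file from the bundle root.
copy_values() {
  cp helm/examples/gke/values.yaml your-values.yaml
}

# List lines that still contain a <placeholder> marker.
# Returns non-zero when nothing is left to fill in.
# Usage: unfilled_placeholders <values-file>
unfilled_placeholders() {
  grep -n '<[A-Za-z][A-Za-z0-9_-]*>' "$1"
}
```

Running `unfilled_placeholders your-values.yaml` before installing prints each remaining marker with its line number.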
4. Install
_common/production-sizing.yaml is the shared production-shape sizing block (replicas, CPU/memory requests + limits, persistence) used by all three cloud-managed K8s examples (EKS/AKS/GKE). Omit it to keep the chart’s minimal-dev defaults; override per-workload in your-values.yaml to fit your GKE node SKUs.
The chart runs schema migrations and catalog-apply automatically via a per-revision Job (<release>-nebula-migrations-<revision>); API and worker pods gate startup on an init container that polls public.nebula_release_contract for the install’s release row. releaseContract.releaseId and releaseContract.gitSha are stamped by bundle.sh and consumed automatically.
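A hedged sketch of the install command is shown below. The chart path `./helm/nebula` is an assumption, and the sizing file path is written as it appears in this guide, so adjust both to where they sit in your bundle. Helm gives later `-f` files precedence, which is why your-values.yaml comes last and can override the shared sizing block.

```shell
# Placeholder values (assumptions); replace with your own.
RELEASE="nebula"
NAMESPACE="nebula"
CHART_DIR="./helm/nebula"                      # hypothetical chart path
SIZING_FILE="_common/production-sizing.yaml"   # path as given in this guide
VALUES_FILE="your-values.yaml"

# Install the release, or upgrade it if it already exists.
install_nebula() {
  helm upgrade --install "$RELEASE" "$CHART_DIR" \
    --namespace "$NAMESPACE" --create-namespace \
    -f "$SIZING_FILE" \
    -f "$VALUES_FILE"
}
```

Drop the `-f "$SIZING_FILE"` line to keep the chart's minimal-dev defaults.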
5. Verify
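A few checks that may help here, sketched under the assumption that `<revision>` in the migration Job name is the Helm release revision number (visible in `helm list`). The release name is a placeholder and the function names are illustrative.

```shell
# Placeholder values (assumptions); replace with your own.
RELEASE="nebula"
NAMESPACE="nebula"

# Build the per-revision migration Job name. Usage: migration_job_name <revision>
migration_job_name() {
  printf '%s-nebula-migrations-%s\n' "$RELEASE" "$1"
}

# Block until the migration Job for a revision completes (or 10 minutes pass).
# Usage: wait_for_migrations <revision>
wait_for_migrations() {
  kubectl -n "$NAMESPACE" wait --for=condition=complete \
    "job/$(migration_job_name "$1")" --timeout=10m
}

# Show pods and jobs; API and worker pods stay in their init phase
# until the release row is present.
show_workloads() {
  kubectl -n "$NAMESPACE" get pods,jobs
}
```

For a first install the revision is 1, so `wait_for_migrations 1` followed by `show_workloads` gives a quick picture of whether startup is gated or complete.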
Upgrade
Pull the new bundle, push new images to Artifact Registry, then run helm upgrade for the release with the same values files used for the install.
Sizing reference
| Workload | Starter | When to scale |
|---|---|---|
| API | 2 replicas, 1 CPU / 2-4 GB | HPA on CPU >70% sustained |
| Worker | 2 replicas, 2 CPU / 4-8 GB | HPA on queue depth (Hatchet metric) |
| Graph engine | 2 replicas, 2 CPU / 4-8 GB | Manual; restart-sensitive (WAL replay) |
| Compactor | 1 replica, 1 CPU / 2-4 GB | Single-writer; do not scale horizontally |
| RabbitMQ | 1 replica, 8 GB PVC | Single-broker is fine up to ~10k workflows/min |
Node machine types: n2-standard-4 (4 vCPU / 16 GB) for API, worker, Hatchet; n2-highmem-4 (4 vCPU / 32 GB) for graph-engine and compactor.
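On a Standard-mode cluster, node pools for the machine types above could be created along these lines. The cluster and project values, the pool names, and the autoscaling bounds are placeholders, not recommendations from the chart. Autopilot clusters do not use manually created node pools.

```shell
# Placeholder values (assumptions); replace with your own.
PROJECT_ID="my-project"
CLUSTER_NAME="nebula-gke"
REGION="us-central1"

# Map a workload to the machine type listed in the sizing reference.
machine_type_for() {
  case "$1" in
    api|worker|hatchet)     echo "n2-standard-4" ;;
    graph-engine|compactor) echo "n2-highmem-4" ;;
    *) echo "no machine type listed for: $1" >&2; return 1 ;;
  esac
}

# Create an autoscaled node pool with the GKE metadata server enabled,
# which Workload Identity needs. Usage: create_pool <pool-name> <machine-type>
create_pool() {
  gcloud container node-pools create "$1" \
    --project "$PROJECT_ID" --cluster "$CLUSTER_NAME" --region "$REGION" \
    --machine-type "$2" \
    --enable-autoscaling --min-nodes 1 --max-nodes 4 \
    --workload-metadata GKE_METADATA
}
```

For example, `create_pool nebula-highmem "$(machine_type_for compactor)"` creates a pool for the graph-engine and compactor workloads. On a regional cluster the node limits apply per zone.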
Troubleshooting
Workload Identity not bound — pods receive permission denied from GCS
Confirm the Kubernetes SA annotation is set: kubectl -n nebula describe sa <release>-nebula-sa should show iam.gke.io/gcp-service-account. Also verify the IAM binding: gcloud iam service-accounts get-iam-policy nebula-sa@<project>.iam.gserviceaccount.com should list the workloadIdentityUser binding for the K8s SA. Ensure the cluster’s Workload Identity pool (<project>.svc.id.goog) is enabled.
GCE Ingress (not nginx) provisioning slow
The GCE Ingress controller provisions a Google Cloud Load Balancer, which can take 5-10 minutes. Check kubectl -n nebula describe ingress nebula for events. If you need faster provisioning, set ingress.className: nginx and install the nginx Ingress controller instead.
pgvector missing on Cloud SQL — 'extension vector does not exist'
Cloud SQL for PostgreSQL 16.3+ supports pgvector via the vector extension. Enable it in the Cloud SQL flags (cloudsql.enable_pgvector=on), then run CREATE EXTENSION IF NOT EXISTS vector; in each database. Cloud SQL docs: Use pgvector.
GCS HMAC credentials rejected by graph-engine
Verify the HMAC key is created for a service account (not a user account). HMAC keys for service accounts are managed under Cloud Storage → Settings → Interoperability in the Cloud Console. Store the Access ID and Secret in the Kubernetes Secret referenced by objectStorage.credentialsSecret. The Secret must have AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY keys, with those exact uppercase names, because the chart’s nebula.objectStorageEnv helper reads them via secretKeyRef.key.