GKE

This guide covers a production Nebula install on GKE using Cloud SQL for PostgreSQL, Google Cloud Storage (via HMAC keys or a MinIO bridge), GKE Workload Identity, and external secrets via External Secrets Operator with GCP Secret Manager.

Prereqs

Before helm install, the following must be in place on the cluster side.

Cluster

GKE 1.30+ (Autopilot or Standard mode both work; Standard gives more control over node pools)
Workload Identity enabled on the cluster (--workload-pool=<project>.svc.id.goog) — required for keyless SA binding to GCP IAM
OIDC provider is implicit on GKE when Workload Identity is enabled; no separate step needed
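
To confirm the pool is set before installing, a check along these lines may help. It is a sketch only: workload_pool is a hypothetical helper name, and it assumes a regional cluster (swap --region for --zone on a zonal one).

```shell
# Print the cluster's Workload Identity pool; empty output means it is not enabled.
# Arguments: <cluster> <region> <project>
workload_pool() {
  gcloud container clusters describe "$1" \
    --region "$2" --project "$3" \
    --format 'value(workloadIdentityConfig.workloadPool)'
}
```

Calling workload_pool with your cluster, region, and project should print <project>.svc.id.goog when Workload Identity is enabled.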

Addons + controllers

Component	Purpose	Install reference
GKE Cluster Autoscaler	Node autoscaling	GKE built-in: `--enable-autoscaling` per node pool
nginx Ingress Controller (or GCE Ingress)	HTTP/HTTPS ingress	kubernetes.github.io/ingress-nginx
cert-manager	TLS from Let’s Encrypt	cert-manager.io/docs
External Secrets Operator (recommended)	Sync from GCP Secret Manager	external-secrets.io

On GKE Standard, you create node pools manually; size them to match the sizing reference below. GKE Autopilot provisions nodes on demand from pod resource requests, so set resources.requests precisely to let Autopilot select the right machine family.

GCP-managed resources (recommended)

Cloud SQL for PostgreSQL 16 in the same region as the cluster, with Private IP enabled. Enable the pgvector extension: in the Cloud SQL console, add vector to the cloudsql.enable_pgvector flag (Cloud SQL 15.7+ / 16.3+) or run CREATE EXTENSION IF NOT EXISTS vector after connecting.
GCS bucket in the same region. Grant the Nebula service account roles/storage.objectAdmin on the bucket.

Object storage note: the chart’s objectStorage block uses S3-protocol env vars. GCS exposes an S3-compatible XML API at https://storage.googleapis.com. Create HMAC keys for the Nebula service account (Cloud Storage → Settings → Interoperability in the Cloud Console), store them in the Secret named by objectStorage.credentialsSecret, and set objectStorage.forcePathStyle: false for the GCS XML API. Alternatively, run a MinIO gateway in front of GCS.

Workload Identity setup

Create a GCP service account for Nebula:

gcloud iam service-accounts create nebula-sa \
  --project <project>

Bind it to the Kubernetes service account the chart creates:

gcloud iam service-accounts add-iam-policy-binding \
  nebula-sa@<project>.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:<project>.svc.id.goog[nebula/<release>-nebula-sa]"

Replace <release> with your helm install release name.
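
The --member value is easy to mistype, so it can help to build it from its three parts. A minimal sketch, where wi_member is a hypothetical helper name and the <release>-nebula-sa naming follows the pattern described above:

```shell
# Build the --member value for a Workload Identity binding.
# Arguments: <project> <namespace> <kubernetes-service-account>
wi_member() {
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$1" "$2" "$3"
}
```

For example, wi_member my-project nebula myrel-nebula-sa prints serviceAccount:my-project.svc.id.goog[nebula/myrel-nebula-sa].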

Grant the GCP service account access to GCS:

gcloud storage buckets add-iam-policy-binding gs://<bucket> \
  --role roles/storage.objectAdmin \
  --member "serviceAccount:nebula-sa@<project>.iam.gserviceaccount.com"

If ESO uses the same GCP service account for Secret Manager access, also grant roles/secretmanager.secretAccessor on the secrets.
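
One way to apply that grant per secret is sketched below. grant_secret_access is a hypothetical helper; pass it whichever secret names you actually created in Secret Manager.

```shell
# Grant roles/secretmanager.secretAccessor on each named secret.
# Arguments: <project> <gcp-service-account-email> <secret-name>...
grant_secret_access() {
  project="$1"; gsa="$2"; shift 2
  for secret in "$@"; do
    gcloud secrets add-iam-policy-binding "$secret" \
      --project "$project" \
      --member "serviceAccount:${gsa}" \
      --role roles/secretmanager.secretAccessor || return 1
  done
}
```

Binding at the secret level, rather than on the whole project, keeps the service account limited to the secrets Nebula needs.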

Annotate the Kubernetes service account in your values file:

serviceAccount:
  annotations:
    iam.gke.io/gcp-service-account: nebula-sa@<project>.iam.gserviceaccount.com

Install

1. Push images to Artifact Registry

tar -xzf nebula-enterprise-<version>.tar.gz
cd nebula-enterprise-<version>/
sha256sum -c checksums.txt
docker load -i images.tar

REGION=us-central1
AR="${REGION}-docker.pkg.dev/<project>/<repo>"
gcloud auth configure-docker "${REGION}-docker.pkg.dev"

docker tag nebula:enterprise-<version>              "${AR}/nebula/nebula-runtime:<version>"
docker tag nebula-graph-engine:enterprise-<version> "${AR}/nebula/graph-engine:<version>"
docker push "${AR}/nebula/nebula-runtime:<version>"
docker push "${AR}/nebula/graph-engine:<version>"

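For private-cluster GKE (no public-registry egress), also mirror the third-party images. Pull each source image first if it is not already in your local image store, because docker tag fails on an image that is not present locally: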

docker tag ghcr.io/hatchet-dev/hatchet/hatchet-engine:v0.79.0 "${AR}/hatchet-engine:v0.79.0"
docker tag pgvector/pgvector:0.8.0-pg16                       "${AR}/pgvector/pgvector:0.8.0-pg16"
docker tag rabbitmq:3.13-management                           "${AR}/rabbitmq:3.13-management"
docker tag busybox:1.37.0                                     "${AR}/busybox:1.37.0"
docker push "${AR}/hatchet-engine:v0.79.0"
docker push "${AR}/pgvector/pgvector:0.8.0-pg16"
docker push "${AR}/rabbitmq:3.13-management"
docker push "${AR}/busybox:1.37.0"
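
If you mirror images regularly, the pull, tag, and push steps can be wrapped in a small function. This is a sketch, not part of the bundle; mirror_image is a hypothetical name, and the destination path is passed explicitly because the paths above do not follow a single naming rule.

```shell
# Pull a public image, retag it for Artifact Registry, and push it.
# Arguments: <source-image> <destination-image>
mirror_image() {
  docker pull "$1" &&
  docker tag "$1" "$2" &&
  docker push "$2"
}
```

For example: mirror_image busybox:1.37.0 "${AR}/busybox:1.37.0".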

2. Seed secrets in GCP Secret Manager

echo -n "sk-..."           | gcloud secrets create OPENAI_API_KEY       --data-file=-
echo -n "$(openssl rand -hex 32)" | gcloud secrets create NEBULA_SECRET_KEY --data-file=-
# Repeat for NEBULA_SERVICE_API_KEY, NEBULA_WEBHOOK_HMAC_SECRET,
# NEBULA_INTERNAL_WAKE_TOKEN, NEBULA_VECTOR_BUILD_HATCHET_TRIGGER_TOKEN.
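
The repetition can be scripted. The sketch below assumes each remaining secret is a random 32-byte hex token like NEBULA_SECRET_KEY; check that assumption against your deployment before using it. seed_random_secrets is a hypothetical helper name.

```shell
# Create one Secret Manager secret per name, each holding a fresh hex token.
# Arguments: <project> <secret-name>...
seed_random_secrets() {
  project="$1"; shift
  for name in "$@"; do
    # tr strips the trailing newline so it does not become part of the value
    openssl rand -hex 32 | tr -d '\n' |
      gcloud secrets create "$name" --project "$project" --data-file=- || return 1
  done
}
```

Pass the project first, then the secret names listed in the comment above.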

Postgres credentials go in separate secrets and are materialized into Kubernetes Secrets (with username and password keys) by ESO.

3. Copy + fill the reference values file

The bundle ships helm/examples/gke/values.yaml with GKE-specific knobs pre-wired (Workload Identity annotation, GCS endpoint, nginx ingress, Secret Manager ESO). Copy it, fill in the <placeholder> markers, and save as your-values.yaml.

4. Install

gcloud container clusters get-credentials <cluster> --region <region> --project <project>

helm install nebula ./helm/nebula-<version>.tgz \
  -n nebula --create-namespace \
  -f helm/examples/_common/production-sizing.yaml \
  -f your-values.yaml

_common/production-sizing.yaml is the shared production-shape sizing block (replicas, CPU/memory requests + limits, persistence) used by all three cloud-managed K8s examples (EKS/AKS/GKE). Omit it to keep the chart’s minimal-dev defaults; override per-workload in your-values.yaml to fit your GKE node SKUs.

The chart runs schema migrations and catalog-apply automatically via a per-revision Job (<release>-nebula-migrations-<revision>); API and worker pods gate startup on an init container that polls public.nebula_release_contract for the install’s release row. releaseContract.releaseId and releaseContract.gitSha are stamped by bundle.sh and consumed automatically.
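
If API or worker pods stay in their init phase, the migrations Job is the first thing to look at. A minimal sketch, assuming the Job name pattern given above; wait_for_migrations is a hypothetical helper and the 600s timeout is an arbitrary choice.

```shell
# Block until the per-revision migrations Job completes, then show its last log lines.
# Arguments: <release> <revision> [namespace, default nebula]
wait_for_migrations() {
  job="${1}-nebula-migrations-${2}"
  ns="${3:-nebula}"
  kubectl -n "$ns" wait --for=condition=complete "job/${job}" --timeout=600s &&
  kubectl -n "$ns" logs "job/${job}" --tail=20
}
```

helm history nebula -n nebula lists the revision numbers. If the Job has already been cleaned up, kubectl wait reports it as not found.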

5. Verify

kubectl -n nebula get pods
kubectl -n nebula get ingress nebula
curl -fsS https://nebula.<your-domain>.com/v1/health
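
Right after an install, the health check can fail while the load balancer and certificate are still being provisioned. A polling loop such as the following sketch avoids rerunning curl by hand; wait_healthy is a hypothetical helper and its defaults (30 attempts, 10 seconds apart) are arbitrary.

```shell
# Poll a health URL until curl succeeds or the attempts run out.
# Arguments: <url> [attempts] [delay-seconds]
wait_healthy() {
  url="$1"; attempts="${2:-30}"; delay="${3:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -fsS -o /dev/null "$url"; then
      echo "healthy after ${i} attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not healthy after ${attempts} attempts" >&2
  return 1
}
```

For example: wait_healthy https://nebula.<your-domain>.com/v1/health.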

Upgrade

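Pull the new bundle, push new images to Artifact Registry, then upgrade with the same -f files you passed at install. helm upgrade does not carry over the previous release's values files when new ones are supplied, so leaving out production-sizing.yaml would revert sizing to the chart defaults:

helm upgrade nebula ./helm/nebula-<new-version>.tgz \
  -n nebula \
  -f helm/examples/_common/production-sizing.yaml \
  -f your-values.yaml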

Sizing reference

Workload	Starter	When to scale
API	2 replicas, 1 CPU / 2-4 GB	HPA on CPU >70% sustained
Worker	2 replicas, 2 CPU / 4-8 GB	HPA on queue depth (Hatchet metric)
Graph engine	2 replicas, 2 CPU / 4-8 GB	Manual; restart-sensitive (WAL replay)
Compactor	1 replica, 1 CPU / 2-4 GB	Single-writer; do not scale horizontally
RabbitMQ	1 replica, 8 GB PVC	Single-broker is fine up to ~10k workflows/min

Recommended GKE machine types: n2-standard-4 (4 vCPU / 16 GB) for API, worker, Hatchet; n2-highmem-4 (4 vCPU / 32 GB) for graph-engine and compactor.
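
On GKE Standard, a node pool for each machine type could be created along these lines. This is a sketch: create_pool is a hypothetical helper, and the pool name and min/max counts are yours to choose. On a regional cluster the min and max apply per zone.

```shell
# Create an autoscaled node pool with a given machine type.
# Arguments: <cluster> <region> <project> <pool-name> <machine-type> <min> <max>
create_pool() {
  gcloud container node-pools create "$4" \
    --cluster "$1" --region "$2" --project "$3" \
    --machine-type "$5" \
    --enable-autoscaling --min-nodes "$6" --max-nodes "$7"
}
```

For example, one pool with n2-standard-4 for API, worker, and Hatchet, and a second with n2-highmem-4 for graph-engine and compactor.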

Troubleshooting

Workload Identity not bound — pods receive permission denied from GCS

Confirm the Kubernetes SA annotation is set: kubectl -n nebula describe sa <release>-nebula-sa should show iam.gke.io/gcp-service-account. Also verify the IAM binding: gcloud iam service-accounts get-iam-policy nebula-sa@<project>.iam.gserviceaccount.com should list the workloadIdentityUser binding for the K8s SA. Ensure the cluster’s Workload Identity pool (<project>.svc.id.goog) is enabled.

GCE Ingress (not nginx) provisioning slow

The GCE Ingress controller provisions a Google Cloud load balancer, which can take 5-10 minutes. Check kubectl -n nebula describe ingress nebula for events. If you need faster provisioning, set ingress.className: nginx and install the nginx Ingress controller instead.

pgvector missing on Cloud SQL — 'extension vector does not exist'

Cloud SQL for PostgreSQL 16.3+ supports pgvector via the vector extension. Enable it in the Cloud SQL flags (cloudsql.enable_pgvector=on) then run CREATE EXTENSION IF NOT EXISTS vector; in each database. Cloud SQL docs: Use pgvector.
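
Because the extension is created per database, a loop can save repetition. A minimal sketch, assuming psql can reach the instance over its private IP; enable_pgvector is a hypothetical helper and the database names are whatever your install uses.

```shell
# Create the vector extension in each database and print the installed version.
# Arguments: <base-connection-uri without database path> <database>...
enable_pgvector() {
  base="$1"; shift
  for db in "$@"; do
    psql "${base}/${db}" -v ON_ERROR_STOP=1 \
      -c 'CREATE EXTENSION IF NOT EXISTS vector;' \
      -c "SELECT extversion FROM pg_extension WHERE extname = 'vector';" || return 1
  done
}
```

The second statement prints the extension version, which confirms the extension is present in that database.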

GCS HMAC credentials rejected by graph-engine

Verify that the HMAC key was created for a service account (not a user account). Service-account HMAC keys are managed under Cloud Storage → Settings → Interoperability in the Cloud Console. Store the Access ID and Secret in the Kubernetes Secret referenced by objectStorage.credentialsSecret. The Secret must have AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY keys, with those exact uppercase names, because the chart’s nebula.objectStorageEnv helper reads them via secretKeyRef.key.
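
If you are not syncing this Secret through ESO, it can be created directly with kubectl. A sketch under that assumption; create_object_storage_secret is a hypothetical helper, and the Secret name must match whatever objectStorage.credentialsSecret is set to in your values file.

```shell
# Create the Secret that objectStorage.credentialsSecret points at.
# Arguments: <namespace> <secret-name> <hmac-access-id> <hmac-secret>
create_object_storage_secret() {
  kubectl -n "$1" create secret generic "$2" \
    --from-literal=AWS_ACCESS_KEY_ID="$3" \
    --from-literal=AWS_SECRET_ACCESS_KEY="$4"
}
```

gcloud storage hmac create <service-account-email> prints the access ID and secret; the secret is shown only at creation time, so capture it then.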
