Model Endpoints

Nebula can route selected LLM tasks to named OpenAI-compatible endpoints. This is intended for enterprise and self-hosted deployments that run their own vLLM, LiteLLM, Modal, or gateway-backed models.

app.model_endpoints.* entries are OpenAI-compatible by definition. Native hosted providers such as openai/..., anthropic/..., gemini/..., and azure/... should stay on their provider-prefixed model strings.

Configure endpoints

Reference a local endpoint binding from app model fields with @name.

[app]
vlm = "@vision"
ocr_vlm = "@ocr"

[app.model_endpoints.vision]
base_url = "https://vision.example.com/v1"
model = "qwen35-4b"
request_profile = "qwen3"
api_key_env = "VISION_LLM_API_KEY"

[app.model_endpoints.ocr]
base_url = "https://ocr.example.com/v1"
model = "lightonocr-2-1b"
request_profile = "none"
api_key_env = "OCR_API_KEY"

[app.model_endpoints.text]
base_url = "https://text.example.com/v1"
model = "qwen36"
request_profile = "qwen3"
api_key_env = "TEXT_LLM_API_KEY"

Each endpoint supports:

Field	Purpose
`base_url`	OpenAI-compatible `/v1` endpoint URL.
`model`	Upstream `model` string sent in chat-completion requests.
`api_key`	Inline bearer token. Prefer `api_key_env` for deployed environments.
`api_key_env`	Environment variable name to read for the bearer token.
`request_profile`	Request-shaping adapter: `qwen3`, `anthropic_thinking`, or `none`.

Credential resolution is:

api_key > api_key_env value > <ENDPOINT>_API_KEY > unauthenticated dummy key

For example, if api_key_env = "VISION_LLM_API_KEY" is configured but that environment variable is absent, Nebula falls back to VISION_API_KEY.

Request profiles

request_profile controls request-specific kwargs, not endpoint selection.

Profile	Behavior
`qwen3`	Sends Qwen3 `chat_template_kwargs`; vision disables thinking, text maps `reasoning_effort` to `enable_thinking`.
`anthropic_thinking`	Sends Anthropic native `thinking` fields for text endpoints behind a compatible proxy.
`none`	Sends no extra request body. This is the default for every endpoint.

Use request_profile = "qwen3" only when the endpoint serves a Qwen3-family model. Use none for generic OpenAI-compatible gateways that do not accept Qwen-specific kwargs.

Migration notes

This configuration replaces the older role/Modal-shaped surface.

Old	New
`[app.llm_roles.vision_llm]`	`[app.model_endpoints.vision]`
`[app.llm_roles.text_llm]`	`[app.model_endpoints.text]`
`[app.llm_roles.ocr]`	`[app.model_endpoints.ocr]`
`vlm = "modal/vision_llm"`	`vlm = "@vision"`
`ocr_vlm = "modal/ocr"`	`ocr_vlm = "@ocr"`
`template = "qwen3"`	`request_profile = "qwen3"`

The old TOML fields are not shimmed. Historical model-name aliases such as modal/qwen35-4b and modal/lightonocr-2-1b are intentionally not supported. Use @vision, @ocr, or configure explicit app.model_endpoints entries.

Deployment safety

Repo-managed Nebula deployment config has been migrated to the new format. A deployment using those checked-in config files should continue routing to the existing endpoints. Before deploying an environment with custom config, check for:

llm_roles
template =
modal/vision_llm
modal/qwen35-4b
modal/lightonocr-2-1b
VISION_LLM_BASE_URL
TEXT_LLM_BASE_URL

Replace those with app.model_endpoints.*, request_profile, @vision / @ocr, and endpoint-based environment variables such as VISION_BASE_URL, TEXT_BASE_URL, and OCR_BASE_URL.

Get Started

Kubernetes

Docker Compose

Reference

Model Endpoints

Configure endpoints

Request profiles

Migration notes

Deployment safety

Get Started

Kubernetes

Docker Compose

Reference

Documentation Index

​Configure endpoints

​Request profiles

​Migration notes

​Deployment safety

Configure endpoints

Request profiles

Migration notes

Deployment safety