Nebula can route selected LLM tasks to named OpenAI-compatible endpoints. This is intended for enterprise and self-hosted deployments that run their own vLLM, LiteLLM, Modal, or gateway-backed models.
app.model_endpoints.* entries are always treated as OpenAI-compatible. Native hosted providers such as openai/..., anthropic/..., gemini/..., and azure/... should keep their provider-prefixed model strings.

Configure endpoints

Define each endpoint under [app.model_endpoints.<name>], then reference it from an app model field as @<name>.
```toml
[app]
vlm = "@vision"
ocr_vlm = "@ocr"

[app.model_endpoints.vision]
base_url = "https://vision.example.com/v1"
model = "qwen35-4b"
request_profile = "qwen3"
api_key_env = "VISION_LLM_API_KEY"

[app.model_endpoints.ocr]
base_url = "https://ocr.example.com/v1"
model = "lightonocr-2-1b"
request_profile = "none"
api_key_env = "OCR_API_KEY"

[app.model_endpoints.text]
base_url = "https://text.example.com/v1"
model = "qwen36"
request_profile = "qwen3"
api_key_env = "TEXT_LLM_API_KEY"
```

Each endpoint supports:

| Field | Purpose |
|---|---|
| base_url | OpenAI-compatible /v1 endpoint URL. |
| model | Upstream model string sent in chat-completion requests. |
| api_key | Inline bearer token. Prefer api_key_env for deployed environments. |
| api_key_env | Name of the environment variable to read the bearer token from. |
| request_profile | Request-shaping adapter: qwen3, anthropic_thinking, or none. |
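To make the @name binding concrete, here is a minimal Python sketch of how a model field could be resolved against the configured endpoints. It is not Nebula's implementation: ModelEndpoint, ENDPOINTS, and resolve_model_field are hypothetical names, the dict stands in for already-parsed TOML, openai/gpt-4o is only an illustrative provider string, and raising on an unknown binding is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass
class ModelEndpoint:
    """One [app.model_endpoints.<name>] entry, using the documented field names."""
    name: str
    base_url: str
    model: str
    request_profile: str = "none"  # documented default for every endpoint
    api_key: Optional[str] = None
    api_key_env: Optional[str] = None


# Hand-written dict standing in for the parsed [app.model_endpoints] tables.
ENDPOINTS: Dict[str, dict] = {
    "vision": {
        "base_url": "https://vision.example.com/v1",
        "model": "qwen35-4b",
        "request_profile": "qwen3",
        "api_key_env": "VISION_LLM_API_KEY",
    },
    "ocr": {
        # request_profile omitted here to show the "none" default.
        "base_url": "https://ocr.example.com/v1",
        "model": "lightonocr-2-1b",
        "api_key_env": "OCR_API_KEY",
    },
}


def resolve_model_field(value: str, endpoints: Dict[str, dict]) -> Union[ModelEndpoint, str]:
    """Map '@name' to its endpoint entry; pass provider-prefixed strings through unchanged."""
    if value.startswith("@"):
        name = value[1:]
        if name not in endpoints:
            # Assumption: an unknown binding is treated as a configuration error.
            raise KeyError(f"no [app.model_endpoints.{name}] entry for {value!r}")
        return ModelEndpoint(name=name, **endpoints[name])
    # Native hosted providers (openai/..., anthropic/..., ...) keep their own routing.
    return value
```

Because @ never begins a provider-prefixed string, a model field can be classified by its first character alone, so an endpoint binding and a native provider model cannot be confused.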
Credentials are resolved in this order, where <ENDPOINT> is the endpoint name in uppercase:
api_key > value of the api_key_env variable > <ENDPOINT>_API_KEY > unauthenticated dummy key
For example, if the vision endpoint sets api_key_env = "VISION_LLM_API_KEY" but that environment variable is not set, Nebula falls back to VISION_API_KEY.
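The precedence chain can be sketched as a small Python function. This is an illustration under stated assumptions, not Nebula's code: resolve_api_key and DUMMY_KEY are hypothetical names, the placeholder value "EMPTY" is invented, and treating an empty environment variable as unset is an assumption.

```python
import os
from typing import Mapping, Optional

DUMMY_KEY = "EMPTY"  # hypothetical placeholder for the unauthenticated dummy key


def resolve_api_key(
    endpoint_name: str,
    api_key: Optional[str] = None,
    api_key_env: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Apply: api_key > api_key_env value > <ENDPOINT>_API_KEY > dummy key."""
    # 1. An inline token always wins.
    if api_key:
        return api_key
    # 2. The variable named by api_key_env (assumption: empty counts as unset).
    if api_key_env and env.get(api_key_env):
        return env[api_key_env]
    # 3. Convention-based fallback, e.g. "vision" -> VISION_API_KEY.
    fallback = env.get(f"{endpoint_name.upper()}_API_KEY")
    if fallback:
        return fallback
    # 4. No credentials found: send an unauthenticated placeholder.
    return DUMMY_KEY
```

Passing env explicitly keeps the function testable without touching the real process environment.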

Request profiles

request_profile controls which extra fields are added to each request body; it does not affect which endpoint a request is routed to.
| Profile | Behavior |
|---|---|
| qwen3 | Sends Qwen3 chat_template_kwargs: vision requests disable thinking, and text requests map reasoning_effort to enable_thinking. |
| anthropic_thinking | Sends Anthropic-native thinking fields for text endpoints served behind a compatible proxy. |
| none | Sends no extra request-body fields. This is the default for every endpoint. |
Use request_profile = "qwen3" only when the endpoint serves a Qwen3-family model. Use none for generic OpenAI-compatible gateways that do not accept Qwen-specific kwargs.
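As a rough illustration of the none and qwen3 rows, the Python sketch below builds the extra request-body fields for one call. request_extra_body and its task argument are hypothetical. It assumes that any reasoning_effort other than None or "none" turns thinking on, since the exact mapping is not documented here. anthropic_thinking is deliberately left unimplemented because its field values are not specified on this page.

```python
from typing import Any, Dict, Optional


def request_extra_body(
    profile: str, task: str, reasoning_effort: Optional[str] = None
) -> Dict[str, Any]:
    """Extra request-body fields for one chat-completion call (hypothetical helper)."""
    if profile == "none":
        # Default profile: nothing beyond the standard OpenAI-compatible payload.
        return {}
    if profile == "qwen3":
        if task == "vision":
            enable_thinking = False  # vision requests always disable thinking
        else:
            # Assumption: any effort other than None/"none" turns thinking on.
            enable_thinking = reasoning_effort not in (None, "none")
        return {"chat_template_kwargs": {"enable_thinking": enable_thinking}}
    if profile == "anthropic_thinking":
        # The exact Anthropic thinking fields are not documented on this page.
        raise NotImplementedError("anthropic_thinking is outside this sketch")
    raise ValueError(f"unknown request_profile: {profile!r}")
```

With the OpenAI Python SDK, a dict like this is the kind of payload you would pass through the extra_body argument of chat.completions.create. A gateway that does not understand chat_template_kwargs may reject it, which is why none is the safer choice for generic endpoints.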

Migration notes

This configuration replaces the older role/Modal-shaped surface.
| Old | New |
|---|---|
| [app.llm_roles.vision_llm] | [app.model_endpoints.vision] |
| [app.llm_roles.text_llm] | [app.model_endpoints.text] |
| [app.llm_roles.ocr] | [app.model_endpoints.ocr] |
| vlm = "modal/vision_llm" | vlm = "@vision" |
| ocr_vlm = "modal/ocr" | ocr_vlm = "@ocr" |
| template = "qwen3" | request_profile = "qwen3" |
There is no compatibility shim for the old TOML fields, so configs that still use them must be rewritten. Historical model-name aliases such as modal/qwen35-4b and modal/lightonocr-2-1b are intentionally not supported. Use @vision or @ocr, or configure explicit app.model_endpoints entries.
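The migration table is mechanical enough to script. The Python sketch below rewrites an already-parsed old [app] dict into the new shape. migrate_app_config and the mapping constants are hypothetical names. It assumes that any keys inside an old role table other than template (for example base_url or model) carry over unchanged, and it refuses the unsupported modal/... aliases rather than guessing an endpoint.

```python
import copy
from typing import Any, Dict

# Mappings taken from the migration table.
ROLE_TO_ENDPOINT = {"vision_llm": "vision", "text_llm": "text", "ocr": "ocr"}
MODEL_FIELD_REWRITES = {"modal/vision_llm": "@vision", "modal/ocr": "@ocr"}
# Historical aliases that the new format intentionally does not support.
UNSUPPORTED_ALIASES = {"modal/qwen35-4b", "modal/lightonocr-2-1b"}


def migrate_app_config(old_app: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new [app] dict in the model_endpoints format; old_app is not modified."""
    app = copy.deepcopy(old_app)
    roles = app.pop("llm_roles", {})
    endpoints = app.setdefault("model_endpoints", {})
    for role, settings in roles.items():
        if role not in ROLE_TO_ENDPOINT:
            raise ValueError(f"no documented mapping for [app.llm_roles.{role}]")
        entry = dict(settings)  # Assumption: other role keys carry over unchanged.
        if "template" in entry:
            entry["request_profile"] = entry.pop("template")
        endpoints[ROLE_TO_ENDPOINT[role]] = entry
    # Rewrite model fields such as vlm and ocr_vlm.
    for key, value in list(app.items()):
        if not isinstance(value, str):
            continue
        if value in UNSUPPORTED_ALIASES:
            raise ValueError(f"{key} = {value!r} is not supported; use an @name binding")
        if value in MODEL_FIELD_REWRITES:
            app[key] = MODEL_FIELD_REWRITES[value]
    return app
```

Python's standard tomllib only reads TOML, so writing the result back to a file would need a separate writer, for example the third-party tomli-w package.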

Deployment safety

Repo-managed Nebula deployment config has already been migrated to the new format, so deployments that use those checked-in config files should continue routing to the existing endpoints. Before deploying an environment with custom config, search that config for these legacy strings:
- llm_roles
- template =
- modal/vision_llm
- modal/qwen35-4b
- modal/lightonocr-2-1b
- VISION_LLM_BASE_URL
- TEXT_LLM_BASE_URL
Replace those with app.model_endpoints.*, request_profile, @vision / @ocr, and endpoint-based environment variables such as VISION_BASE_URL, TEXT_BASE_URL, and OCR_BASE_URL.
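A quick way to run that check is a plain substring scan. The Python sketch below is a hypothetical helper, not a Nebula tool: find_legacy_markers and scan_files are invented names, and because matching is by substring it can report false positives (for instance, a key such as chat_template = also contains "template =").

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# Legacy strings listed in the deployment-safety checklist.
LEGACY_MARKERS = (
    "llm_roles",
    "template =",
    "modal/vision_llm",
    "modal/qwen35-4b",
    "modal/lightonocr-2-1b",
    "VISION_LLM_BASE_URL",
    "TEXT_LLM_BASE_URL",
)


def find_legacy_markers(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, marker) pairs for every legacy string found in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for marker in LEGACY_MARKERS:
            if marker in line:
                hits.append((lineno, marker))
    return hits


def scan_files(paths: Iterable[str]) -> List[str]:
    """Format hits as 'path:line: marker' for each config or env file given."""
    report = []
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        for lineno, marker in find_legacy_markers(text):
            report.append(f"{path}:{lineno}: {marker}")
    return report
```

An empty report means none of the listed strings appear; any hit points at a line to rewrite with the new field names.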