Nebula can route selected LLM tasks to named OpenAI-compatible endpoints. This is intended for enterprise and self-hosted deployments that run their own vLLM, LiteLLM, Modal, or gateway-backed models.
app.model_endpoints.* entries are OpenAI-compatible by definition. Native hosted providers such as openai/..., anthropic/..., gemini/..., and azure/... should stay on their provider-prefixed model strings.
Bind an [app] model field to a local endpoint with @name, where name is the endpoint's key under app.model_endpoints:
[app]
vlm = "@vision"
ocr_vlm = "@ocr"
[app.model_endpoints.vision]
base_url = "https://vision.example.com/v1"
model = "qwen35-4b"
request_profile = "qwen3"
api_key_env = "VISION_LLM_API_KEY"
[app.model_endpoints.ocr]
base_url = "https://ocr.example.com/v1"
model = "lightonocr-2-1b"
request_profile = "none"
api_key_env = "OCR_API_KEY"
[app.model_endpoints.text]
base_url = "https://text.example.com/v1"
model = "qwen36"
request_profile = "qwen3"
api_key_env = "TEXT_LLM_API_KEY"
Each endpoint supports:
| Field | Purpose |
|---|---|
| base_url | OpenAI-compatible /v1 endpoint URL. |
| model | Upstream model string sent in chat-completion requests. |
| api_key | Inline bearer token. Prefer api_key_env for deployed environments. |
| api_key_env | Environment variable name to read for the bearer token. |
| request_profile | Request-shaping adapter: qwen3, anthropic_thinking, or none. |
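Credentials are resolved in this order, highest precedence first:
api_key > api_key_env value > <ENDPOINT>_API_KEY > unauthenticated dummy key
Here <ENDPOINT> is the endpoint name in upper case. For example, if the vision endpoint sets api_key_env = "VISION_LLM_API_KEY" but that environment variable is absent, Nebula falls back to VISION_API_KEY. If that variable is also unset, requests use an unauthenticated dummy key.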
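The precedence chain can be expressed as a small function. This is a sketch, not Nebula's implementation: resolve_api_key and DUMMY_API_KEY are hypothetical, the actual dummy key value is not documented here, and treating an empty variable as unset is an assumption.

```python
import os
from collections.abc import Mapping

# Hypothetical placeholder; Nebula's real dummy key value is not documented here.
DUMMY_API_KEY = "unauthenticated"


def resolve_api_key(name: str, endpoint: Mapping[str, str],
                    env: Mapping[str, str] = os.environ) -> str:
    """Apply: api_key > api_key_env value > <ENDPOINT>_API_KEY > dummy key."""
    # 1. An inline api_key always wins.
    if endpoint.get("api_key"):
        return endpoint["api_key"]
    # 2. Otherwise read the variable named by api_key_env, if it is set.
    env_name = endpoint.get("api_key_env")
    if env_name and env.get(env_name):
        return env[env_name]
    # 3. Fall back to <ENDPOINT>_API_KEY, e.g. endpoint "vision" -> VISION_API_KEY.
    fallback = f"{name.upper()}_API_KEY"
    if env.get(fallback):
        return env[fallback]
    # 4. Nothing configured: send an unauthenticated dummy key.
    return DUMMY_API_KEY


vision = {"api_key_env": "VISION_LLM_API_KEY"}
print(resolve_api_key("vision", vision, env={"VISION_API_KEY": "fallback-token"}))
# prints: fallback-token
```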
Request profiles
request_profile controls which extra kwargs Nebula adds to each request; it does not affect which endpoint is called.
| Profile | Behavior |
|---|---|
| qwen3 | Sends Qwen3 chat_template_kwargs. Vision requests disable thinking; text requests map reasoning_effort to enable_thinking. |
| anthropic_thinking | Sends Anthropic native thinking fields to text endpoints served behind a compatible proxy. |
| none | Sends no extra request-body fields. This is the default for every endpoint. |
Use request_profile = "qwen3" only when the endpoint serves a Qwen3-family model. Use none for generic OpenAI-compatible gateways that do not accept Qwen-specific kwargs.
Migration notes
This configuration replaces the older configuration surface, which was organized around LLM roles and Modal model names.
| Old | New |
|---|---|
| [app.llm_roles.vision_llm] | [app.model_endpoints.vision] |
| [app.llm_roles.text_llm] | [app.model_endpoints.text] |
| [app.llm_roles.ocr] | [app.model_endpoints.ocr] |
| vlm = "modal/vision_llm" | vlm = "@vision" |
| ocr_vlm = "modal/ocr" | ocr_vlm = "@ocr" |
| template = "qwen3" | request_profile = "qwen3" |
The old TOML fields are not shimmed: Nebula does not translate them, so configs that still use them must be updated. Historical model-name aliases such as modal/qwen35-4b and modal/lightonocr-2-1b are intentionally not supported. Use @vision, @ocr, or explicit app.model_endpoints entries instead.
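The migration table can be applied mechanically to a parsed [app] table. The sketch below is hypothetical (migrate_app is not a Nebula tool). It assumes the remaining keys inside each old llm_roles entry carry over unchanged, and it rejects any other modal/ model reference because the historical aliases have no supported equivalent.

```python
from typing import Any

# Old -> new mappings taken from the migration table.
ROLE_TO_ENDPOINT = {"vision_llm": "vision", "text_llm": "text", "ocr": "ocr"}
MODEL_REF_REWRITES = {"modal/vision_llm": "@vision", "modal/ocr": "@ocr"}


def migrate_app(old_app: dict[str, Any]) -> dict[str, Any]:
    """Rewrite an old-style [app] table into the model_endpoints shape."""
    new_app: dict[str, Any] = {}
    for key, value in old_app.items():
        if key == "llm_roles":
            endpoints = new_app.setdefault("model_endpoints", {})
            for role, fields in value.items():
                entry = dict(fields)  # copy so the input is not mutated
                if "template" in entry:
                    entry["request_profile"] = entry.pop("template")
                # Assumption: unknown role names keep their original key.
                endpoints[ROLE_TO_ENDPOINT.get(role, role)] = entry
        elif isinstance(value, str) and value.startswith("modal/"):
            if value not in MODEL_REF_REWRITES:
                raise ValueError(f"{key} = {value!r} has no @name equivalent; "
                                 "configure an explicit app.model_endpoints entry")
            new_app[key] = MODEL_REF_REWRITES[value]
        else:
            new_app[key] = value
    return new_app


old = {
    "vlm": "modal/vision_llm",
    "ocr_vlm": "modal/ocr",
    "llm_roles": {"vision_llm": {"base_url": "https://vision.example.com/v1",
                                 "model": "qwen35-4b", "template": "qwen3"}},
}
new = migrate_app(old)
print(new["vlm"], new["ocr_vlm"], new["model_endpoints"]["vision"]["request_profile"])
# prints: @vision @ocr qwen3
```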
Deployment safety
Repo-managed Nebula deployment config has been migrated to the new format. A deployment using those checked-in config files should continue routing to the existing endpoints.
Before deploying an environment with custom config, search that config for any of these legacy strings:
llm_roles
template =
modal/vision_llm
modal/qwen35-4b
modal/lightonocr-2-1b
VISION_LLM_BASE_URL
TEXT_LLM_BASE_URL
Replace those with app.model_endpoints.*, request_profile, @vision / @ocr, and endpoint-based environment variables such as VISION_BASE_URL, TEXT_BASE_URL, and OCR_BASE_URL.
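The pre-deployment check can be automated with a short scan. This sketch searches text for the legacy strings listed above and reports line numbers; find_legacy_markers and scan_files are hypothetical helpers, and the template check uses a regular expression (an assumption) so that template= without spaces is also caught.

```python
import re
from pathlib import Path

# One pattern per legacy string in the checklist.
LEGACY_PATTERNS = [re.compile(p) for p in (
    r"llm_roles",
    r"\btemplate\s*=",
    r"modal/vision_llm",
    r"modal/qwen35-4b",
    r"modal/lightonocr-2-1b",
    r"VISION_LLM_BASE_URL",
    r"TEXT_LLM_BASE_URL",
)]


def find_legacy_markers(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern) pairs for each legacy match in text."""
    hits: list[tuple[int, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in LEGACY_PATTERNS:
            if pattern.search(line):
                hits.append((lineno, pattern.pattern))
    return hits


def scan_files(paths: list[str]) -> list[str]:
    """Produce "path:line: matches pattern" report lines for each file."""
    report: list[str] = []
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        for lineno, marker in find_legacy_markers(text):
            report.append(f"{path}:{lineno}: matches {marker}")
    return report


sample = '[app.llm_roles.vision_llm]\ntemplate = "qwen3"\nvlm = "@vision"\n'
print(find_legacy_markers(sample))
```

Running scan_files over both the TOML config and any environment files catches the old VISION_LLM_BASE_URL and TEXT_LLM_BASE_URL variables as well as the TOML fields.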