
Metadata Filtering

Filter search results by metadata attributes. Combine equality, comparison, string, and array operators with logical operators to narrow results to exactly what you need.
Metadata in responses: By default, Nebula returns whitelisted metadata (include_metadata defaults to true). Set include_metadata: false to omit metadata entirely. The recall whitelist includes common safe fields such as title, source, url, doc_type, mime_type, filename, and page.

Basic Filtering

Equality & Not Equal

from nebula import Nebula
nebula = Nebula(api_key="your-api-key")

# Simple equality
results = nebula.search(
    query="machine learning",
    collection_ids=["my-collection"],
    filters={"metadata.category": "research"}
)

# Not equal
results = nebula.search(
    query="documents",
    collection_ids=["docs"],
    filters={"metadata.status": {"$ne": "archived"}}
)
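To make the semantics concrete, here is a minimal in-memory sketch of how an equality or $ne condition on a dotted key such as metadata.status could be evaluated against one document. The helper names (get_path, matches_field) and the document shape are hypothetical, and the handling of missing fields is an assumption; Nebula evaluates filters server-side and this is not its implementation.

```python
from typing import Any

_MISSING = object()  # sentinel for "key not present"


def get_path(doc: dict, dotted_key: str) -> Any:
    """Walk a dotted key like 'metadata.status' through nested dicts."""
    current: Any = doc
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def matches_field(doc: dict, dotted_key: str, condition: Any) -> bool:
    """Evaluate a bare value (implicit equality) or a {'$ne': value} condition."""
    value = get_path(doc, dotted_key)
    if isinstance(condition, dict) and "$ne" in condition:
        # Assumption: a missing field counts as "not equal"
        return value != condition["$ne"]
    return value == condition


doc = {"metadata": {"category": "research", "status": "active"}}
print(matches_field(doc, "metadata.category", "research"))         # True
print(matches_field(doc, "metadata.status", {"$ne": "archived"}))  # True
```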

Numeric Comparisons

# Greater than or equal
results = nebula.search(
    query="high priority tasks",
    collection_ids=["tasks"],
    filters={"metadata.priority": {"$gte": 7}}
)

# Range (between values)
results = nebula.search(
    query="quality content",
    collection_ids=["content"],
    filters={
        "$and": [
            {"metadata.score": {"$gte": 80}},
            {"metadata.score": {"$lte": 95}}
        ]
    }
)
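Operators: $gt (greater than), $gte (greater than or equal), $lt (less than), $lte (less than or equal). Bounds set with $gte and $lte are inclusive.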
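The comparison operators correspond directly to Python's ordering operators. This local sketch (the COMPARATORS table and compare helper are illustrative names, not part of the Nebula SDK) mirrors the $and range example and shows that both ends of a $gte/$lte range are included.

```python
import operator
from typing import Any, Callable, Dict

# Illustrative mapping of each comparison operator to Python's equivalent
COMPARATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def compare(value: Any, op: str, target: Any) -> bool:
    """Apply one comparison operator, e.g. compare(7, "$gte", 7)."""
    return COMPARATORS[op](value, target)


def in_score_range(score: float) -> bool:
    """Mirror the $and range example: 80 <= score <= 95 (both ends inclusive)."""
    return compare(score, "$gte", 80) and compare(score, "$lte", 95)


print(in_score_range(80), in_score_range(95), in_score_range(96))  # True True False
```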

String Matching

# Case-insensitive pattern matching
results = nebula.search(
    query="employees",
    collection_ids=["team"],
    filters={"metadata.email": {"$ilike": "%@company.com"}}
)

# Case-sensitive pattern matching
results = nebula.search(
    query="documents",
    collection_ids=["docs"],
    filters={"metadata.title": {"$like": "Important%"}}
)
Operators: $like (case-sensitive), $ilike (case-insensitive). Use % as a wildcard that matches any sequence of characters, including none.
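One way to reason about a LIKE pattern is to translate it into a regular expression that must match the whole value. The sketch below handles only the % wildcard described above; the function names are hypothetical and it is not how Nebula matches patterns internally.

```python
import re


def like_to_regex(pattern: str, case_insensitive: bool = False) -> "re.Pattern[str]":
    """Translate a LIKE-style pattern where % matches any run of characters."""
    # Escape each literal chunk, then join the chunks with '.*' where % stood
    parts = [re.escape(chunk) for chunk in pattern.split("%")]
    flags = (re.IGNORECASE | re.DOTALL) if case_insensitive else re.DOTALL
    return re.compile(".*".join(parts), flags)


def like(value: str, pattern: str) -> bool:
    """Case-sensitive whole-value match, like $like."""
    return like_to_regex(pattern).fullmatch(value) is not None


def ilike(value: str, pattern: str) -> bool:
    """Case-insensitive whole-value match, like $ilike."""
    return like_to_regex(pattern, case_insensitive=True).fullmatch(value) is not None


print(ilike("Jane@Company.com", "%@company.com"))  # True
print(like("important memo", "Important%"))        # False (case differs)
```

Because the pattern must match the entire value, "Important%" is effectively a prefix match and "%@company.com" a suffix match.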

Advanced Filtering

Array Operations

# Has any of these tags
results = nebula.search(
    query="content",
    collection_ids=["posts"],
    filters={"metadata.tags": {"$overlap": ["urgent", "important"]}}
)

# Has all of these skills
results = nebula.search(
    query="candidates",
    collection_ids=["hr"],
    filters={"metadata.skills": {"$contains": ["python", "ml"]}}
)

# Field value is one of the listed values
results = nebula.search(
    query="items",
    collection_ids=["inventory"],
    filters={"metadata.category": {"$in": ["electronics", "books"]}}
)
Operators: $overlap (array has any of the values), $contains (array has all of the values), $in (field value is in the list), $nin (field value is not in the list)
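The difference between "any of" and "all of" maps neatly onto set operations. This is a minimal sketch assuming, as in the examples above, that $overlap and $contains apply to array fields while $in and $nin apply to a single scalar value; the helper names are illustrative only.

```python
from typing import Any, List


def overlap(field: List[Any], wanted: List[Any]) -> bool:
    """$overlap: the array shares at least one element with the list."""
    return bool(set(field) & set(wanted))


def contains(field: List[Any], wanted: List[Any]) -> bool:
    """$contains: every listed element is present in the array."""
    return set(wanted) <= set(field)


def in_list(value: Any, options: List[Any]) -> bool:
    """$in: a scalar field equals one of the listed values."""
    return value in options


def not_in_list(value: Any, options: List[Any]) -> bool:
    """$nin: a scalar field equals none of the listed values."""
    return value not in options


tags = ["urgent", "draft"]
skills = ["python", "sql"]
print(overlap(tags, ["urgent", "important"]))       # True: shares "urgent"
print(contains(skills, ["python", "ml"]))           # False: "ml" is missing
print(in_list("books", ["electronics", "books"]))   # True
```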

Logical Operations

# AND: All conditions must match
results = nebula.search(
    query="research papers",
    collection_ids=["academic"],
    filters={
        "$and": [
            {"metadata.verified": True},
            {"metadata.score": {"$gte": 85}},
            {"metadata.department": "ai"}
        ]
    }
)

# OR: At least one condition must match
results = nebula.search(
    query="urgent items",
    collection_ids=["tasks"],
    filters={
        "$or": [
            {"metadata.urgent": True},
            {"metadata.priority": {"$gte": 9}},
            {"metadata.deadline": {"$lte": "2024-12-31"}}
        ]
    }
)
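Logical operators nest, so a filter is naturally evaluated recursively. The following is a self-contained sketch of such an evaluator under stated assumptions: sibling keys at the same level are ANDed, and a missing field satisfies only $ne. It is an illustration of the filter shapes above, not Nebula's evaluator.

```python
import operator
from typing import Any

_MISSING = object()  # sentinel for a field that is absent
_OPS = {
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def get_path(doc: dict, dotted_key: str) -> Any:
    current: Any = doc
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def _check(value: Any, op: str, target: Any) -> bool:
    if value is _MISSING:
        # Assumption: an absent field only satisfies $ne
        return op == "$ne"
    return _OPS[op](value, target)


def evaluate(doc: dict, flt: dict) -> bool:
    """Recursively evaluate $and / $or and field conditions (sketch only)."""
    results = []
    for key, cond in flt.items():
        if key == "$and":
            results.append(all(evaluate(doc, sub) for sub in cond))
        elif key == "$or":
            results.append(any(evaluate(doc, sub) for sub in cond))
        elif isinstance(cond, dict):
            value = get_path(doc, key)
            results.append(all(_check(value, op, t) for op, t in cond.items()))
        else:
            results.append(get_path(doc, key) == cond)
    # Assumption: sibling keys at the same level are combined with AND
    return all(results)


urgent_filter = {
    "$or": [
        {"metadata.urgent": True},
        {"metadata.priority": {"$gte": 9}},
        {"metadata.deadline": {"$lte": "2024-12-31"}},
    ]
}
task = {"metadata": {"urgent": False, "priority": 9, "deadline": "2025-03-01"}}
print(evaluate(task, urgent_filter))  # True: priority >= 9 is enough
```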

Nested Objects

# Filter nested object properties
results = nebula.search(
    query="user profiles",
    collection_ids=["users"],
    filters={
        "metadata.user.profile.age": {"$gte": 25},
        "metadata.user.location.city": "San Francisco"
    }
)
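Each dotted key addresses one leaf of the nested metadata object. To see which keys a given metadata object exposes for filtering, you can flatten it locally; this flatten helper is a hypothetical convenience, and the "metadata" prefix follows the key convention used in the examples above.

```python
from typing import Any, Dict


def flatten(obj: Dict[str, Any], prefix: str = "metadata") -> Dict[str, Any]:
    """Map every leaf of a nested dict to its dotted filter key."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))  # recurse into nested objects
        else:
            flat[path] = value
    return flat


metadata = {
    "user": {
        "profile": {"age": 31},
        "location": {"city": "San Francisco"},
    }
}
for key, value in flatten(metadata).items():
    print(key, "=", value)
# metadata.user.profile.age = 31
# metadata.user.location.city = San Francisco
```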

Common Patterns

Date Ranges

from datetime import datetime, timedelta

# Last 30 days
start_date = (datetime.now() - timedelta(days=30)).isoformat()
filters = {"metadata.created_at": {"$gte": start_date}}

# Specific date range
filters = {
    "$and": [
        {"metadata.created_at": {"$gte": "2024-01-01"}},
        {"metadata.created_at": {"$lte": "2024-12-31"}}
    ]
}
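If created_at is stored as an ISO-8601 string and compared as text (an assumption; check how your metadata is typed), a bare end date such as "2024-12-31" sorts before every timestamp on that day, so $lte can silently drop entries from December 31. A hedged sketch of two hypothetical helpers: one builds a half-open range that ends with $lt on the following day, the other is a timezone-aware version of the "last 30 days" example.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Optional


def date_range_filter(start: date, end_inclusive: date) -> dict:
    """Half-open range [start, end + 1 day) so timestamps on the end date match."""
    end_exclusive = end_inclusive + timedelta(days=1)
    return {
        "$and": [
            {"metadata.created_at": {"$gte": start.isoformat()}},
            {"metadata.created_at": {"$lt": end_exclusive.isoformat()}},
        ]
    }


def last_n_days_filter(days: int, now: Optional[datetime] = None) -> dict:
    """Timezone-aware variant of the 'last 30 days' example, in UTC."""
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(days=days)
    return {"metadata.created_at": {"$gte": start.isoformat()}}


# Why a bare end date is risky under string comparison:
print("2024-12-31T15:00:00" <= "2024-12-31")  # False
full_year = date_range_filter(date(2024, 1, 1), date(2024, 12, 31))
```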
cURL
# Last 30 days
curl -X POST "https://api.nebulacloud.app/v1/retrieval/search" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "query": "recent activity",
    "filters": {"metadata.created_at": {"$gte": "REPLACE_WITH_ISO_START"}}
  }'

# Specific date range
curl -X POST "https://api.nebulacloud.app/v1/retrieval/search" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "query": "reports",
    "filters": {
      "$and": [
        {"metadata.created_at": {"$gte": "2024-01-01"}},
        {"metadata.created_at": {"$lte": "2024-12-31"}}
      ]
    }
  }'

Multi-Status Filtering

# Include specific statuses
filters = {"metadata.status": {"$in": ["pending", "in_progress", "review"]}}

# Exclude specific statuses
filters = {"metadata.status": {"$nin": ["archived", "deleted"]}}
cURL
# Include specific statuses
curl -X POST "https://api.nebulacloud.app/v1/retrieval/search" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "query": "tasks",
    "filters": {"metadata.status": {"$in": ["pending", "in_progress", "review"]}}
  }'

# Exclude specific statuses
curl -X POST "https://api.nebulacloud.app/v1/retrieval/search" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "query": "tasks",
    "filters": {"metadata.status": {"$nin": ["archived", "deleted"]}}
  }'
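For environments without the SDK, the same POST the cURL examples send can be built with Python's standard library. This sketch reuses the endpoint URL and headers shown above; build_search_request is a hypothetical helper, and it only constructs the request, since the response format is not described on this page.

```python
import json
import urllib.request

SEARCH_URL = "https://api.nebulacloud.app/v1/retrieval/search"


def build_search_request(api_key: str, query: str, filters: dict) -> urllib.request.Request:
    """Build (but do not send) the same POST the cURL examples issue."""
    body = json.dumps({"query": query, "filters": filters}).encode("utf-8")
    return urllib.request.Request(
        SEARCH_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_search_request(
    "YOUR_API_KEY",
    "tasks",
    {"metadata.status": {"$nin": ["archived", "deleted"]}},
)
print(req.get_method(), req.full_url)
# POST https://api.nebulacloud.app/v1/retrieval/search
# To actually send it: urllib.request.urlopen(req)
```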

Best Practices

  • Use proper data types: Store numbers as numbers and booleans as booleans so that comparisons and equality checks behave as expected
  • Start simple: Begin with basic equality, add complexity as needed
  • Test incrementally: Build complex queries step by step
  • Use appropriate array operators: $overlap for “any of”, $contains for “all of”
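The data-type advice matters because text and numbers order differently: compared as strings, "9" sorts after "10". This page does not say whether Nebula coerces types server-side, so one cautious approach is to check metadata before ingesting it. The find_type_mismatches helper below is a hypothetical client-side check, not part of the SDK.

```python
from typing import Any, Dict


def find_type_mismatches(metadata: Dict[str, Any], expected: Dict[str, type]) -> Dict[str, str]:
    """Report fields whose stored type differs from the type you plan to filter on."""
    problems: Dict[str, str] = {}
    for key, expected_type in expected.items():
        if key not in metadata:
            continue
        value = metadata[key]
        # bool is a subclass of int in Python, so reject it explicitly for int fields
        if expected_type is int and isinstance(value, bool):
            ok = False
        else:
            ok = isinstance(value, expected_type)
        if not ok:
            problems[key] = type(value).__name__
    return problems


# Strings compare character by character, so "9" sorts after "10"
print("9" > "10", 9 > 10)  # True False

metadata = {"priority": "7", "verified": "true", "score": 88}
print(find_type_mismatches(metadata, {"priority": int, "verified": bool, "score": int}))
# {'priority': 'str', 'verified': 'str'}
```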

Next Steps