
Metadata Filtering

Filter search results by metadata attributes using comparison and logical operators.

Operators Reference

Operator   Description                        Example
$eq        Equals (default)                   {"status": "active"}
$ne        Not equals                         {"status": {"$ne": "archived"}}
$gt        Greater than                       {"score": {"$gt": 80}}
$gte       Greater than or equal              {"priority": {"$gte": 7}}
$lt        Less than                          {"age": {"$lt": 30}}
$lte       Less than or equal                 {"count": {"$lte": 100}}
$in        Value in list                      {"status": {"$in": ["a", "b"]}}
$nin       Not in list                        {"status": {"$nin": ["x", "y"]}}
$like      Pattern match (case-sensitive)     {"title": {"$like": "Important%"}}
$ilike     Pattern match (case-insensitive)   {"email": {"$ilike": "%@company.com"}}
$overlap   Array has any                      {"tags": {"$overlap": ["urgent", "important"]}}
$contains  Array has all                      {"skills": {"$contains": ["python", "ml"]}}
$and       All conditions match               {"$and": [{...}, {...}]}
$or        Any condition matches              {"$or": [{...}, {...}]}

Use % as a wildcard in $like and $ilike patterns.
Target metadata fields by prefixing the field name with "metadata." (e.g., metadata.status). Nested fields are supported through dotted paths.
Search responses can optionally include metadata via search_settings.include_metadata. When included, Nebula returns a whitelisted subset (e.g., title, source, url, doc_type, mime_type, filename, page) to keep results compact.
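
To make the operator semantics in the table concrete, here is a minimal client-side sketch that evaluates a filter dictionary against a plain Python metadata dictionary. It is not Nebula's implementation. The names matches, _lookup, and _like_to_regex are hypothetical, and the sketch assumes that several keys in one filter dictionary combine as AND and that a missing field behaves like None.

```python
import re
from typing import Any


def _like_to_regex(pattern: str, ignore_case: bool) -> re.Pattern:
    # Only % is treated as a wildcard; every other character is literal.
    parts = [re.escape(part) for part in pattern.split("%")]
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return re.compile(".*".join(parts), flags)


def _lookup(metadata: dict, path: str) -> Any:
    # Drop the "metadata." prefix, then walk the dotted path through nested dicts.
    if path.startswith("metadata."):
        path = path[len("metadata."):]
    current: Any = metadata
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None  # assumption: a missing field behaves like None
        current = current[key]
    return current


_COMPARATORS = {
    "$eq": lambda v, arg: v == arg,
    "$ne": lambda v, arg: v != arg,
    "$gt": lambda v, arg: v is not None and v > arg,
    "$gte": lambda v, arg: v is not None and v >= arg,
    "$lt": lambda v, arg: v is not None and v < arg,
    "$lte": lambda v, arg: v is not None and v <= arg,
    "$in": lambda v, arg: v in arg,
    "$nin": lambda v, arg: v not in arg,
    "$like": lambda v, arg: isinstance(v, str)
    and _like_to_regex(arg, False).fullmatch(v) is not None,
    "$ilike": lambda v, arg: isinstance(v, str)
    and _like_to_regex(arg, True).fullmatch(v) is not None,
    # $overlap: at least one listed item is present; $contains: all of them are.
    "$overlap": lambda v, arg: isinstance(v, list) and any(x in v for x in arg),
    "$contains": lambda v, arg: isinstance(v, list) and all(x in v for x in arg),
}


def matches(metadata: dict, filters: dict) -> bool:
    """Return True if the metadata dict satisfies the filter dict."""
    for key, condition in filters.items():
        if key == "$and":
            ok = all(matches(metadata, sub) for sub in condition)
        elif key == "$or":
            ok = any(matches(metadata, sub) for sub in condition)
        else:
            value = _lookup(metadata, key)
            if not isinstance(condition, dict):
                condition = {"$eq": condition}  # a bare value means equality
            ok = all(_COMPARATORS[op](value, arg) for op, arg in condition.items())
        if not ok:
            return False
    return True
```

A local evaluator like this can be handy for unit-testing filter dictionaries before sending them to the API, but the server's behavior remains the authority on edge cases such as missing fields or mixed types.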

Examples

from nebula import Nebula
nebula = Nebula(api_key="your-api-key")

# Simple equality
results = nebula.search(
    query="machine learning",
    collection_ids=["docs"],
    filters={"metadata.category": "research"}
)

# Multiple conditions with $and
results = nebula.search(
    query="reports",
    collection_ids=["tasks"],
    filters={
        "$and": [
            {"metadata.priority": {"$gte": 7}},
            {"metadata.status": {"$in": ["pending", "active"]}},
            {"metadata.created_at": {"$gte": "2024-01-01"}}
        ]
    }
)

# Array operations
results = nebula.search(
    query="candidates",
    collection_ids=["hr"],
    filters={"metadata.skills": {"$contains": ["python", "ml"]}}
)
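
The examples above cover $and but not $or. As a sketch, an $or filter follows the same list-of-conditions shape given in the operators table. The field names priority and tags are illustrative assumptions, not fields Nebula defines.

```python
# Match documents that are either high priority or tagged "urgent".
# Field names are placeholders for whatever metadata you store.
filters = {
    "$or": [
        {"metadata.priority": {"$gte": 9}},
        {"metadata.tags": {"$overlap": ["urgent"]}},
    ]
}
```

The resulting dictionary is passed as the filters argument in the same way as the other examples.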

Common Patterns

Date Ranges

from datetime import datetime, timedelta

# Last 30 days
start = (datetime.now() - timedelta(days=30)).isoformat()
filters = {"metadata.created_at": {"$gte": start}}

# Date range
filters = {
    "$and": [
        {"metadata.created_at": {"$gte": "2024-01-01"}},
        {"metadata.created_at": {"$lte": "2024-12-31"}}
    ]
}
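
The two date patterns above can be wrapped in small helpers so that bounds are validated and formatted in one place. This is a sketch: date_range_filter and last_n_days_filter are hypothetical names, and it assumes the stored values use the same ISO 8601 format as the bounds, since the page does not say how date strings are compared.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Optional


def date_range_filter(field: str, start: date, end: date) -> dict:
    """Build an inclusive $gte/$lte range on a date-valued metadata field."""
    if start > end:
        raise ValueError("start must not be after end")
    return {
        "$and": [
            {field: {"$gte": start.isoformat()}},
            {field: {"$lte": end.isoformat()}},
        ]
    }


def last_n_days_filter(field: str, days: int, now: Optional[datetime] = None) -> dict:
    """Build a lower bound `days` before `now` (defaults to the current UTC time)."""
    now = now or datetime.now(timezone.utc)
    return {field: {"$gte": (now - timedelta(days=days)).isoformat()}}
```

Passing now explicitly keeps the helper deterministic in tests. Using a timezone-aware UTC timestamp avoids depending on the local clock of the machine that builds the filter.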

Multi-Status

# Include specific statuses
filters = {"metadata.status": {"$in": ["pending", "in_progress", "review"]}}

# Exclude specific statuses
filters = {"metadata.status": {"$nin": ["archived", "deleted"]}}

Nested Properties

# Filter nested object properties
filters = {
    "metadata.user.profile.age": {"$gte": 25},
    "metadata.user.location.city": "San Francisco"
}
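
When conditions are easier to write in the same nested shape as the metadata itself, a small helper can flatten them into the dotted paths used above. This is a sketch: flatten_conditions is a hypothetical helper, not part of the Nebula SDK, and it treats any dictionary whose keys start with "$" as an operator condition rather than a nested object.

```python
def flatten_conditions(nested: dict, prefix: str = "metadata") -> dict:
    """Flatten nested conditions into dotted metadata paths."""
    flat = {}
    for key, value in nested.items():
        path = f"{prefix}.{key}"
        is_operator_dict = isinstance(value, dict) and any(
            k.startswith("$") for k in value
        )
        if isinstance(value, dict) and not is_operator_dict:
            # Plain nested object: keep descending and extend the path.
            flat.update(flatten_conditions(value, path))
        else:
            # Leaf: a bare value (equality) or an operator condition.
            flat[path] = value
    return flat


filters = flatten_conditions(
    {
        "user": {
            "profile": {"age": {"$gte": 25}},
            "location": {"city": "San Francisco"},
        }
    }
)
```

Here filters ends up with the same two dotted keys as the handwritten example.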

Best Practices

  • Use proper data types: Store numbers as numbers, booleans as booleans
  • Start simple: Begin with basic equality, add complexity as needed
  • Use appropriate array operators: $overlap for “any of”, $contains for “all of”
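
The first point matters because comparison operators behave differently on strings than on numbers: in plain Python, "10" > "9" is False while 10 > 9 is True. The following sketch converts string values to native types before they are stored as metadata. coerce_value is a hypothetical helper, and how Nebula itself compares mismatched types is not stated here.

```python
def coerce_value(value):
    """Convert numeric or boolean strings to native types; leave the rest alone."""
    if not isinstance(value, str):
        return value
    lowered = value.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


raw = {"priority": "7", "score": "0.5", "archived": "false", "status": "active"}
metadata = {key: coerce_value(value) for key, value in raw.items()}
```

After coercion, priority and score are numbers and archived is a boolean, so filters such as {"metadata.priority": {"$gte": 7}} compare numerically.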
