feat: Add remote embedding providers for semantic search

Adds support for remote embedding providers (OpenAI, Ollama) for semantic search.

This change introduces a new `provider` option in the `semantic_search` configuration, allowing users to choose between the existing local Jina AI models and the new remote providers.

For vision embeddings, the remote providers use a two-step process:
1. A text description of the image is generated using the configured GenAI provider.
2. An embedding is created from that description using the configured remote embedding provider.

This requires a GenAI provider to be configured when using a remote provider for semantic search.
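The two-step flow can be sketched as follows; `describe_image` and `embed_text` are hypothetical stand-ins for the configured GenAI provider and remote embedding provider, not functions from this commit:

```python
def describe_image(image: bytes) -> str:
    # Step 1: in Frigate this call goes to the configured GenAI provider.
    return "a person walking a dog past a parked car"

def embed_text(text: str) -> list[float]:
    # Step 2: in Frigate this call goes to the remote embedding provider
    # (OpenAI or Ollama); here we fake a tiny deterministic vector.
    return [float(ord(c)) for c in text[:4]]

def embed_image(image: bytes) -> list[float]:
    """Two-step vision embedding: describe the image, then embed the text."""
    return embed_text(describe_image(image))
```

The consequence is that image search quality with a remote provider depends on the description quality of the GenAI model, not on a joint image/text embedding space as with CLIP.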

The configuration for remote providers has been updated to allow customizing the prompt used for the vision model.

Documentation for the new feature has been added to the Semantic Search documentation page.
user 2025-12-10 12:27:57 -05:00
parent 9cdc10008d
commit 39fc9e37e1
7 changed files with 358 additions and 51 deletions


@@ -5,13 +5,13 @@ title: Semantic Search
 Semantic Search in Frigate allows you to find tracked objects within your review items using either the image itself, a user-defined text description, or an automatically generated one. This feature works by creating _embeddings_ — numerical vector representations — for both the images and text descriptions of your tracked objects. By comparing these embeddings, Frigate assesses their similarities to deliver relevant search results.
-Frigate uses models from [Jina AI](https://huggingface.co/jinaai) to create and save embeddings to Frigate's database. All of this runs locally.
+Frigate can run models locally or be configured to use a remote service. All local processing runs on your own hardware.
 Semantic Search is accessed via the _Explore_ view in the Frigate UI.
 ## Minimum System Requirements
-Semantic Search works by running a large AI model locally on your system. Small or underpowered systems like a Raspberry Pi will not run Semantic Search reliably or at all.
+When running models locally, Semantic Search works by running a large AI model on your system. Small or underpowered systems like a Raspberry Pi will not run Semantic Search reliably or at all.
 A minimum of 8GB of RAM is required to use Semantic Search. A GPU is not strictly required but will provide a significant performance increase over CPU-only systems.
@@ -35,7 +35,11 @@ If you are enabling Semantic Search for the first time, be advised that Frigate
 :::
-### Jina AI CLIP (version 1)
+### Local Providers
+Frigate uses models from [Jina AI](https://huggingface.co/jinaai) to create and save embeddings to Frigate's database. All of this runs locally.
+#### Jina AI CLIP (version 1)
 The [V1 model from Jina](https://huggingface.co/jinaai/jina-clip-v1) has a vision model which is able to embed both images and text into the same vector space, which allows `image -> image` and `text -> image` similarity searches. Frigate uses this model on tracked objects to encode the thumbnail image and store it in the database. When searching for tracked objects via text in the search box, Frigate will perform a `text -> image` similarity search against this embedding. When clicking "Find Similar" in the tracked object detail pane, Frigate will perform an `image -> image` similarity search to retrieve the closest matching thumbnails.
@@ -46,14 +50,14 @@ Differently weighted versions of the Jina models are available and can be select
 ```yaml
 semantic_search:
   enabled: True
-  model: "jinav1"
-  model_size: small
+  local_model: "jinav1"
+  local_model_size: small
 ```
 - Configuring the `large` model employs the full Jina model and will automatically run on the GPU if applicable.
 - Configuring the `small` model employs a quantized version of the Jina model that uses less RAM and runs on CPU with a very negligible difference in embedding quality.
-### Jina AI CLIP (version 2)
+#### Jina AI CLIP (version 2)
 Frigate also supports the [V2 model from Jina](https://huggingface.co/jinaai/jina-clip-v2), which introduces multilingual support (89 languages). In contrast, the V1 model only supports English.
@@ -64,8 +68,8 @@ To use the V2 model, update the `model` parameter in your config:
 ```yaml
 semantic_search:
   enabled: True
-  model: "jinav2"
-  model_size: large
+  local_model: "jinav2"
+  local_model_size: large
 ```
 For most users, especially native English speakers, the V1 model remains the recommended choice.
@@ -76,6 +80,25 @@ Switching between V1 and V2 requires reindexing your embeddings. The embeddings
 :::
+### Remote Providers
+Frigate can be configured to use remote services for generating embeddings. This is done by setting the `provider` field to `openai` or `ollama`.
+For vision embeddings, remote providers use a two-step process:
+1. A text description of the image is generated using the configured GenAI provider.
+2. An embedding is created from that description using the configured remote embedding provider.
+This means that you must have a GenAI provider configured to use vision embeddings with a remote provider.
+```yaml
+semantic_search:
+  enabled: True
+  provider: openai
+  remote:
+    model: "text-embedding-3-small"
+    vision_model_prompt: "A detailed description of the image for semantic search."
+```
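An Ollama setup follows the same shape; a sketch, assuming the `url` field points at an Ollama host on its default port, and noting that the model name below is only an illustrative choice of an embedding-capable Ollama model, not one named by this commit:

```yaml
semantic_search:
  enabled: True
  provider: ollama
  remote:
    url: http://localhost:11434
    model: "nomic-embed-text"
```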
 ### GPU Acceleration
 The CLIP models are downloaded in ONNX format, and the `large` model can be accelerated using GPU hardware, when available. This depends on the Docker build that is used. You can also target a specific device in a multi-GPU installation.
@@ -83,7 +106,7 @@ The CLIP models are downloaded in ONNX format, and the `large` model can be acce
 ```yaml
 semantic_search:
   enabled: True
-  model_size: large
+  local_model_size: large
   # Optional, if using the 'large' model in a multi-GPU installation
   device: 0
 ```


@@ -114,6 +114,30 @@ class CustomClassificationConfig(FrigateBaseModel):
     state_config: CustomClassificationStateConfig | None = Field(default=None)
+class SemanticSearchProviderEnum(str, Enum):
+    local = "local"
+    openai = "openai"
+    ollama = "ollama"
+class RemoteSemanticSearchConfig(FrigateBaseModel):
+    """Config for remote semantic search providers."""
+    api_key: Optional[str] = Field(
+        default=None, title="API key for the remote embedding provider."
+    )
+    model: Optional[str] = Field(
+        default=None, title="The embedding model to use for semantic search."
+    )
+    url: Optional[str] = Field(
+        default=None, title="URL for the remote embedding provider."
+    )
+    vision_model_prompt: Optional[str] = Field(
+        default="A detailed description of the image for semantic search.",
+        title="Prompt for the vision model to describe the image for embedding. This uses the configured GenAI provider.",
+    )
 class ClassificationConfig(FrigateBaseModel):
     bird: BirdClassificationConfig = Field(
         default_factory=BirdClassificationConfig, title="Bird classification config."
@@ -124,22 +148,32 @@ class ClassificationConfig(FrigateBaseModel):
 class SemanticSearchConfig(FrigateBaseModel):
     """Config for semantic search."""
     enabled: bool = Field(default=False, title="Enable semantic search.")
     reindex: Optional[bool] = Field(
         default=False, title="Reindex all tracked objects on startup."
     )
-    model: Optional[SemanticSearchModelEnum] = Field(
-        default=SemanticSearchModelEnum.jinav1,
-        title="The CLIP model to use for semantic search.",
-    )
-    model_size: str = Field(
-        default="small", title="The size of the embeddings model used."
-    )
+    provider: SemanticSearchProviderEnum = Field(
+        default=SemanticSearchProviderEnum.local,
+        title="The semantic search provider to use.",
+    )
+    local_model: Optional[SemanticSearchModelEnum] = Field(
+        default=SemanticSearchModelEnum.jinav1,
+        title="The local CLIP model to use for semantic search.",
+    )
+    local_model_size: str = Field(
+        default="small", title="The size of the local embeddings model used."
+    )
     device: Optional[str] = Field(
         default=None,
         title="The device key to use for semantic search.",
         description="This is an override, to target a specific device. See https://onnxruntime.ai/docs/execution-providers/ for more information",
     )
+    remote: RemoteSemanticSearchConfig = Field(
+        default_factory=RemoteSemanticSearchConfig,
+        title="Remote semantic search provider config.",
+    )
 class TriggerConfig(FrigateBaseModel):


@@ -16,8 +16,7 @@ from frigate.comms.embeddings_updater import (
     EmbeddingsRequestEnum,
 )
 from frigate.comms.inter_process import InterProcessRequestor
-from frigate.config import FrigateConfig
-from frigate.config.classification import SemanticSearchModelEnum
+from frigate.config import FrigateConfig, SemanticSearchModelEnum, SemanticSearchProviderEnum
 from frigate.const import (
     CONFIG_DIR,
     TRIGGER_DIR,
@@ -26,6 +25,7 @@ from frigate.const import (
 )
 from frigate.data_processing.types import DataProcessorMetrics
 from frigate.db.sqlitevecq import SqliteVecQueueDatabase
+from frigate.embeddings.remote import get_embedding_client
 from frigate.models import Event, Trigger
 from frigate.types import ModelStatusTypesEnum
 from frigate.util.builtin import EventsPerSecond, InferenceSpeed, serialize
@@ -96,43 +96,48 @@ class Embeddings:
         # Create tables if they don't exist
         self.db.create_embeddings_tables()
-        models = self.get_model_definitions()
+        if self.config.semantic_search.provider == SemanticSearchProviderEnum.local:
+            models = self.get_model_definitions()
             for model in models:
                 self.requestor.send_data(
                     UPDATE_MODEL_STATE,
                     {
                         "model": model,
                         "state": ModelStatusTypesEnum.not_downloaded,
                     },
                 )
-        if self.config.semantic_search.model == SemanticSearchModelEnum.jinav2:
+            if self.config.semantic_search.local_model == SemanticSearchModelEnum.jinav2:
                 # Single JinaV2Embedding instance for both text and vision
                 self.embedding = JinaV2Embedding(
-                    model_size=self.config.semantic_search.model_size,
+                    model_size=self.config.semantic_search.local_model_size,
                     requestor=self.requestor,
                     device=config.semantic_search.device
-                    or ("GPU" if config.semantic_search.model_size == "large" else "CPU"),
+                    or ("GPU" if config.semantic_search.local_model_size == "large" else "CPU"),
                 )
                 self.text_embedding = lambda input_data: self.embedding(
                     input_data, embedding_type="text"
                 )
                 self.vision_embedding = lambda input_data: self.embedding(
                     input_data, embedding_type="vision"
                 )
             else:  # Default to jinav1
                 self.text_embedding = JinaV1TextEmbedding(
-                    model_size=config.semantic_search.model_size,
+                    model_size=config.semantic_search.local_model_size,
                     requestor=self.requestor,
                     device="CPU",
                 )
                 self.vision_embedding = JinaV1ImageEmbedding(
-                    model_size=config.semantic_search.model_size,
+                    model_size=config.semantic_search.local_model_size,
                     requestor=self.requestor,
                     device=config.semantic_search.device
-                    or ("GPU" if config.semantic_search.model_size == "large" else "CPU"),
+                    or ("GPU" if config.semantic_search.local_model_size == "large" else "CPU"),
                 )
+        else:
+            self.remote_embedding_client = get_embedding_client(self.config)
+            self.text_embedding = self.remote_embedding_client.embed_texts
+            self.vision_embedding = self.remote_embedding_client.embed_images
     def update_stats(self) -> None:
         self.metrics.image_embeddings_eps.value = self.image_eps.eps()
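Both branches of the `__init__` above end by exposing the same two callables, `text_embedding` and `vision_embedding`, so the rest of the pipeline never needs to know which backend is active. A minimal sketch of that strategy-style dispatch (class names here are toy stand-ins, not Frigate classes):

```python
class LocalModel:
    """Toy stand-in for a single model callable (like JinaV2Embedding)."""
    def __call__(self, data, embedding_type: str):
        return [1.0] if embedding_type == "text" else [2.0]

class RemoteClient:
    """Toy stand-in for a remote embedding client."""
    def embed_texts(self, data):
        return [3.0]

def build_text_embedding(use_local: bool):
    # Either branch hands back a plain callable with the same signature,
    # mirroring how Embeddings.__init__ assigns self.text_embedding.
    if use_local:
        model = LocalModel()
        return lambda data: model(data, embedding_type="text")
    return RemoteClient().embed_texts
```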


@@ -0,0 +1,67 @@
+"""Remote embedding clients for Frigate."""
+import importlib
+import logging
+import os
+from typing import Any, Optional
+from frigate.config import FrigateConfig, SemanticSearchConfig, SemanticSearchProviderEnum
+from frigate.genai import get_genai_client
+logger = logging.getLogger(__name__)
+PROVIDERS = {}
+def register_embedding_provider(key: SemanticSearchProviderEnum):
+    """Register a remote embedding provider."""
+    def decorator(cls):
+        PROVIDERS[key] = cls
+        return cls
+    return decorator
+class RemoteEmbeddingClient:
+    """Remote embedding client for Frigate."""
+    def __init__(self, config: FrigateConfig, timeout: int = 120) -> None:
+        self.config = config
+        self.timeout = timeout
+        self.provider = self._init_provider()
+        self.genai_client = get_genai_client(self.config)
+    def _init_provider(self):
+        """Initialize the client."""
+        return None
+    def embed_texts(self, texts: list[str]) -> Optional[list[list[float]]]:
+        """Get embeddings for a list of texts."""
+        return None
+    def embed_images(self, images: list[bytes]) -> Optional[list[list[float]]]:
+        """Get embeddings for a list of images."""
+        return None
+def get_embedding_client(config: FrigateConfig) -> Optional[RemoteEmbeddingClient]:
+    """Get the embedding client."""
+    if not config.semantic_search.provider or config.semantic_search.provider == SemanticSearchProviderEnum.local:
+        return None
+    load_providers()
+    provider = PROVIDERS.get(config.semantic_search.provider)
+    if provider:
+        return provider(config)
+    return None
+def load_providers():
+    package_dir = os.path.dirname(__file__)
+    for filename in os.listdir(package_dir):
+        if filename.endswith(".py") and filename != "__init__.py":
+            module_name = f"frigate.embeddings.remote.{filename[:-3]}"
+            importlib.import_module(module_name)
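The registry-and-decorator pattern in this file can be reduced to a self-contained sketch with no Frigate imports (names simplified; this is an illustration of the mechanism, not code from the commit):

```python
from enum import Enum

class Provider(str, Enum):
    local = "local"
    openai = "openai"
    ollama = "ollama"

# key -> client class, filled in by the decorator at class-definition time
PROVIDERS: dict[Provider, type] = {}

def register(key: Provider):
    """Return a class decorator that records cls under the given key."""
    def decorator(cls):
        PROVIDERS[key] = cls
        return cls
    return decorator

@register(Provider.ollama)
class OllamaClient:
    """Toy stand-in for OllamaEmbeddingClient."""

def get_client(provider: Provider):
    # Mirrors get_embedding_client(): "local" means no remote client,
    # and an unregistered provider yields None rather than an error.
    if provider == Provider.local:
        return None
    cls = PROVIDERS.get(provider)
    return cls() if cls else None
```

In the real module, `load_providers()` imports every sibling `.py` file so that each `@register_embedding_provider(...)` decorator runs and populates `PROVIDERS` before the lookup.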


@@ -0,0 +1,91 @@
+"""Ollama embedding client for Frigate."""
+import logging
+from typing import Optional
+from httpx import TimeoutException
+from ollama import Client as ApiClient
+from ollama import ResponseError
+from frigate.config import SemanticSearchProviderEnum
+from frigate.embeddings.remote import (
+    RemoteEmbeddingClient,
+    register_embedding_provider,
+)
+logger = logging.getLogger(__name__)
+@register_embedding_provider(SemanticSearchProviderEnum.ollama)
+class OllamaEmbeddingClient(RemoteEmbeddingClient):
+    """Remote embedding client for Frigate using Ollama."""
+    provider: ApiClient
+    def _init_provider(self):
+        """Initialize the client."""
+        try:
+            client = ApiClient(
+                host=self.config.semantic_search.remote.url, timeout=self.timeout
+            )
+            # ensure the model is available locally
+            response = client.show(self.config.semantic_search.remote.model)
+            if response.get("error"):
+                logger.error(
+                    "Ollama error: %s",
+                    response["error"],
+                )
+                return None
+            return client
+        except Exception as e:
+            logger.warning("Error initializing Ollama: %s", str(e))
+            return None
+    def embed_texts(self, texts: list[str]) -> Optional[list[list[float]]]:
+        """Get embeddings for a list of texts."""
+        if self.provider is None:
+            logger.warning(
+                "Ollama provider has not been initialized, embeddings will not be generated. Check your Ollama configuration."
+            )
+            return None
+        try:
+            embeddings = []
+            for text in texts:
+                result = self.provider.embeddings(
+                    model=self.config.semantic_search.remote.model,
+                    prompt=text,
+                )
+                embeddings.append(result["embedding"])
+            return embeddings
+        except (TimeoutException, ResponseError, ConnectionError) as e:
+            logger.warning("Ollama returned an error: %s", str(e))
+            return None
+    def embed_images(self, images: list[bytes]) -> Optional[list[list[float]]]:
+        """Get embeddings for a list of images.
+        This uses a two-step process:
+        1. Generate a text description of the image using the configured GenAI provider.
+        2. Create an embedding from the description using the text embedding model.
+        """
+        if not self.genai_client:
+            logger.warning(
+                "A GenAI provider is not configured. Cannot generate image descriptions."
+            )
+            return None
+        descriptions = []
+        for image in images:
+            description = self.genai_client.generate_image_description(
+                prompt=self.config.semantic_search.remote.vision_model_prompt,
+                images=[image],
+            )
+            if description:
+                descriptions.append(description)
+            else:
+                descriptions.append("")
+        if not descriptions:
+            return None
+        return self.embed_texts(descriptions)


@@ -0,0 +1,78 @@
+"""OpenAI embedding client for Frigate."""
+import base64
+import logging
+from typing import Optional
+from httpx import TimeoutException
+from openai import OpenAI
+from frigate.config import SemanticSearchProviderEnum
+from frigate.embeddings.remote import (
+    RemoteEmbeddingClient,
+    register_embedding_provider,
+)
+logger = logging.getLogger(__name__)
+@register_embedding_provider(SemanticSearchProviderEnum.openai)
+class OpenAIEmbeddingClient(RemoteEmbeddingClient):
+    """Remote embedding client for Frigate using OpenAI."""
+    provider: OpenAI
+    def _init_provider(self):
+        """Initialize the client."""
+        return OpenAI(
+            api_key=self.config.semantic_search.remote.api_key,
+            base_url=self.config.semantic_search.remote.url,
+        )
+    def embed_texts(self, texts: list[str]) -> Optional[list[list[float]]]:
+        """Get embeddings for a list of texts."""
+        try:
+            result = self.provider.embeddings.create(
+                model=self.config.semantic_search.remote.model,
+                input=texts,
+                timeout=self.timeout,
+            )
+            if (
+                result is not None
+                and hasattr(result, "data")
+                and len(result.data) > 0
+            ):
+                return [embedding.embedding for embedding in result.data]
+            return None
+        except (TimeoutException, Exception) as e:
+            logger.warning("OpenAI returned an error: %s", str(e))
+            return None
+    def embed_images(self, images: list[bytes]) -> Optional[list[list[float]]]:
+        """Get embeddings for a list of images.
+        This uses a two-step process:
+        1. Generate a text description of the image using the configured GenAI provider.
+        2. Create an embedding from the description using the text embedding model.
+        """
+        if not self.genai_client:
+            logger.warning(
+                "A GenAI provider is not configured. Cannot generate image descriptions."
+            )
+            return None
+        descriptions = []
+        for image in images:
+            description = self.genai_client.generate_image_description(
+                prompt=self.config.semantic_search.remote.vision_model_prompt,
+                images=[image],
+            )
+            if description:
+                descriptions.append(description)
+            else:
+                descriptions.append("")
+        if not descriptions:
+            return None
+        return self.embed_texts(descriptions)


@@ -291,6 +291,15 @@ Rules for the report:
         logger.debug(f"Sending images to genai provider with prompt: {prompt}")
         return self._send(prompt, thumbnails)
+    def generate_image_description(
+        self,
+        prompt: str,
+        images: list[bytes],
+    ) -> Optional[str]:
+        """Generate a description for an image."""
+        logger.debug(f"Sending images to genai provider with prompt: {prompt}")
+        return self._send(prompt, images)
     def _init_provider(self):
         """Initialize the client."""
         return None