mirror of
https://github.com/blakeblackshear/frigate.git
synced 2026-04-11 09:37:37 +03:00
feat: Add remote embedding providers for semantic search
Adds support for remote embedding providers (OpenAI, Ollama) for semantic search. A new configuration option lets users choose between the existing local Jina AI models and the new remote providers.

For vision embeddings, the remote providers use a two-step process:

1. A text description of the image is generated using the configured GenAI provider.
2. An embedding is created from that description using the configured remote embedding provider.

This means a GenAI provider must be configured when using a remote provider for semantic search. The remote provider configuration also allows customizing the prompt used for the vision model. Documentation for the new feature has been added.
This commit is contained in:
parent
9cdc10008d
commit
39fc9e37e1
@@ -5,13 +5,13 @@ title: Semantic Search

 Semantic Search in Frigate allows you to find tracked objects within your review items using either the image itself, a user-defined text description, or an automatically generated one. This feature works by creating _embeddings_ — numerical vector representations — for both the images and text descriptions of your tracked objects. By comparing these embeddings, Frigate assesses their similarities to deliver relevant search results.

-Frigate uses models from [Jina AI](https://huggingface.co/jinaai) to create and save embeddings to Frigate's database. All of this runs locally.
+Frigate can run models locally or be configured to use a remote service. All local processing runs on your own hardware.

 Semantic Search is accessed via the _Explore_ view in the Frigate UI.

 ## Minimum System Requirements

-Semantic Search works by running a large AI model locally on your system. Small or underpowered systems like a Raspberry Pi will not run Semantic Search reliably or at all.
+When running models locally, Semantic Search works by running a large AI model on your system. Small or underpowered systems like a Raspberry Pi will not run Semantic Search reliably or at all.

 A minimum of 8GB of RAM is required to use Semantic Search. A GPU is not strictly required but will provide a significant performance increase over CPU-only systems.
@@ -35,7 +35,11 @@ If you are enabling Semantic Search for the first time, be advised that Frigate

 :::

-### Jina AI CLIP (version 1)
+### Local Providers
+
+Frigate uses models from [Jina AI](https://huggingface.co/jinaai) to create and save embeddings to Frigate's database. All of this runs locally.
+
+#### Jina AI CLIP (version 1)

 The [V1 model from Jina](https://huggingface.co/jinaai/jina-clip-v1) has a vision model which is able to embed both images and text into the same vector space, which allows `image -> image` and `text -> image` similarity searches. Frigate uses this model on tracked objects to encode the thumbnail image and store it in the database. When searching for tracked objects via text in the search box, Frigate will perform a `text -> image` similarity search against this embedding. When clicking "Find Similar" in the tracked object detail pane, Frigate will perform an `image -> image` similarity search to retrieve the closest matching thumbnails.
@@ -46,14 +50,14 @@ Differently weighted versions of the Jina models are available and can be select

 ```yaml
 semantic_search:
   enabled: True
-  model: "jinav1"
-  model_size: small
+  local_model: "jinav1"
+  local_model_size: small
 ```

 - Configuring the `large` model employs the full Jina model and will automatically run on the GPU if applicable.
 - Configuring the `small` model employs a quantized version of the Jina model that uses less RAM and runs on CPU with a very negligible difference in embedding quality.

-### Jina AI CLIP (version 2)
+#### Jina AI CLIP (version 2)

 Frigate also supports the [V2 model from Jina](https://huggingface.co/jinaai/jina-clip-v2), which introduces multilingual support (89 languages). In contrast, the V1 model only supports English.
@@ -64,8 +68,8 @@ To use the V2 model, update the `model` parameter in your config:

 ```yaml
 semantic_search:
   enabled: True
-  model: "jinav2"
-  model_size: large
+  local_model: "jinav2"
+  local_model_size: large
 ```

 For most users, especially native English speakers, the V1 model remains the recommended choice.
@@ -76,6 +80,25 @@ Switching between V1 and V2 requires reindexing your embeddings. The embeddings

 :::

+### Remote Providers
+
+Frigate can be configured to use remote services for generating embeddings. This is done by setting the `provider` field to `openai` or `ollama`.
+
+For vision embeddings, remote providers use a two-step process:
+
+1. A text description of the image is generated using the configured GenAI provider.
+2. An embedding is created from that description using the configured remote embedding provider.
+
+This means that you must have a GenAI provider configured to use vision embeddings with a remote provider.
+
+```yaml
+semantic_search:
+  enabled: True
+  provider: openai
+  remote:
+    model: "text-embedding-3-small"
+    vision_model_prompt: "A detailed description of the image for semantic search."
+```
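The example above targets OpenAI; an equivalent Ollama configuration would set `provider: ollama` and point `url` at the Ollama server. The host and model name below are illustrative assumptions, not shipped defaults:

```yaml
semantic_search:
  enabled: True
  provider: ollama
  remote:
    url: http://localhost:11434   # assumed Ollama host, adjust to your setup
    model: "nomic-embed-text"     # any embedding model pulled into Ollama
```

A local Ollama server typically needs no `api_key`, so that field can be omitted.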
 ### GPU Acceleration

 The CLIP models are downloaded in ONNX format, and the `large` model can be accelerated using GPU hardware, when available. This depends on the Docker build that is used. You can also target a specific device in a multi-GPU installation.

@@ -83,7 +106,7 @@ The CLIP models are downloaded in ONNX format, and the `large` model can be acce

 ```yaml
 semantic_search:
   enabled: True
-  model_size: large
+  local_model_size: large
   # Optional, if using the 'large' model in a multi-GPU installation
   device: 0
 ```
@@ -114,6 +114,30 @@ class CustomClassificationConfig(FrigateBaseModel):
     state_config: CustomClassificationStateConfig | None = Field(default=None)


+class SemanticSearchProviderEnum(str, Enum):
+    local = "local"
+    openai = "openai"
+    ollama = "ollama"
+
+
+class RemoteSemanticSearchConfig(FrigateBaseModel):
+    """Config for remote semantic search providers."""
+
+    api_key: Optional[str] = Field(
+        default=None, title="API key for the remote embedding provider."
+    )
+    model: Optional[str] = Field(
+        default=None, title="The embedding model to use for semantic search."
+    )
+    url: Optional[str] = Field(
+        default=None, title="URL for the remote embedding provider."
+    )
+    vision_model_prompt: Optional[str] = Field(
+        default="A detailed description of the image for semantic search.",
+        title="Prompt for the vision model to describe the image for embedding. This uses the configured GenAI provider.",
+    )
+
+
 class ClassificationConfig(FrigateBaseModel):
     bird: BirdClassificationConfig = Field(
         default_factory=BirdClassificationConfig, title="Bird classification config."
@@ -124,22 +148,32 @@ class ClassificationConfig(FrigateBaseModel):


 class SemanticSearchConfig(FrigateBaseModel):
     """Config for semantic search."""

     enabled: bool = Field(default=False, title="Enable semantic search.")
     reindex: Optional[bool] = Field(
         default=False, title="Reindex all tracked objects on startup."
     )
-    model: Optional[SemanticSearchModelEnum] = Field(
-        default=SemanticSearchModelEnum.jinav1,
-        title="The CLIP model to use for semantic search.",
+    provider: SemanticSearchProviderEnum = Field(
+        default=SemanticSearchProviderEnum.local,
+        title="The semantic search provider to use.",
     )
-    model_size: str = Field(
-        default="small", title="The size of the embeddings model used."
+    local_model: Optional[SemanticSearchModelEnum] = Field(
+        default=SemanticSearchModelEnum.jinav1,
+        title="The local CLIP model to use for semantic search.",
+    )
+    local_model_size: str = Field(
+        default="small", title="The size of the local embeddings model used."
     )
     device: Optional[str] = Field(
         default=None,
         title="The device key to use for semantic search.",
         description="This is an override, to target a specific device. See https://onnxruntime.ai/docs/execution-providers/ for more information",
     )
+    remote: RemoteSemanticSearchConfig = Field(
+        default_factory=RemoteSemanticSearchConfig,
+        title="Remote semantic search provider config.",
+    )


 class TriggerConfig(FrigateBaseModel):
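The new `provider` field is a plain `str`-valued Enum, which is what lets YAML strings like `provider: openai` map directly onto enum members. A standalone sketch of that round-trip (just the enum, outside any pydantic model):

```python
from enum import Enum


class SemanticSearchProviderEnum(str, Enum):
    local = "local"
    openai = "openai"
    ollama = "ollama"


# str-valued enums construct from, and compare equal to, their raw
# string values, so a config string can be turned into a member directly.
provider = SemanticSearchProviderEnum("openai")
```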
@@ -16,8 +16,7 @@ from frigate.comms.embeddings_updater import (
     EmbeddingsRequestEnum,
 )
 from frigate.comms.inter_process import InterProcessRequestor
-from frigate.config import FrigateConfig
-from frigate.config.classification import SemanticSearchModelEnum
+from frigate.config import FrigateConfig, SemanticSearchModelEnum, SemanticSearchProviderEnum
 from frigate.const import (
     CONFIG_DIR,
     TRIGGER_DIR,
@@ -26,6 +25,7 @@ from frigate.const import (
 )
 from frigate.data_processing.types import DataProcessorMetrics
 from frigate.db.sqlitevecq import SqliteVecQueueDatabase
+from frigate.embeddings.remote import get_embedding_client
 from frigate.models import Event, Trigger
 from frigate.types import ModelStatusTypesEnum
 from frigate.util.builtin import EventsPerSecond, InferenceSpeed, serialize
@@ -96,43 +96,48 @@ class Embeddings:
         # Create tables if they don't exist
         self.db.create_embeddings_tables()

-        models = self.get_model_definitions()
+        if self.config.semantic_search.provider == SemanticSearchProviderEnum.local:
+            models = self.get_model_definitions()

-        for model in models:
-            self.requestor.send_data(
-                UPDATE_MODEL_STATE,
-                {
-                    "model": model,
-                    "state": ModelStatusTypesEnum.not_downloaded,
-                },
-            )
+            for model in models:
+                self.requestor.send_data(
+                    UPDATE_MODEL_STATE,
+                    {
+                        "model": model,
+                        "state": ModelStatusTypesEnum.not_downloaded,
+                    },
+                )

-        if self.config.semantic_search.model == SemanticSearchModelEnum.jinav2:
-            # Single JinaV2Embedding instance for both text and vision
-            self.embedding = JinaV2Embedding(
-                model_size=self.config.semantic_search.model_size,
-                requestor=self.requestor,
-                device=config.semantic_search.device
-                or ("GPU" if config.semantic_search.model_size == "large" else "CPU"),
-            )
-            self.text_embedding = lambda input_data: self.embedding(
-                input_data, embedding_type="text"
-            )
-            self.vision_embedding = lambda input_data: self.embedding(
-                input_data, embedding_type="vision"
-            )
-        else:  # Default to jinav1
-            self.text_embedding = JinaV1TextEmbedding(
-                model_size=config.semantic_search.model_size,
-                requestor=self.requestor,
-                device="CPU",
-            )
-            self.vision_embedding = JinaV1ImageEmbedding(
-                model_size=config.semantic_search.model_size,
-                requestor=self.requestor,
-                device=config.semantic_search.device
-                or ("GPU" if config.semantic_search.model_size == "large" else "CPU"),
-            )
+            if self.config.semantic_search.local_model == SemanticSearchModelEnum.jinav2:
+                # Single JinaV2Embedding instance for both text and vision
+                self.embedding = JinaV2Embedding(
+                    model_size=self.config.semantic_search.local_model_size,
+                    requestor=self.requestor,
+                    device=config.semantic_search.device
+                    or ("GPU" if config.semantic_search.local_model_size == "large" else "CPU"),
+                )
+                self.text_embedding = lambda input_data: self.embedding(
+                    input_data, embedding_type="text"
+                )
+                self.vision_embedding = lambda input_data: self.embedding(
+                    input_data, embedding_type="vision"
+                )
+            else:  # Default to jinav1
+                self.text_embedding = JinaV1TextEmbedding(
+                    model_size=config.semantic_search.local_model_size,
+                    requestor=self.requestor,
+                    device="CPU",
+                )
+                self.vision_embedding = JinaV1ImageEmbedding(
+                    model_size=config.semantic_search.local_model_size,
+                    requestor=self.requestor,
+                    device=config.semantic_search.device
+                    or ("GPU" if config.semantic_search.local_model_size == "large" else "CPU"),
+                )
+        else:
+            self.remote_embedding_client = get_embedding_client(self.config)
+            self.text_embedding = self.remote_embedding_client.embed_texts
+            self.vision_embedding = self.remote_embedding_client.embed_images

     def update_stats(self) -> None:
         self.metrics.image_embeddings_eps.value = self.image_eps.eps()
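The branch above reduces to choosing embedding callables at init time: local providers get model-backed callables, remote providers get client methods. A condensed sketch of that dispatch, with `LocalModel` and `RemoteClient` as stand-ins for the real Jina and remote classes:

```python
from typing import Callable


class LocalModel:
    """Stand-in for a Jina embedding model callable."""

    def __call__(self, data, embedding_type="text"):
        # Toy embedding: a single dimension derived from the input length.
        return [float(len(str(data)))]


class RemoteClient:
    """Stand-in for a remote embedding client."""

    def embed_texts(self, texts):
        return [[float(len(t))] for t in texts]


def make_text_embedding(provider: str) -> Callable:
    """Pick a text-embedding callable based on the provider, as in
    Embeddings.__init__: a bound local model for 'local', a remote
    client method otherwise."""
    if provider == "local":
        model = LocalModel()
        return lambda data: model(data, embedding_type="text")
    return RemoteClient().embed_texts
```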
67
frigate/embeddings/remote/__init__.py
Normal file
@@ -0,0 +1,67 @@
+"""Remote embedding clients for Frigate."""
+
+import importlib
+import logging
+import os
+from typing import Any, Optional
+
+from frigate.config import FrigateConfig, SemanticSearchConfig, SemanticSearchProviderEnum
+from frigate.genai import get_genai_client
+
+logger = logging.getLogger(__name__)
+
+PROVIDERS = {}
+
+
+def register_embedding_provider(key: SemanticSearchProviderEnum):
+    """Register a remote embedding provider."""
+
+    def decorator(cls):
+        PROVIDERS[key] = cls
+        return cls
+
+    return decorator
+
+
+class RemoteEmbeddingClient:
+    """Remote embedding client for Frigate."""
+
+    def __init__(self, config: FrigateConfig, timeout: int = 120) -> None:
+        self.config = config
+        self.timeout = timeout
+        self.provider = self._init_provider()
+        self.genai_client = get_genai_client(self.config)
+
+    def _init_provider(self):
+        """Initialize the client."""
+        return None
+
+    def embed_texts(self, texts: list[str]) -> Optional[list[list[float]]]:
+        """Get embeddings for a list of texts."""
+        return None
+
+    def embed_images(self, images: list[bytes]) -> Optional[list[list[float]]]:
+        """Get embeddings for a list of images."""
+        return None
+
+
+def get_embedding_client(config: FrigateConfig) -> Optional[RemoteEmbeddingClient]:
+    """Get the embedding client."""
+    if not config.semantic_search.provider or config.semantic_search.provider == SemanticSearchProviderEnum.local:
+        return None
+
+    load_providers()
+    provider = PROVIDERS.get(config.semantic_search.provider)
+    if provider:
+        return provider(config)
+
+    return None
+
+
+def load_providers():
+    package_dir = os.path.dirname(__file__)
+    for filename in os.listdir(package_dir):
+        if filename.endswith(".py") and filename != "__init__.py":
+            module_name = f"frigate.embeddings.remote.{filename[:-3]}"
+            importlib.import_module(module_name)
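The `register_embedding_provider` / `PROVIDERS` pair is a plain decorator-populated registry. A minimal standalone sketch of the same pattern (generic names, not Frigate's API):

```python
# Decorator-populated provider registry, mirroring the
# register_embedding_provider/PROVIDERS mechanism above.
PROVIDERS: dict[str, type] = {}


def register_provider(key: str):
    """Return a class decorator that records the class under `key`."""

    def decorator(cls: type) -> type:
        PROVIDERS[key] = cls
        return cls  # the class itself is returned unchanged

    return decorator


@register_provider("ollama")
class OllamaClient:
    pass


@register_provider("openai")
class OpenAIClient:
    pass


def get_client(key: str):
    """Look up and instantiate the registered provider, else None."""
    cls = PROVIDERS.get(key)
    return cls() if cls else None
```

Registration happens as a side effect of the decorator running at import time, which is why the module above calls `load_providers()` to import each provider module before consulting the registry.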
91
frigate/embeddings/remote/ollama.py
Normal file
@@ -0,0 +1,91 @@
+"""Ollama embedding client for Frigate."""
+
+import logging
+from typing import Optional
+
+from httpx import TimeoutException
+from ollama import Client as ApiClient
+from ollama import ResponseError
+
+from frigate.config import SemanticSearchProviderEnum
+from frigate.embeddings.remote import (
+    RemoteEmbeddingClient,
+    register_embedding_provider,
+)
+
+logger = logging.getLogger(__name__)
+
+
+@register_embedding_provider(SemanticSearchProviderEnum.ollama)
+class OllamaEmbeddingClient(RemoteEmbeddingClient):
+    """Remote embedding client for Frigate using Ollama."""
+
+    provider: ApiClient
+
+    def _init_provider(self):
+        """Initialize the client."""
+        try:
+            client = ApiClient(
+                host=self.config.semantic_search.remote.url, timeout=self.timeout
+            )
+            # ensure the model is available locally
+            response = client.show(self.config.semantic_search.remote.model)
+            if response.get("error"):
+                logger.error(
+                    "Ollama error: %s",
+                    response["error"],
+                )
+                return None
+            return client
+        except Exception as e:
+            logger.warning("Error initializing Ollama: %s", str(e))
+            return None
+
+    def embed_texts(self, texts: list[str]) -> Optional[list[list[float]]]:
+        """Get embeddings for a list of texts."""
+        if self.provider is None:
+            logger.warning(
+                "Ollama provider has not been initialized, embeddings will not be generated. Check your Ollama configuration."
+            )
+            return None
+        try:
+            embeddings = []
+            for text in texts:
+                result = self.provider.embeddings(
+                    model=self.config.semantic_search.remote.model,
+                    prompt=text,
+                )
+                embeddings.append(result["embedding"])
+            return embeddings
+        except (TimeoutException, ResponseError, ConnectionError) as e:
+            logger.warning("Ollama returned an error: %s", str(e))
+            return None
+
+    def embed_images(self, images: list[bytes]) -> Optional[list[list[float]]]:
+        """Get embeddings for a list of images.
+
+        This uses a two-step process:
+        1. Generate a text description of the image using the configured GenAI provider.
+        2. Create an embedding from the description using the text embedding model.
+        """
+        if not self.genai_client:
+            logger.warning(
+                "A GenAI provider is not configured. Cannot generate image descriptions."
+            )
+            return None
+
+        descriptions = []
+        for image in images:
+            description = self.genai_client.generate_image_description(
+                prompt=self.config.semantic_search.remote.vision_model_prompt,
+                images=[image],
+            )
+            if description:
+                descriptions.append(description)
+            else:
+                descriptions.append("")
+
+        if not descriptions:
+            return None
+
+        return self.embed_texts(descriptions)
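The two-step flow in `embed_images` above (describe, then embed the description) can be sketched with stand-in functions; `describe_image` and `embed_text` below are hypothetical placeholders for the GenAI and embedding calls, not Frigate APIs:

```python
from typing import Optional


def describe_image(image: bytes) -> Optional[str]:
    # Stand-in for genai_client.generate_image_description(...).
    return f"an image of {len(image)} bytes"


def embed_text(text: str) -> list[float]:
    # Stand-in for the remote embedding call; returns a toy 2-d vector.
    return [float(len(text)), float(text.count(" "))]


def embed_images(images: list[bytes]) -> Optional[list[list[float]]]:
    """Describe each image, falling back to "" on failure, then embed
    the descriptions; return None when there is nothing to embed."""
    descriptions = [describe_image(img) or "" for img in images]
    if not descriptions:
        return None
    return [embed_text(d) for d in descriptions]
```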
78
frigate/embeddings/remote/openai.py
Normal file
@@ -0,0 +1,78 @@
+"""OpenAI embedding client for Frigate."""
+
+import base64
+import logging
+from typing import Optional
+
+from httpx import TimeoutException
+from openai import OpenAI
+
+from frigate.config import SemanticSearchProviderEnum
+from frigate.embeddings.remote import (
+    RemoteEmbeddingClient,
+    register_embedding_provider,
+)
+
+logger = logging.getLogger(__name__)
+
+
+@register_embedding_provider(SemanticSearchProviderEnum.openai)
+class OpenAIEmbeddingClient(RemoteEmbeddingClient):
+    """Remote embedding client for Frigate using OpenAI."""
+
+    provider: OpenAI
+
+    def _init_provider(self):
+        """Initialize the client."""
+        return OpenAI(
+            api_key=self.config.semantic_search.remote.api_key,
+            base_url=self.config.semantic_search.remote.url,
+        )
+
+    def embed_texts(self, texts: list[str]) -> Optional[list[list[float]]]:
+        """Get embeddings for a list of texts."""
+        try:
+            result = self.provider.embeddings.create(
+                model=self.config.semantic_search.remote.model,
+                input=texts,
+                timeout=self.timeout,
+            )
+            if (
+                result is not None
+                and hasattr(result, "data")
+                and len(result.data) > 0
+            ):
+                return [embedding.embedding for embedding in result.data]
+            return None
+        except (TimeoutException, Exception) as e:
+            logger.warning("OpenAI returned an error: %s", str(e))
+            return None
+
+    def embed_images(self, images: list[bytes]) -> Optional[list[list[float]]]:
+        """Get embeddings for a list of images.
+
+        This uses a two-step process:
+        1. Generate a text description of the image using the configured GenAI provider.
+        2. Create an embedding from the description using the text embedding model.
+        """
+        if not self.genai_client:
+            logger.warning(
+                "A GenAI provider is not configured. Cannot generate image descriptions."
+            )
+            return None
+
+        descriptions = []
+        for image in images:
+            description = self.genai_client.generate_image_description(
+                prompt=self.config.semantic_search.remote.vision_model_prompt,
+                images=[image],
+            )
+            if description:
+                descriptions.append(description)
+            else:
+                descriptions.append("")
+
+        if not descriptions:
+            return None
+
+        return self.embed_texts(descriptions)
@@ -291,6 +291,15 @@ Rules for the report:
         logger.debug(f"Sending images to genai provider with prompt: {prompt}")
         return self._send(prompt, thumbnails)

+    def generate_image_description(
+        self,
+        prompt: str,
+        images: list[bytes],
+    ) -> Optional[str]:
+        """Generate a description for an image."""
+        logger.debug(f"Sending images to genai provider with prompt: {prompt}")
+        return self._send(prompt, images)
+
     def _init_provider(self):
         """Initialize the client."""
         return None