Compare commits

..

1 Commits

Author SHA1 Message Date
Douwe
d750076298
Merge 517c1b76f4 into 1cc50f68a0 2026-01-18 06:13:46 -06:00
6 changed files with 246 additions and 151 deletions

View File

@ -0,0 +1,231 @@
---
id: genai
title: Generative AI
---
Generative AI can be used to automatically generate descriptive text based on the thumbnails of your tracked objects. This helps with [Semantic Search](/configuration/semantic_search) in Frigate to provide more context about your tracked objects. Descriptions are accessed via the _Explore_ view in the Frigate UI by clicking on a tracked object's thumbnail.
Requests for a description are sent off automatically to your AI provider at the end of the tracked object's lifecycle, or can optionally be sent earlier after a number of significantly changed frames, for example in use in more real-time notifications. Descriptions can also be regenerated manually via the Frigate UI. Note that if you are manually entering a description for tracked objects prior to its end, this will be overwritten by the generated response.
## Configuration
Generative AI can be enabled for all cameras or only for specific cameras. If GenAI is disabled for a camera, you can still manually generate descriptions for events using the HTTP API. There are currently 3 native providers available to integrate with Frigate. Other providers that support the OpenAI standard API can also be used. See the OpenAI section below.
To use Generative AI, you must define a single provider at the global level of your Frigate configuration. If the provider you choose requires an API key, you may either directly paste it in your configuration, or store it in an environment variable prefixed with `FRIGATE_`.
```yaml
genai:
provider: gemini
api_key: "{FRIGATE_GEMINI_API_KEY}"
model: gemini-2.0-flash
cameras:
front_camera:
genai:
enabled: True # <- enable GenAI for your front camera
use_snapshot: True
objects:
- person
required_zones:
- steps
indoor_camera:
objects:
genai:
enabled: False # <- disable GenAI for your indoor camera
```
By default, descriptions will be generated for all tracked objects and all zones. But you can also optionally specify `objects` and `required_zones` to only generate descriptions for certain tracked objects or zones.
Optionally, you can generate the description using a snapshot (if enabled) by setting `use_snapshot` to `True`. By default, this is set to `False`, which sends the uncompressed images from the `detect` stream collected over the object's lifetime to the model. Once the object lifecycle ends, only a single compressed and cropped thumbnail is saved with the tracked object. Using a snapshot might be useful when you want to _regenerate_ a tracked object's description as it will provide the AI with a higher-quality image (typically downscaled by the AI itself) than the cropped/compressed thumbnail. Using a snapshot otherwise has a trade-off in that only a single image is sent to your provider, which will limit the model's ability to determine object movement or direction.
Generative AI can also be toggled dynamically for a camera via MQTT with the topic `frigate/<camera_name>/object_descriptions/set`. See the [MQTT documentation](/integrations/mqtt/#frigatecamera_nameobjectdescriptionsset).
## Ollama
:::warning
Using Ollama on CPU is not recommended, high inference times make using Generative AI impractical.
:::
[Ollama](https://ollama.com/) allows you to self-host large language models and keep everything running locally. It provides a nice API over [llama.cpp](https://github.com/ggerganov/llama.cpp). It is highly recommended to host this server on a machine with an Nvidia graphics card, or on a Apple silicon Mac for best performance.
Most of the 7b parameter 4-bit vision models will fit inside 8GB of VRAM. There is also a [Docker container](https://hub.docker.com/r/ollama/ollama) available.
Parallel requests also come with some caveats. You will need to set `OLLAMA_NUM_PARALLEL=1` and choose a `OLLAMA_MAX_QUEUE` and `OLLAMA_MAX_LOADED_MODELS` values that are appropriate for your hardware and preferences. See the [Ollama documentation](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-does-ollama-handle-concurrent-requests).
### Supported Models
You must use a vision capable model with Frigate. Current model variants can be found [in their model library](https://ollama.com/library). At the time of writing, this includes `llava`, `llava-llama3`, `llava-phi3`, and `moondream`. Note that Frigate will not automatically download the model you specify in your config, you must download the model to your local instance of Ollama first i.e. by running `ollama pull llava:7b` on your Ollama server/Docker container. Note that the model specified in Frigate's config must match the downloaded model tag.
:::note
You should have at least 8 GB of RAM available (or VRAM if running on GPU) to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
:::
### Configuration
```yaml
genai:
provider: ollama
base_url: http://localhost:11434
model: qwen3-vl:4b
```
## Google Gemini
Google Gemini has a free tier allowing [15 queries per minute](https://ai.google.dev/pricing) to the API, which is more than sufficient for standard Frigate usage.
### Supported Models
You must use a vision capable model with Frigate. Current model variants can be found [in their documentation](https://ai.google.dev/gemini-api/docs/models/gemini).
### Get API Key
To start using Gemini, you must first get an API key from [Google AI Studio](https://aistudio.google.com).
1. Accept the Terms of Service
2. Click "Get API Key" from the right hand navigation
3. Click "Create API key in new project"
4. Copy the API key for use in your config
### Configuration
```yaml
genai:
provider: gemini
api_key: "{FRIGATE_GEMINI_API_KEY}"
model: gemini-2.0-flash
```
:::note
To use a different Gemini-compatible API endpoint, set the `GEMINI_BASE_URL` environment variable to your provider's API URL.
:::
## OpenAI
OpenAI does not have a free tier for their API. With the release of gpt-4o, pricing has been reduced and each generation should cost fractions of a cent if you choose to go this route.
### Supported Models
You must use a vision capable model with Frigate. Current model variants can be found [in their documentation](https://platform.openai.com/docs/models).
### Get API Key
To start using OpenAI, you must first [create an API key](https://platform.openai.com/api-keys) and [configure billing](https://platform.openai.com/settings/organization/billing/overview).
### Configuration
```yaml
genai:
provider: openai
api_key: "{FRIGATE_OPENAI_API_KEY}"
model: gpt-4o
```
:::note
To use a different OpenAI-compatible API endpoint, set the `OPENAI_BASE_URL` environment variable to your provider's API URL.
:::
## Azure OpenAI
Microsoft offers several vision models through Azure OpenAI. A subscription is required.
### Supported Models
You must use a vision capable model with Frigate. Current model variants can be found [in their documentation](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models).
### Create Resource and Get API Key
To start using Azure OpenAI, you must first [create a resource](https://learn.microsoft.com/azure/cognitive-services/openai/how-to/create-resource?pivots=web-portal#create-a-resource). You'll need your API key, model name, and resource URL, which must include the `api-version` parameter (see the example below).
### Configuration
```yaml
genai:
provider: azure_openai
base_url: https://instance.cognitiveservices.azure.com/openai/responses?api-version=2025-04-01-preview
model: gpt-5-mini
api_key: "{FRIGATE_OPENAI_API_KEY}"
```
## Usage and Best Practices
Frigate's thumbnail search excels at identifying specific details about tracked objects for example, using an "image caption" approach to find a "person wearing a yellow vest," "a white dog running across the lawn," or "a red car on a residential street." To enhance this further, Frigates default prompts are designed to ask your AI provider about the intent behind the object's actions, rather than just describing its appearance.
While generating simple descriptions of detected objects is useful, understanding intent provides a deeper layer of insight. Instead of just recognizing "what" is in a scene, Frigates default prompts aim to infer "why" it might be there or "what" it could do next. Descriptions tell you whats happening, but intent gives context. For instance, a person walking toward a door might seem like a visitor, but if theyre moving quickly after hours, you can infer a potential break-in attempt. Detecting a person loitering near a door at night can trigger an alert sooner than simply noting "a person standing by the door," helping you respond based on the situations context.
### Using GenAI for notifications
Frigate provides an [MQTT topic](/integrations/mqtt), `frigate/tracked_object_update`, that is updated with a JSON payload containing `event_id` and `description` when your AI provider returns a description for a tracked object. This description could be used directly in notifications, such as sending alerts to your phone or making audio announcements. If additional details from the tracked object are needed, you can query the [HTTP API](/integrations/api/event-events-event-id-get) using the `event_id`, eg: `http://frigate_ip:5000/api/events/<event_id>`.
If looking to get notifications earlier than when an object ceases to be tracked, an additional send trigger can be configured of `after_significant_updates`.
```yaml
genai:
send_triggers:
tracked_object_end: true # default
after_significant_updates: 3 # how many updates to a tracked object before we should send an image
```
## Custom Prompts
Frigate sends multiple frames from the tracked object along with a prompt to your Generative AI provider asking it to generate a description. The default prompt is as follows:
```
Analyze the sequence of images containing the {label}. Focus on the likely intent or behavior of the {label} based on its actions and movement, rather than describing its appearance or the surroundings. Consider what the {label} is doing, why, and what it might do next.
```
:::tip
Prompts can use variable replacements `{label}`, `{sub_label}`, and `{camera}` to substitute information from the tracked object as part of the prompt.
:::
You are also able to define custom prompts in your configuration.
```yaml
genai:
provider: ollama
base_url: http://localhost:11434
model: llava
objects:
prompt: "Analyze the {label} in these images from the {camera} security camera. Focus on the actions, behavior, and potential intent of the {label}, rather than just describing its appearance."
object_prompts:
person: "Examine the main person in these images. What are they doing and what might their actions suggest about their intent (e.g., approaching a door, leaving an area, standing still)? Do not describe the surroundings or static details."
car: "Observe the primary vehicle in these images. Focus on its movement, direction, or purpose (e.g., parking, approaching, circling). If it's a delivery vehicle, mention the company."
```
Prompts can also be overridden at the camera level to provide a more detailed prompt to the model about your specific camera, if you desire.
```yaml
cameras:
front_door:
objects:
genai:
enabled: True
use_snapshot: True
prompt: "Analyze the {label} in these images from the {camera} security camera at the front door. Focus on the actions and potential intent of the {label}."
object_prompts:
person: "Examine the person in these images. What are they doing, and how might their actions suggest their purpose (e.g., delivering something, approaching, leaving)? If they are carrying or interacting with a package, include details about its source or destination."
cat: "Observe the cat in these images. Focus on its movement and intent (e.g., wandering, hunting, interacting with objects). If the cat is near the flower pots or engaging in any specific actions, mention it."
objects:
- person
- cat
required_zones:
- steps
```
### Experiment with prompts
Many providers also have a public facing chat interface for their models. Download a couple of different thumbnails or snapshots from Frigate and try new things in the playground to get descriptions to your liking before updating the prompt in Frigate.
- OpenAI - [ChatGPT](https://chatgpt.com)
- Gemini - [Google AI Studio](https://aistudio.google.com)
- Ollama - [Open WebUI](https://docs.openwebui.com/)

View File

@ -5,7 +5,7 @@ title: Configuring Generative AI
## Configuration
A Generative AI provider can be configured in the global config, which will make the Generative AI features available for use. There are currently 4 native providers available to integrate with Frigate. Other providers that support the OpenAI standard API can also be used. See the OpenAI section below.
A Generative AI provider can be configured in the global config, which will make the Generative AI features available for use. There are currently 3 native providers available to integrate with Frigate. Other providers that support the OpenAI standard API can also be used. See the OpenAI section below.
To use Generative AI, you must define a single provider at the global level of your Frigate configuration. If the provider you choose requires an API key, you may either directly paste it in your configuration, or store it in an environment variable prefixed with `FRIGATE_`.
@ -41,12 +41,12 @@ If you are trying to use a single model for Frigate and HomeAssistant, it will n
The following models are recommended:
| Model | Notes |
| ------------- | -------------------------------------------------------------------- |
| `qwen3-vl` | Strong visual and situational understanding, higher vram requirement |
| `Intern3.5VL` | Relatively fast with good vision comprehension |
| `gemma3` | Strong frame-to-frame understanding, slower inference times |
| `qwen2.5-vl` | Fast but capable model with good vision comprehension |
| Model | Notes |
| ----------------- | -------------------------------------------------------------------- |
| `qwen3-vl` | Strong visual and situational understanding, higher vram requirement |
| `Intern3.5VL` | Relatively fast with good vision comprehension |
| `gemma3` | Strong frame-to-frame understanding, slower inference times |
| `qwen2.5-vl` | Fast but capable model with good vision comprehension |
:::note
@ -61,46 +61,12 @@ genai:
provider: ollama
base_url: http://localhost:11434
model: minicpm-v:8b
provider_options: # other Ollama client options can be defined
provider_options: # other Ollama client options can be defined
keep_alive: -1
options:
num_ctx: 8192 # make sure the context matches other services that are using ollama
num_ctx: 8192 # make sure the context matches other services that are using ollama
```
## llama.cpp
[llama.cpp](https://github.com/ggml-org/llama.cpp) is a C++ implementation of LLaMA that provides a high-performance inference server. Using llama.cpp directly gives you access to all native llama.cpp options and parameters.
:::warning
Using llama.cpp on CPU is not recommended, high inference times make using Generative AI impractical.
:::
It is highly recommended to host the llama.cpp server on a machine with a discrete graphics card, or on an Apple silicon Mac for best performance.
### Supported Models
You must use a vision capable model with Frigate. The llama.cpp server supports various vision models in GGUF format.
### Configuration
```yaml
genai:
provider: llamacpp
base_url: http://localhost:8080
model: your-model-name
provider_options:
temperature: 0.7
repeat_penalty: 1.05
top_p: 0.8
top_k: 40
min_p: 0.05
seed: -1
```
All llama.cpp native options can be passed through `provider_options`, including `temperature`, `top_k`, `top_p`, `min_p`, `repeat_penalty`, `repeat_last_n`, `seed`, `grammar`, and more. See the [llama.cpp server documentation](https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md) for a complete list of available parameters.
## Google Gemini
Google Gemini has a free tier allowing [15 queries per minute](https://ai.google.dev/pricing) to the API, which is more than sufficient for standard Frigate usage.

View File

@ -11,7 +11,7 @@ By default, descriptions will be generated for all tracked objects and all zones
Optionally, you can generate the description using a snapshot (if enabled) by setting `use_snapshot` to `True`. By default, this is set to `False`, which sends the uncompressed images from the `detect` stream collected over the object's lifetime to the model. Once the object lifecycle ends, only a single compressed and cropped thumbnail is saved with the tracked object. Using a snapshot might be useful when you want to _regenerate_ a tracked object's description as it will provide the AI with a higher-quality image (typically downscaled by the AI itself) than the cropped/compressed thumbnail. Using a snapshot otherwise has a trade-off in that only a single image is sent to your provider, which will limit the model's ability to determine object movement or direction.
Generative AI object descriptions can also be toggled dynamically for a camera via MQTT with the topic `frigate/<camera_name>/object_descriptions/set`. See the [MQTT documentation](/integrations/mqtt#frigatecamera_nameobject_descriptionsset).
Generative AI object descriptions can also be toggled dynamically for a camera via MQTT with the topic `frigate/<camera_name>/object_descriptions/set`. See the [MQTT documentation](/integrations/mqtt/#frigatecamera_nameobjectdescriptionsset).
## Usage and Best Practices
@ -42,10 +42,10 @@ genai:
model: llava
objects:
prompt: "Analyze the {label} in these images from the {camera} security camera. Focus on the actions, behavior, and potential intent of the {label}, rather than just describing its appearance."
object_prompts:
person: "Examine the main person in these images. What are they doing and what might their actions suggest about their intent (e.g., approaching a door, leaving an area, standing still)? Do not describe the surroundings or static details."
car: "Observe the primary vehicle in these images. Focus on its movement, direction, or purpose (e.g., parking, approaching, circling). If it's a delivery vehicle, mention the company."
prompt: "Analyze the {label} in these images from the {camera} security camera. Focus on the actions, behavior, and potential intent of the {label}, rather than just describing its appearance."
object_prompts:
person: "Examine the main person in these images. What are they doing and what might their actions suggest about their intent (e.g., approaching a door, leaving an area, standing still)? Do not describe the surroundings or static details."
car: "Observe the primary vehicle in these images. Focus on its movement, direction, or purpose (e.g., parking, approaching, circling). If it's a delivery vehicle, mention the company."
```
Prompts can also be overridden at the camera level to provide a more detailed prompt to the model about your specific camera, if you desire.

View File

@ -7,7 +7,7 @@ Generative AI can be used to automatically generate structured summaries of revi
Requests for a summary are requested automatically to your AI provider for alert review items when the activity has ended, they can also be optionally enabled for detections as well.
Generative AI review summaries can also be toggled dynamically for a [camera via MQTT](/integrations/mqtt#frigatecamera_namereview_descriptionsset).
Generative AI review summaries can also be toggled dynamically for a [camera via MQTT](/integrations/mqtt/#frigatecamera_namereviewdescriptionsset).
## Review Summary Usage and Best Practices

View File

@ -14,7 +14,6 @@ class GenAIProviderEnum(str, Enum):
azure_openai = "azure_openai"
gemini = "gemini"
ollama = "ollama"
llamacpp = "llamacpp"
class GenAIConfig(FrigateBaseModel):

View File

@ -1,101 +0,0 @@
"""llama.cpp Provider for Frigate AI."""
import base64
import logging
from typing import Any, Optional
import requests
from frigate.config import GenAIProviderEnum
from frigate.genai import GenAIClient, register_genai_provider
logger = logging.getLogger(__name__)
@register_genai_provider(GenAIProviderEnum.llamacpp)
class LlamaCppClient(GenAIClient):
"""Generative AI client for Frigate using llama.cpp server."""
LOCAL_OPTIMIZED_OPTIONS = {
"temperature": 0.7,
"repeat_penalty": 1.05,
"top_p": 0.8,
}
provider: str # base_url
provider_options: dict[str, Any]
def _init_provider(self):
"""Initialize the client."""
self.provider_options = {
**self.LOCAL_OPTIMIZED_OPTIONS,
**self.genai_config.provider_options,
}
return (
self.genai_config.base_url.rstrip("/")
if self.genai_config.base_url
else None
)
def _send(self, prompt: str, images: list[bytes]) -> Optional[str]:
"""Submit a request to llama.cpp server."""
if self.provider is None:
logger.warning(
"llama.cpp provider has not been initialized, a description will not be generated. Check your llama.cpp configuration."
)
return None
try:
content = []
for image in images:
encoded_image = base64.b64encode(image).decode("utf-8")
content.append(
{
"type": "image_url",
"image_url": {
"url": f"data:image/jpeg;base64,{encoded_image}",
},
}
)
content.append(
{
"type": "text",
"text": prompt,
}
)
# Build request payload with llama.cpp native options
payload = {
"messages": [
{
"role": "user",
"content": content,
},
],
**self.provider_options,
}
response = requests.post(
f"{self.provider}/v1/chat/completions",
json=payload,
timeout=self.timeout,
)
response.raise_for_status()
result = response.json()
if (
result is not None
and "choices" in result
and len(result["choices"]) > 0
):
choice = result["choices"][0]
if "message" in choice and "content" in choice["message"]:
return choice["message"]["content"].strip()
return None
except Exception as e:
logger.warning("llama.cpp returned an error: %s", str(e))
return None
def get_context_size(self) -> int:
"""Get the context window size for llama.cpp."""
return self.genai_config.provider_options.get("context_size", 4096)