diff --git a/docs/docs/configuration/gemini.md b/docs/docs/configuration/gemini.md
new file mode 100644
index 000000000..994184eb8
--- /dev/null
+++ b/docs/docs/configuration/gemini.md
@@ -0,0 +1,65 @@
+---
+id: gemini
+title: Google Gemini Descriptions
+---
+
+Google Gemini can be used to automatically generate descriptions based on the thumbnails of your events. This helps with [semantic search](/configuration/semantic_search) in Frigate by providing detailed text descriptions as a basis for search queries. Gemini Pro Vision has a free tier allowing [60 queries per minute](https://ai.google.dev/pricing) to the API, which is more than sufficient for standard Frigate usage.
+
+## Get API Key
+
+To start using Gemini, you must first get an API key from [Google AI Studio](https://makersuite.google.com).
+
+1. Accept the Terms of Service
+2. Click "Get API Key" from the right-hand navigation
+3. Click "Create API key in new project"
+4. Copy the API key for use in your config
+
+## Configuration
+
+Because Gemini is an external service that receives thumbnails from Frigate, it can be enabled for all cameras or only for specific cameras.
+
+You may either paste the API key directly into your configuration, or store it in an environment variable prefixed with `FRIGATE_`.
+
+```yaml
+gemini: # <- enable Gemini for all cameras
+  enabled: True
+  api_key: "{FRIGATE_GEMINI_API_KEY}"
+
+cameras:
+  front_camera: ...
+  indoor_camera:
+    gemini: # <- disable Gemini for your indoor camera
+      enabled: False
+```
+
+### Custom Prompts
+
+Frigate sends both your thumbnail and a prompt to Gemini, asking it to generate a description. The default prompt is as follows:
+
+```
+Describe the {label} in this image with as much detail as possible. Do not describe the background.
+```
+
+:::tip
+
+Prompts can use variable replacement for `{label}`, `{sub_label}`, and `{camera}` to substitute information from the event as part of the prompt.
+
+:::
+
+You are also able to define custom prompts in your configuration.
+
+```yaml
+gemini:
+  enabled: True
+  api_key: "{FRIGATE_GEMINI_API_KEY}"
+  prompt: "Describe the {label} in this image from the {camera} security camera."
+  object_prompts:
+    person: "Describe the main person in the image (gender, age, clothing, activity, etc). Do not include where the activity is occurring (sidewalk, concrete, driveway, etc). If delivering a package, include the company the package is from."
+    car: "Label the primary vehicle in the image with just the name of the company if it is a delivery vehicle, or the color, make, and model."
+```
+
+### Experiment with prompts
+
+[Google AI Studio](https://makersuite.google.com) also has a playground. Download a couple of different thumbnails from Frigate and experiment in the playground to get descriptions to your liking before updating the prompt in Frigate.
+
+![Google AI Studio](/img/gemini.png)
diff --git a/docs/docs/configuration/index.md b/docs/docs/configuration/index.md
index 6c5478a2b..163ad4dae 100644
--- a/docs/docs/configuration/index.md
+++ b/docs/docs/configuration/index.md
@@ -47,6 +47,11 @@ onvif:
     password: "{FRIGATE_RTSP_PASSWORD}"
 ```
 
+```yaml
+gemini:
+  api_key: "{FRIGATE_GEMINI_API_KEY}"
+```
+
 ### Full configuration reference:
 
 :::caution
@@ -428,6 +433,30 @@ snapshots:
   # Optional: quality of the encoded jpeg, 0-100 (default: shown below)
   quality: 70
 
+# Optional: Configuration for semantic search capability
+semantic_search:
+  # Optional: Enable semantic search (default: shown below)
+  enabled: False
+
+# Optional: Configuration for Google Gemini generated event descriptions
+# NOTE: This will send thumbnails over the internet to Google's LLM to generate
+# descriptions. It can be overridden at the camera level to enhance privacy for
+# indoor cameras.
+gemini:
+  # Optional: Enable Google Gemini description generation (default: shown below)
+  enabled: False
+  # Optional: Override existing descriptions on events (default: shown below)
+  override_existing: False
+  # Required if enabled: API key can be generated at https://makersuite.google.com
+  api_key: "{FRIGATE_GEMINI_API_KEY}"
+  # Optional: The default prompt for generating descriptions. Can use replacement
+  # variables like "label", "sub_label", "camera" to make more dynamic. (default: shown below)
+  prompt: "Describe the {label} in this image with as much detail as possible. Do not describe the background."
+  # Optional: Object specific prompts to customize description results
+  # Format: {label}: {prompt}
+  object_prompts:
+    person: "My special person prompt."
+
 # Optional: Restream configuration
 # Uses https://github.com/AlexxIT/go2rtc (v1.8.3)
 go2rtc:
diff --git a/docs/docs/configuration/semantic_search.md b/docs/docs/configuration/semantic_search.md
new file mode 100644
index 000000000..2ea22c0ef
--- /dev/null
+++ b/docs/docs/configuration/semantic_search.md
@@ -0,0 +1,31 @@
+---
+id: semantic_search
+title: Using Semantic Search
+---
+
+Semantic search works by embedding images and/or text into a numerical vector representation. Frigate supports two such models, both of which run locally: [OpenAI CLIP](https://openai.com/research/clip) and [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). Embeddings are then saved to a local instance of [ChromaDB](https://trychroma.com).
+
+## Configuration
+
+Semantic Search is a global configuration setting.
+
+```yaml
+semantic_search:
+  enabled: True
+```
+
+### OpenAI CLIP
+
+This model is able to embed both images and text into the same vector space, which allows `image -> image` and `text -> image` similarity searches. Frigate uses this model on completed events to encode the thumbnail image and store it in Chroma.
+When searching events via text in the search box, Frigate will perform a `text -> image` similarity search against this embedding. When clicking "FIND SIMILAR" next to an event, Frigate will perform an `image -> image` similarity search to retrieve the closest matching thumbnails.
+
+### all-MiniLM-L6-v2
+
+This is a sentence embedding model that has been fine-tuned on over 1 billion sentence pairs. This model is used to embed event descriptions and perform searches against them. Descriptions can be created and/or modified on the Events page when clicking on an event. See [the Gemini docs](/configuration/gemini.md) for more information on how to automatically generate event descriptions.
+
+## Usage Tips
+
+1. Semantic search is used in conjunction with the filters you are already familiar with. Use a combination of traditional filtering and semantic search for the best results.
+2. The comparison between text and image embedding distances generally means that results matching `description` will appear first, even if a `thumbnail` embedding may be a better match. Play with the "Search Type" filter to help find what you are looking for.
+3. Make your search language and tone closely match your descriptions. If you are using thumbnail search, phrase your query as an image caption.
+4. Semantic search on thumbnails tends to return better results when matching large subjects that take up most of the frame. Small things like "cat" tend not to work well.
+5. Experiment! Find an event and start typing keywords to see what works for you.
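The `text -> image` and `image -> image` searches described in semantic_search.md come down to comparing embedding vectors by distance. As a minimal sketch of that ranking step (the vectors and event names below are made up for illustration, and this is not Frigate's or Chroma's actual code):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: a query vector (from text or an image)
# and two stored thumbnail vectors.
query = [0.9, 0.1, 0.2]
thumbnails = {
    "event_a": [0.8, 0.2, 0.1],
    "event_b": [0.1, 0.9, 0.7],
}

# Rank events by similarity to the query, most similar first.
ranked = sorted(
    thumbnails,
    key=lambda name: cosine_similarity(query, thumbnails[name]),
    reverse=True,
)
print(ranked)  # event_a embeds closer to the query than event_b
```

In practice the vector database (Chroma here) performs this nearest-neighbor lookup internally over high-dimensional embeddings; the sketch only shows the idea being ranked on.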
diff --git a/docs/sidebars.js b/docs/sidebars.js
index 5a71ebfab..bb357c2f6 100644
--- a/docs/sidebars.js
+++ b/docs/sidebars.js
@@ -29,6 +29,10 @@ module.exports = {
       "configuration/object_detectors",
       "configuration/audio_detectors",
     ],
+    "Semantic Search": [
+      "configuration/semantic_search",
+      "configuration/gemini",
+    ],
     Cameras: [
       "configuration/cameras",
       "configuration/record",
@@ -61,7 +65,7 @@ module.exports = {
     ],
     "Frigate+": ["plus/index"],
     Troubleshooting: [
-      "troubleshooting/faqs",
+      "troubleshooting/faqs",
       "troubleshooting/recordings",
       "troubleshooting/edgetpu",
     ],
diff --git a/docs/static/img/gemini.png b/docs/static/img/gemini.png
new file mode 100644
index 000000000..cce16441e
Binary files /dev/null and b/docs/static/img/gemini.png differ
diff --git a/frigate/config.py b/frigate/config.py
index 4c5f510dc..d339fd910 100644
--- a/frigate/config.py
+++ b/frigate/config.py
@@ -687,7 +687,7 @@ class SemanticSearchConfig(FrigateBaseModel):
 class GeminiConfig(FrigateBaseModel):
     enabled: bool = Field(default=False, title="Enable Google Gemini captioning.")
     override_existing: bool = Field(
-        default=False, title="Override existing sub labels."
+        default=False, title="Override existing descriptions."
    )
    api_key: str = Field(default="", title="Google AI Studio API Key.")
    prompt: str = Field(
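The `prompt` and `object_prompts` options documented in gemini.md rely on simple variable replacement of `{label}`, `{sub_label}`, and `{camera}`. A minimal sketch of how that substitution could work (the `build_prompt` helper and the event dict are illustrative assumptions, not Frigate's actual implementation):

```python
DEFAULT_PROMPT = (
    "Describe the {label} in this image with as much detail as possible. "
    "Do not describe the background."
)

def build_prompt(template: str, event: dict) -> str:
    # Substitute {label}, {sub_label}, and {camera} from the event's metadata.
    # format_map raises a KeyError if the template references a missing field,
    # which surfaces typos in custom prompts early.
    return template.format_map(event)

event = {"label": "person", "sub_label": "", "camera": "front_camera"}
prompt = build_prompt(
    "Describe the {label} in this image from the {camera} security camera.",
    event,
)
print(prompt)
# Describe the person in this image from the front_camera security camera.
```

An object-specific prompt from `object_prompts` would simply be selected by `event["label"]` before substitution, falling back to the default prompt when no per-label entry exists.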