---
id: genai_config
title: Configuring Generative AI
---

import ConfigTabs from "@site/src/components/ConfigTabs";
import TabItem from "@theme/TabItem";
import NavPath from "@site/src/components/NavPath";

## Configuration

Configuring a Generative AI provider in the global config makes Frigate's Generative AI features available for use. There are currently 4 native providers available to integrate with Frigate. Other providers that support the OpenAI standard API can also be used; see the OpenAI-Compatible section below.

To use Generative AI, you must define a single provider at the global level of your Frigate configuration. If the provider you choose requires an API key, you can either paste it directly into your configuration or store it in an environment variable prefixed with `FRIGATE_`.
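
As a minimal sketch of the environment-variable approach, assuming Frigate runs under Docker Compose: the variable name `FRIGATE_GENAI_API_KEY` is a hypothetical choice (only the `FRIGATE_` prefix is required), the service name `frigate` is an assumption, and the `gemini` provider is used purely for illustration.

```yaml
# docker-compose.yml (fragment; service name is an assumption)
services:
  frigate:
    environment:
      FRIGATE_GENAI_API_KEY: "your-api-key" # hypothetical variable name
```

The Frigate configuration can then reference the variable by name in curly braces instead of containing the key itself:

```yaml
genai:
  provider: gemini
  api_key: "{FRIGATE_GENAI_API_KEY}" # substituted from the environment
  model: gemini-2.5-flash
```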

## Local Providers

Local providers run on your own hardware and keep all data processing private. These require a GPU or dedicated hardware for best performance.

:::warning

Running Generative AI models on CPU is not recommended, as high inference times make using Generative AI impractical.

:::

### Recommended Local Models

You must use a vision-capable model with Frigate. The following models are recommended for local deployment:

| Model         | Notes                                                                                                                                                                |
| ------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `qwen3-vl`    | Strong visual and situational understanding, enhanced ability to identify smaller objects and interactions with objects.                                             |
| `qwen3.5`     | Strong situational understanding, but lacks DeepStack from qwen3-vl, leading to worse performance for identifying objects in people's hands and other small details. |
| `gemma4`      | Strong situational understanding, sometimes resorts to more vague terms like 'interacts' instead of assigning a specific action.                                     |
| `Intern3.5VL` | Relatively fast with good vision comprehension                                                                                                                       |
| `gemma3`      | Slower model with good vision and temporal understanding                                                                                                             |

:::info

Each model is available in multiple parameter sizes (3b, 4b, 8b, etc.). Larger sizes handle complex tasks and situational understanding better, but require more memory and computational resources. It is recommended to experiment with multiple models to see which performs best.

:::

:::note

You should have at least 8 GB of RAM available (or VRAM if running on GPU) to run the 7B models, 16 GB to run the 13B models, and 24 GB to run the 33B models.

:::

### Model Types: Instruct vs Thinking

Vision-language models come in **instruct** variants (fine-tuned to follow instructions and respond concisely), **thinking** variants (fine-tuned for free-form, speculative reasoning), and **hybrid** variants that support both modes per request. Most modern vision-language models are hybrid.

Frigate manages reasoning per task automatically:

- **Description tasks** (object descriptions, review descriptions, review summaries) are synthesis-only and benefit from concise, direct output, so Frigate disables thinking for these calls when the model exposes a per-request toggle.
- **Chat** lets you toggle thinking on or off from the composer when the configured model supports it.

You can use a pure instruct, hybrid, or thinking-capable model with Frigate; no extra configuration is required to disable thinking for descriptions.

### llama.cpp

[llama.cpp](https://github.com/ggml-org/llama.cpp) is a C++ implementation of LLaMA that provides a high-performance inference server.

It is highly recommended to host the llama.cpp server on a machine with a discrete graphics card, or on an Apple silicon Mac for best performance.

#### Supported Models

You must use a vision-capable model with Frigate. The llama.cpp server supports various vision models in GGUF format.

#### Configuration

All llama.cpp native options can be passed through `provider_options`, including `temperature`, `top_k`, `top_p`, `min_p`, `repeat_penalty`, `repeat_last_n`, `seed`, `grammar`, and more. See the [llama.cpp server documentation](https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md) for a complete list of available parameters.

<ConfigTabs>
<TabItem value="ui">

1. Navigate to <NavPath path="Settings > Enrichments > Generative AI" />.
   - Set **Provider** to `llamacpp`
   - Set **Base URL** to your llama.cpp server address (e.g., `http://localhost:8080`)
   - Set **Model** to the name of your model
   - Under **Provider Options**, set `context_size` to tell Frigate your context size so it can send the appropriate amount of information

</TabItem>
<TabItem value="yaml">

```yaml
genai:
  provider: llamacpp
  base_url: http://localhost:8080
  model: your-model-name
  provider_options:
    context_size: 16000 # Tell Frigate your context size so it can send the appropriate amount of information.
```

</TabItem>
</ConfigTabs>
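
As an illustration of passing native options, the sketch below adds a few of the sampling parameters named above to the basic configuration. It assumes the options sit directly under `provider_options` alongside `context_size`, and the values are placeholders rather than recommendations.

```yaml
genai:
  provider: llamacpp
  base_url: http://localhost:8080
  model: your-model-name
  provider_options:
    context_size: 16000
    # Illustrative values only; tune them for your model.
    temperature: 0.7
    top_p: 0.9
    seed: 42 # a fixed seed can make outputs easier to compare between runs
```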

### Ollama

[Ollama](https://ollama.com/) allows you to self-host large language models and keep everything running locally. It is highly recommended to host this server on a machine with an Nvidia graphics card, or on an Apple silicon Mac for best performance.

Most of the 7b parameter 4-bit vision models will fit inside 8GB of VRAM. There is also a [Docker container](https://hub.docker.com/r/ollama/ollama) available.

Parallel requests come with some caveats. You will need to set `OLLAMA_NUM_PARALLEL=1` and choose `OLLAMA_MAX_QUEUE` and `OLLAMA_MAX_LOADED_MODELS` values that are appropriate for your hardware and preferences. See the [Ollama documentation](https://docs.ollama.com/faq#how-does-ollama-handle-concurrent-requests) for details.
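
If Ollama runs from the Docker container, one way to set these variables is through the container's environment. The fragment below is a sketch under that assumption: the service name `ollama` is hypothetical, and the `OLLAMA_MAX_QUEUE` and `OLLAMA_MAX_LOADED_MODELS` values are placeholders to adjust for your hardware.

```yaml
# docker-compose.yml (fragment; service name is an assumption)
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    environment:
      OLLAMA_NUM_PARALLEL: "1"
      OLLAMA_MAX_QUEUE: "512" # placeholder value
      OLLAMA_MAX_LOADED_MODELS: "1" # placeholder value
```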

:::tip

If you are trying to use a single model for both Frigate and Home Assistant, it will need to support vision and tool calling. `qwen3-vl` supports vision and tools simultaneously in Ollama.

:::

Note that Frigate will not automatically download the model you specify in your config. Ollama will try to download the model but it may take longer than the timeout, so it is recommended to pull the model beforehand by running `ollama pull your_model` on your Ollama server/Docker container. The model specified in Frigate's config must match the downloaded model tag.

#### Configuration

<ConfigTabs>
<TabItem value="ui">

1. Navigate to <NavPath path="Settings > Enrichments > Generative AI" />.
   - Set **Provider** to `ollama`
   - Set **Base URL** to your Ollama server address (e.g., `http://localhost:11434`)
   - Set **Model** to the model tag (e.g., `qwen3-vl:4b`)
   - Under **Provider Options**, set `keep_alive` (e.g., `-1`) and `options.num_ctx` to match your desired context size

</TabItem>
<TabItem value="yaml">

```yaml
genai:
  provider: ollama
  base_url: http://localhost:11434
  model: qwen3-vl:4b
  provider_options: # other Ollama client options can be defined
    keep_alive: -1
    options:
      num_ctx: 8192 # make sure the context matches other services that are using ollama
```

</TabItem>
</ConfigTabs>

### OpenAI-Compatible

Frigate supports any provider that implements the OpenAI API standard. This includes self-hosted solutions like [vLLM](https://docs.vllm.ai/), [LocalAI](https://localai.io/), and other OpenAI-compatible servers.

:::tip

For OpenAI-compatible servers (such as llama.cpp) that don't expose the configured context size in the API response, you can manually specify the context size in `provider_options`:

```yaml
genai:
  provider: openai
  base_url: http://your-llama-server
  model: your-model-name
  provider_options:
    context_size: 8192 # Specify the configured context size
```

This ensures Frigate uses the correct context window size when generating prompts.

:::

#### Configuration

<ConfigTabs>
<TabItem value="ui">

1. Navigate to <NavPath path="Settings > Enrichments > Generative AI" />.
   - Set **Provider** to `openai`
   - Set **Base URL** to your server address (e.g., `http://your-server:port`)
   - Set **API key** if required by your server
   - Set **Model** to the model name

</TabItem>
<TabItem value="yaml">

```yaml
genai:
  provider: openai
  base_url: http://your-server:port
  api_key: your-api-key # May not be required for local servers
  model: your-model-name
```

</TabItem>
</ConfigTabs>

To use a different OpenAI-compatible API endpoint, set the `OPENAI_BASE_URL` environment variable to your provider's API URL.
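
A sketch of setting this variable, assuming Frigate runs under Docker Compose with a service named `frigate` (an assumption). The URL is a placeholder, and the exact path your provider expects may differ.

```yaml
# docker-compose.yml (fragment; service name is an assumption)
services:
  frigate:
    environment:
      OPENAI_BASE_URL: "http://your-server:port" # placeholder URL
```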

## Cloud Providers

Cloud providers run on remote infrastructure and require an API key for authentication. These services handle all model inference on their servers.

:::info

Cloud Generative AI providers require an active internet connection to send images and prompts for processing. Local providers like llama.cpp and Ollama (with local models) do not require internet. See [Network Requirements](/frigate/network_requirements#generative-ai) for details.

:::

### Ollama Cloud

Ollama also supports [cloud models](https://ollama.com/cloud), where model inference is performed in the cloud. You can connect directly to Ollama Cloud by setting `base_url` to `https://ollama.com` and providing an API key. Alternatively, you can run Ollama locally and use a cloud model name so your local instance forwards requests to the cloud. For more details, see the Ollama cloud model [docs](https://docs.ollama.com/cloud).

#### Configuration

<ConfigTabs>
<TabItem value="ui">

1. Navigate to <NavPath path="Settings > Enrichments > Generative AI" />.
   - Set **Provider** to `ollama`
   - Set **Base URL** to your local Ollama address (e.g., `http://localhost:11434`) or `https://ollama.com` for direct cloud inference
   - Set **API key** if required by your endpoint (e.g., when using `https://ollama.com`)
   - Set **Model** to the cloud model name

</TabItem>
<TabItem value="yaml">

```yaml
genai:
  provider: ollama
  base_url: http://localhost:11434
  model: cloud-model-name
```

or when using Ollama Cloud directly

```yaml
genai:
  provider: ollama
  base_url: https://ollama.com
  model: cloud-model-name
  api_key: your-api-key
```

</TabItem>
</ConfigTabs>

### Google Gemini

Google Gemini has a [free tier](https://ai.google.dev/pricing) for the API; however, the limits may not be sufficient for standard Frigate usage. Choose a plan appropriate for your installation.

#### Supported Models

You must use a vision-capable model with Frigate. Current model variants can be found [in their documentation](https://ai.google.dev/gemini-api/docs/models/gemini).

#### Get API Key

To start using Gemini, you must first get an API key from [Google AI Studio](https://aistudio.google.com).

1. Accept the Terms of Service
2. Click "Get API Key" from the right-hand navigation
3. Click "Create API key in new project"
4. Copy the API key for use in your config

#### Configuration

<ConfigTabs>
<TabItem value="ui">

1. Navigate to <NavPath path="Settings > Enrichments > Generative AI" />.
   - Set **Provider** to `gemini`
   - Set **API key** to your Gemini API key (or use an environment variable such as `{FRIGATE_GEMINI_API_KEY}`)
   - Set **Model** to the desired model (e.g., `gemini-2.5-flash`)

</TabItem>
<TabItem value="yaml">

```yaml
genai:
  provider: gemini
  api_key: "{FRIGATE_GEMINI_API_KEY}"
  model: gemini-2.5-flash
```

</TabItem>
</ConfigTabs>

:::note

To use a different Gemini-compatible API endpoint, set the `base_url` key under `provider_options` to your provider's API URL. For example:

```yaml {4,5}
genai:
  provider: gemini
  ...
  provider_options:
    base_url: https://...
```

Other HTTP options are available; see the [python-genai documentation](https://github.com/googleapis/python-genai).

:::

### OpenAI

OpenAI does not have a free tier for their API. With the release of gpt-4o, pricing has been reduced and each generation should cost fractions of a cent if you choose to go this route.

#### Supported Models

You must use a vision-capable model with Frigate. Current model variants can be found [in their documentation](https://platform.openai.com/docs/models).

#### Get API Key

To start using OpenAI, you must first [create an API key](https://platform.openai.com/api-keys) and [configure billing](https://platform.openai.com/settings/organization/billing/overview).

#### Configuration

<ConfigTabs>
<TabItem value="ui">

1. Navigate to <NavPath path="Settings > Enrichments > Generative AI" />.
   - Set **Provider** to `openai`
   - Set **API key** to your OpenAI API key (or use an environment variable such as `{FRIGATE_OPENAI_API_KEY}`)
   - Set **Model** to the desired model (e.g., `gpt-4o`)

</TabItem>
<TabItem value="yaml">

```yaml
genai:
  provider: openai
  api_key: "{FRIGATE_OPENAI_API_KEY}"
  model: gpt-4o
```

</TabItem>
</ConfigTabs>

:::note

To use a different OpenAI-compatible API endpoint, set the `OPENAI_BASE_URL` environment variable to your provider's API URL.

:::

### Azure OpenAI

Microsoft offers several vision models through Azure OpenAI. A subscription is required.

#### Supported Models

You must use a vision-capable model with Frigate. Current model variants can be found [in their documentation](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models).

#### Create Resource and Get API Key

To start using Azure OpenAI, you must first [create a resource](https://learn.microsoft.com/azure/cognitive-services/openai/how-to/create-resource?pivots=web-portal#create-a-resource). You'll need your API key, model name, and resource URL, which must include the `api-version` parameter (see the example below).

#### Configuration

<ConfigTabs>
<TabItem value="ui">

1. Navigate to <NavPath path="Settings > Enrichments > Generative AI" />.
   - Set **Provider** to `azure_openai`
   - Set **Base URL** to your Azure resource URL including the `api-version` parameter (e.g., `https://instance.cognitiveservices.azure.com/openai/responses?api-version=2025-04-01-preview`)
   - Set **Model** to your deployed model name (e.g., `gpt-5-mini`)
   - Set **API key** to your Azure OpenAI API key (or use an environment variable such as `{FRIGATE_OPENAI_API_KEY}`)

</TabItem>
<TabItem value="yaml">

```yaml
genai:
  provider: azure_openai
  base_url: https://instance.cognitiveservices.azure.com/openai/responses?api-version=2025-04-01-preview
  model: gpt-5-mini
  api_key: "{FRIGATE_OPENAI_API_KEY}"
```

</TabItem>
</ConfigTabs>