From 475ab146b4235f0841d21119ddf3491469bff594 Mon Sep 17 00:00:00 2001
From: Josh Hawkins <32435876+hawkeye217@users.noreply.github.com>
Date: Mon, 1 Dec 2025 11:13:59 -0600
Subject: [PATCH] update transcription docs

---
 docs/docs/configuration/audio_detectors.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/docs/configuration/audio_detectors.md b/docs/docs/configuration/audio_detectors.md
index 80b0727a5..f2ff99b6b 100644
--- a/docs/docs/configuration/audio_detectors.md
+++ b/docs/docs/configuration/audio_detectors.md
@@ -168,6 +168,8 @@ Recorded `speech` events will always use a `whisper` model, regardless of the `m
 
    If you hear speech that’s actually important and worth saving/indexing for the future, **just press the transcribe button in Explore** on that specific `speech` event - that keeps things explicit, reliable, and under your control.
 
+   Support for external `whisper` Docker containers is also being considered for a future version of Frigate. A single transcription service could then be shared by Frigate and other applications (for example, Home Assistant Voice) and run on more powerful machines when available.
+
 2. Why don't you save live transcription text and use that for `speech` events?
 
    There’s no guarantee that a `speech` event is even created from the exact audio that went through the transcription model. Live transcription and `speech` event creation are **separate, asynchronous processes**. Even when both are correctly configured, trying to align the **precise start and end time of a speech event** with whatever audio the model happened to be processing at that moment is unreliable.
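
For readers reviewing this patch without the surrounding page: the `speech` events and the model-size setting referenced in the hunk context come from Frigate's audio detection and audio transcription configuration. A minimal sketch of those sections is shown below; the key names are assumptions based on the documentation being patched here, not part of this change, so defer to the released `audio_detectors.md` for the authoritative schema.

```yaml
# Hypothetical config sketch (not part of this patch): key names assumed
# from Frigate's audio detector / transcription docs and may differ per release.
audio:
  enabled: True
  listen:
    - speech          # generate `speech` events from the camera audio stream

audio_transcription:
  enabled: True
  model_size: small   # recorded `speech` events always use a whisper model,
                      # regardless of this setting (see the FAQ above)
```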