Miscellaneous fixes (#23317)

* resolve global record.export.hwaccel_args to fix phantom camera override
* auto-stop debug replay sessions after 12 hours
* docs tweaks
* add more tips to object classification docs
* tweak language
* Store hwaccel errors with timeout so it can retry
* Add error logs for Intel GPU stats
* add area

---------

Co-authored-by: Nicolas Mowen <nickmowen213@gmail.com>
2026-06-21 03:41:55 +03:00 · 2026-05-27 10:19:11 -05:00 · 2026-05-27 10:19:11 -05:00 · 2858662be9
commit 2858662be9
parent 88f944fe81
9 changed files with 123 additions and 16 deletions
--- a/docs/docs/configuration/custom_classification/object_classification.md
+++ b/docs/docs/configuration/custom_classification/object_classification.md
@@ -149,9 +149,16 @@ For more detail, see [Frigate Tip: Best Practices for Training Face and Custom C
 - **The wizard is just the starting point**: You don't need to find and label every class upfront. Missing classes will naturally appear in Recent Classifications, and those images tend to be more valuable because they represent new conditions and edge cases.
 - **Problem framing**: Keep classes visually distinct and relevant to the chosen object types.
 - **Preprocessing**: Ensure examples reflect object crops similar to Frigate's boxes; keep the subject centered.
- **Labels**: Keep label names short and consistent; include a `none` class if you plan to ignore uncertain predictions for sub labels.
+- **Crop size**: Aim for crops of at least 100×100 pixels (a 10,000 pixel area). Crops smaller than ~80×80 get stretched 3-7× by the model's 224×224 input resize and tend to collapse into a generic "blob" region of feature space where identity becomes unreliable. If most of your detections are small because the camera is far from the subject, consider repositioning the camera for closer crops.
 - **Class balance**: Aim to keep your largest class within ~3× the count of your smallest. Beyond that, the model becomes biased toward the dominant class and tends to default borderline predictions to it (the "everything looks like Buddy" failure mode).
 - **Threshold**: Tune `threshold` per model to reduce false assignments. Start at `0.8` and adjust based on validation.
 :::tip `none` works differently from named classes
 Named classes work best with visually uniform examples — every Buddy photo should look like Buddy. The `none` class needs the opposite: visual diversity across sizes, framings, and qualities, because at inference it has to absorb everything that isn't one of your named classes. Don't apply the same "only keep large, well-framed images" rule to `none` that you would to a named class. Mix in small crops, partial views, and false positives deliberately - otherwise the model has no signal for "small/ambiguous thing = not one of my known classes" and will force those crops into a named class by default.
 :::
 ## Debugging Classification Models
 To troubleshoot issues with object classification models, enable debug logging to see detailed information about classification attempts, scores, and consensus calculations.
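Review note (not part of the commit): the crop-size and class-balance tips added above come down to simple arithmetic. The sketch below assumes the 224×224 model input quoted in the docs text and uses the shorter crop side as the worst case. The helper names `upscale_factor` and `is_balanced` are hypothetical and do not exist in Frigate.

```python
MODEL_INPUT = 224  # input edge length quoted in the docs text above


def upscale_factor(width: int, height: int, model_input: int = MODEL_INPUT) -> float:
    """Linear stretch needed to bring the shorter crop side up to the model input.

    Assumption: the shorter side is the worst case for the resize.
    """
    return model_input / min(width, height)


def is_balanced(class_counts: dict[str, int], max_ratio: float = 3.0) -> bool:
    """True when the largest class is within max_ratio times the smallest."""
    counts = list(class_counts.values())
    return max(counts) <= max_ratio * min(counts)


print(round(upscale_factor(100, 100), 2))  # 2.24
print(round(upscale_factor(80, 80), 2))    # 2.8
print(round(upscale_factor(32, 32), 2))    # 7.0
print(is_balanced({"buddy": 300, "none": 120}))  # True  (2.5x)
print(is_balanced({"buddy": 500, "none": 100}))  # False (5x)
```

Under these assumptions, the "3-7×" stretch in the docs corresponds to crops between roughly 80 and 32 pixels on the shorter side.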
--- a/docs/docs/configuration/review.md
+++ b/docs/docs/configuration/review.md
@@ -139,7 +139,7 @@ The Review page also can show periods of motion that didn't produce a tracked ob
 The Motion Previews pane shows preview clips for periods of significant motion that did not produce a tracked object. It is useful for spotting things that motion detection picked up but object detection did not, which can help validate tuning or catch missed objects.
-On the <NavPath path="Review > Motion" /> page, click the 3-dots menu on a camera and choose **Motion Previews**. Each card represents a continuous range of motion-only activity and plays back the recorded preview for that range. A heatmap overlay dims areas of the frame with no motion so the moving regions stand out.
+On the <NavPath path="Review > Motion" /> page, click the kebab menu on a camera and choose **Motion Previews**. Each card represents a continuous range of motion-only activity and plays back the recorded preview for that range. A heatmap overlay dims areas of the frame with no motion so the moving regions stand out.
 The pane provides a few controls:
@@ -153,7 +153,7 @@ Clicking a preview clip seeks the recording player to that timestamp so you can
 Motion Search lets you scan recorded footage for changes inside a region of interest you draw on the camera. Unlike Motion Previews, which surfaces what Frigate's motion detector flagged in real time, Motion Search re-analyzes the saved recordings, so it can find changes that were missed (for example, an object that appeared while motion detection was paused by `lightning_threshold`, or in a region that is normally motion-masked).
-To start a search, click the 3-dots menu on a camera in the <NavPath path="Review > Motion" /> page and choose **Motion Search**. In the dialog:
+To start a search, click the kebab menu on a camera in the <NavPath path="Review > Motion" /> page and choose **Motion Search**. In the dialog:
 1. Pick the camera and time range to scan.
 2. Draw a polygon on the camera frame to define the region of interest.
--- a/docs/docs/troubleshooting/dummy-camera.md
+++ b/docs/docs/troubleshooting/dummy-camera.md
@@ -3,6 +3,8 @@ id: dummy-camera
 title: Analyzing Object Detection
 ---
 import NavPath from "@site/src/components/NavPath";
 Frigate provides several tools for investigating object detection and tracking behavior: reviewing recorded detections through the UI, using the built-in Debug Replay feature, and manually setting up a dummy camera for advanced scenarios.
 ## Reviewing Detections in the UI
@@ -51,12 +53,25 @@ Only one replay session can be active at a time. If a session is already running
 :::
 ### Starting Debug Replay
 Debug Replay can be started from several places in the UI. The starting point determines the time range that gets replayed.
 - **History — Actions menu.** Navigate to <NavPath path="History > {camera}" />, open the **Actions** menu in the toolbar, and choose **Debug Replay**. From here you can pick a preset (**Last 1 Minute**, **Last 5 Minutes**), select a range directly on the timeline with **From Timeline**, or enter exact start and end times with **Custom**. This is the most flexible option and the best choice when you want to add padding around a detection. On mobile, the same options appear in the Actions drawer.
 - **History — Detail Stream event menu.** While viewing a review item in the Detail Stream, open the menu on a tracked object's event card and choose **Debug Replay**. The replay range is set automatically to that object's start and end times.
 - **Explore — search result menu.** From an Explore card, open the kebab menu and choose **Debug Replay**. The range is taken from the tracked object's lifecycle.
 - **Explore — Tracking Details Actions menu.** Open a tracked object's **Tracking Details** dialog, then choose **Debug Replay** from the Actions menu. Same automatic range as the search result menu.
 - **Exports — export card menu.** From <NavPath path="Exports" />, open the menu on an export and choose **Debug Replay** to loop the exported clip through the detection pipeline for the camera it was exported from.
 The Detail Stream, Explore, and Exports entry points use the underlying recording or export's bounds with a small amount of padding. This can be convenient for quick checks, but if a detection is short or you want extra "settle" time for motion and the detector, start the replay from the History Actions menu instead and widen the range manually.
 ### Variables to consider
 - The replay will not always produce identical results to the original run. Different frames may be selected on replay, which can change detections and tracking.
 - Motion detection depends on the exact frames used; small frame shifts can change motion regions and therefore what gets passed to the detector.
 - Object detection is not fully deterministic: models and post-processing can yield slightly different results across runs.
 - In cases where a detection is short and a replay may only be a small number of frames, it is recommended to manually add some padding before and after the detection so that the motion and object detectors have time to settle into the scene. Rather than starting Debug Replay from Explore, navigate to History for your camera, choose Debug Replay from the Actions menu, and click the "From Timeline" or "Custom" option.
 - The replay camera inherits the source camera's zones. Any automations that trigger on those zone names will fire for the replay camera as well. This can be helpful when debugging zone behavior, but may be unexpected. You can add a condition on the source camera's name in your automation if you want to exclude replay triggers.
 Treat the replay as a close approximation rather than an exact reproduction. Run multiple loops and examine the debug overlays and logs to understand the behavior.
--- a/frigate/api/fastapi_app.py
+++ b/frigate/api/fastapi_app.py
@@ -1,3 +1,4 @@
 import asyncio
 import logging
 import re
 from typing import Optional
@@ -36,7 +37,7 @@ from frigate.comms.event_metadata_updater import (
 from frigate.config import FrigateConfig
 from frigate.config.camera.updater import CameraConfigUpdatePublisher
 from frigate.config.profile_manager import ProfileManager
-from frigate.debug_replay import DebugReplayManager
+from frigate.debug_replay import DebugReplayManager, debug_replay_auto_stop_watchdog
 from frigate.embeddings import EmbeddingsContext
 from frigate.genai import GenAIClientManager
 from frigate.ptz.onvif import OnvifController
@@ -116,6 +117,11 @@ def create_fastapi_app(
    @app.on_event("startup")
    async def startup():
        logger.info("FastAPI started")
        asyncio.create_task(
            debug_replay_auto_stop_watchdog(
                replay_manager, frigate_config, config_publisher
            )
        )
    # Rate limiter (used for login endpoint)
    if frigate_config.auth.failed_login_rate_limit is None:
--- a/frigate/config/config.py
+++ b/frigate/config/config.py
@@ -680,6 +680,13 @@ class FrigateConfig(FrigateBaseModel):
        if self.ffmpeg.hwaccel_args == "auto":
            self.ffmpeg.hwaccel_args = auto_detect_hwaccel()
        # Resolve global export hwaccel_args so it matches the per-camera
        # resolution below. Without this, every camera reads as overriding
        # record.export.hwaccel_args because the global stays "auto" while
        # the camera value gets resolved to the actual args list.
        if self.record.export.hwaccel_args == "auto":
            self.record.export.hwaccel_args = self.ffmpeg.hwaccel_args
        # Populate global audio filters from listen. Existing user-defined
        # entries for labels not in listen are preserved but unused at runtime.
        if self.audio.filters is None:
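Review note (not part of the commit): a minimal sketch of why leaving the global export value as the `"auto"` sentinel made every camera look like an override. The `resolve` and `camera_overrides_global` helpers and the detected argument list are hypothetical placeholders, not Frigate code.

```python
from typing import Union

HwaccelArgs = Union[str, list[str]]


def resolve(value: HwaccelArgs, detected: list[str]) -> HwaccelArgs:
    """Replace the "auto" sentinel with the detected args; pass other values through."""
    return detected if value == "auto" else value


def camera_overrides_global(global_value: HwaccelArgs, camera_value: HwaccelArgs) -> bool:
    """A camera counts as overriding when its value differs from the global one."""
    return camera_value != global_value


detected = ["-hwaccel", "vaapi"]  # placeholder for whatever auto-detection returns
camera_value = resolve("auto", detected)  # per-camera value is always resolved

# Before the fix: the global stays "auto", so the comparison reports an override.
print(camera_overrides_global("auto", camera_value))  # True

# After the fix: the global is resolved the same way, so no phantom override.
print(camera_overrides_global(resolve("auto", detected), camera_value))  # False
```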
--- a/frigate/debug_replay.py
+++ b/frigate/debug_replay.py
@@ -5,6 +5,7 @@ frigate.jobs.debug_replay. This module owns only session presence
 (active), session metadata, and post-session cleanup.
 """
 import asyncio
 import logging
 import os
 import shutil
@@ -40,6 +41,9 @@ from frigate.util.config import find_config_file
 logger = logging.getLogger(__name__)
 MAX_SESSION_DURATION_SECONDS = 12 * 60 * 60
 AUTO_STOP_CHECK_INTERVAL_SECONDS = 60
 class DebugReplayManager:
    """Owns the lifecycle pointers for a single debug replay session.
@@ -58,6 +62,7 @@ class DebugReplayManager:
        self.clip_path: str | None = None
        self.start_ts: float | None = None
        self.end_ts: float | None = None
        self.session_started_at: float | None = None
        self._job_state_publisher = JobStatePublisher()
    @property
@@ -83,6 +88,7 @@ class DebugReplayManager:
            self.start_ts = start_ts
            self.end_ts = end_ts
            self.clip_path = None
            self.session_started_at = time.time()
    def mark_session_ready(self, clip_path: str) -> None:
        """Record the on-disk clip path after the camera has been published."""
@@ -104,6 +110,7 @@ class DebugReplayManager:
        self.clip_path = None
        self.start_ts = None
        self.end_ts = None
        self.session_started_at = None
    def publish_camera(
        self,
@@ -351,3 +358,41 @@ def cleanup_replay_cameras() -> None:
            shutil.rmtree(REPLAY_DIR)
        except Exception as e:
            logger.error("Failed to remove replay cache directory: %s", e)
 async def debug_replay_auto_stop_watchdog(
    manager: DebugReplayManager,
    frigate_config: FrigateConfig,
    config_publisher: CameraConfigUpdatePublisher,
 ) -> None:
    """Auto-stop debug replay sessions that exceed MAX_SESSION_DURATION_SECONDS.
    Backstop against a session left running for days. The cap is intentionally
    generous so realistic tuning and overnight soak workflows aren't disrupted.
    """
    while True:
        try:
            await asyncio.sleep(AUTO_STOP_CHECK_INTERVAL_SECONDS)
            started_at = manager.session_started_at
            if not manager.active or started_at is None:
                continue
            if time.time() - started_at < MAX_SESSION_DURATION_SECONDS:
                continue
            replay_name = manager.replay_camera_name
            await asyncio.to_thread(
                manager.stop,
                frigate_config=frigate_config,
                config_publisher=config_publisher,
            )
            logger.info(
                "Debug replay auto-stopped after exceeding max session duration of %d hours: %s",
                MAX_SESSION_DURATION_SECONDS // 3600,
                replay_name,
            )
        except asyncio.CancelledError:
            raise
        except Exception:
            logger.exception("Error in debug replay auto-stop watchdog")
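Review note (not part of the commit): the watchdog pattern above can be exercised in isolation with shortened intervals. This is a sketch only. `FakeSession` is a hypothetical stand-in for `DebugReplayManager`, and it uses `time.monotonic()` rather than the `time.time()` used in the commit, so the elapsed-time check is unaffected by wall-clock adjustments.

```python
import asyncio
import time


class FakeSession:
    """Hypothetical stand-in exposing only the fields the watchdog reads."""

    def __init__(self) -> None:
        self.active = True
        self.session_started_at: float | None = time.monotonic()

    def stop(self) -> None:
        self.active = False
        self.session_started_at = None


async def auto_stop_watchdog(
    session: FakeSession, max_duration: float, check_interval: float
) -> None:
    """Poll periodically and stop the session once it exceeds max_duration."""
    while True:
        await asyncio.sleep(check_interval)
        started_at = session.session_started_at
        if not session.active or started_at is None:
            continue
        if time.monotonic() - started_at < max_duration:
            continue
        # Run the (potentially blocking) stop call off the event loop.
        await asyncio.to_thread(session.stop)


async def demo() -> bool:
    session = FakeSession()
    task = asyncio.create_task(auto_stop_watchdog(session, 0.05, 0.01))
    await asyncio.sleep(0.3)  # long enough for the cap to be exceeded
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return session.active


print(asyncio.run(demo()))  # False
```

The sketch omits the broad `except Exception` guard in the commit, which keeps the loop alive when a single stop attempt fails.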
--- a/frigate/stats/emitter.py
+++ b/frigate/stats/emitter.py
@@ -32,7 +32,7 @@ class StatsEmitter(threading.Thread):
        self.config = config
        self.stats_tracking = stats_tracking
        self.stop_event = stop_event
-        self.hwaccel_errors: list[str] = []
+        self.hwaccel_errors: dict[str, float] = {}
        self.stats_history: list[dict[str, Any]] = []
        # create communication for stats
--- a/frigate/stats/util.py
+++ b/frigate/stats/util.py
@@ -1,6 +1,7 @@
 """Utilities for stats."""
 import asyncio
 import logging
 import os
 import shutil
 import time
@@ -34,6 +35,10 @@ from frigate.util.services import (
 )
 from frigate.version import VERSION
 logger = logging.getLogger(__name__)
 HWACCEL_ERROR_COOLDOWN_SECONDS = 3600
 def get_latest_version(config: FrigateConfig) -> str:
    if not config.telemetry.version_check:
@@ -167,7 +172,9 @@ def get_detector_stats(
 def get_processing_stats(
-    config: FrigateConfig, stats: dict[str, str], hwaccel_errors: list[str]
+    config: FrigateConfig,
+    stats: dict[str, str],
+    hwaccel_errors: dict[str, float],
 ) -> None:
    """Get stats for cpu / gpu."""
@@ -206,7 +213,9 @@ async def set_bandwidth_stats(config: FrigateConfig, all_stats: dict[str, Any])
 async def set_gpu_stats(
-    config: FrigateConfig, all_stats: dict[str, Any], hwaccel_errors: list[str]
+    config: FrigateConfig,
+    all_stats: dict[str, Any],
+    hwaccel_errors: dict[str, float],
 ) -> None:
    """Parse GPUs from hwaccel args and use for stats."""
    hwaccel_args = []
@@ -231,12 +240,16 @@ async def set_gpu_stats(
    stats: dict[str, dict] = {}
    intel_gpu_collected = False
+    now = time.monotonic()
    for args in hwaccel_args:
-        if args in hwaccel_errors:
-            # known erroring args should automatically return as error
-            stats["error-gpu"] = {"gpu": "", "mem": ""}
-        elif "cuvid" in args or "nvidia" in args:
+        last_error = hwaccel_errors.get(args)
+        if last_error is not None:
+            if now - last_error < HWACCEL_ERROR_COOLDOWN_SECONDS:
+                continue
+            hwaccel_errors.pop(args, None)
+        if "cuvid" in args or "nvidia" in args:
            # nvidia GPU
            nvidia_usage = get_nvidia_gpu_stats()
@@ -253,7 +266,7 @@ async def set_gpu_stats(
            else:
                stats["nvidia-gpu"] = {"vendor": "nvidia", "gpu": "", "mem": ""}
-                hwaccel_errors.append(args)
+                hwaccel_errors[args] = time.monotonic()
        elif "nvmpi" in args or "jetson" in args:
            # nvidia Jetson
            jetson_usage = get_jetson_stats()
@@ -262,7 +275,7 @@ async def set_gpu_stats(
                stats["jetson-gpu"] = {"vendor": "nvidia", **jetson_usage}
            else:
                stats["jetson-gpu"] = {"vendor": "nvidia", "gpu": "", "mem": ""}
-                hwaccel_errors.append(args)
+                hwaccel_errors[args] = time.monotonic()
        elif "qsv" in args or ("vaapi" in args and not is_vaapi_amd_driver()):
            if not config.telemetry.stats.intel_gpu_stats:
                continue
@@ -280,7 +293,7 @@ async def set_gpu_stats(
                        stats[name] = entry
                else:
                    stats["intel-gpu"] = {"vendor": "intel", "gpu": "", "mem": ""}
-                    hwaccel_errors.append(args)
+                    hwaccel_errors[args] = time.monotonic()
        elif "vaapi" in args:
            if not config.telemetry.stats.amd_gpu_stats:
                continue
@@ -292,7 +305,7 @@ async def set_gpu_stats(
                stats["amd-vaapi"] = {"vendor": "amd", **amd_usage}
            else:
                stats["amd-vaapi"] = {"vendor": "amd", "gpu": "", "mem": ""}
-                hwaccel_errors.append(args)
+                hwaccel_errors[args] = time.monotonic()
        elif "preset-rk" in args:
            rga_usage = get_rockchip_gpu_stats()
@@ -328,7 +341,9 @@ async def set_npu_usages(config: FrigateConfig, all_stats: dict[str, Any]) -> No
 def stats_snapshot(
-    config: FrigateConfig, stats_tracking: StatsTrackingTypes, hwaccel_errors: list[str]
+    config: FrigateConfig,
+    stats_tracking: StatsTrackingTypes,
+    hwaccel_errors: dict[str, float],
 ) -> dict[str, Any]:
    """Get a snapshot of the current stats that are being tracked."""
    camera_metrics = stats_tracking["camera_metrics"]
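Review note (not part of the commit): the change from `list[str]` to `dict[str, float]` turns a permanent blocklist into a cooldown. The sketch below isolates that check. The `should_skip` helper and the `"preset-vaapi"` key are illustrative assumptions, and timestamps are passed in explicitly so the behaviour is easy to follow.

```python
COOLDOWN_SECONDS = 3600.0  # mirrors HWACCEL_ERROR_COOLDOWN_SECONDS in the diff


def should_skip(
    errors: dict[str, float],
    key: str,
    now: float,
    cooldown: float = COOLDOWN_SECONDS,
) -> bool:
    """Return True while key is cooling down; forget the error once it expires."""
    last_error = errors.get(key)
    if last_error is None:
        return False  # never failed, so try it
    if now - last_error < cooldown:
        return True  # failed recently, so skip this round
    errors.pop(key, None)  # cooldown elapsed, allow a retry
    return False


errors: dict[str, float] = {"preset-vaapi": 1000.0}  # example key, failed at t=1000

print(should_skip(errors, "preset-vaapi", 1010.0))  # True  (10 s after the failure)
print(should_skip(errors, "preset-vaapi", 4600.0))  # False (3600 s later, retry)
print("preset-vaapi" in errors)                     # False (entry was cleared)
```

With the previous list-based tracking, a single failed stats read suppressed that GPU until restart. Here a failed retry simply records a fresh timestamp.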
--- a/frigate/util/services.py
+++ b/frigate/util/services.py
@@ -416,6 +416,11 @@ def get_intel_gpu_stats(
    snapshot_a = _read_intel_drm_fdinfo(target_pdev)
    if not snapshot_a:
        logger.warning(
            "Unable to collect Intel GPU stats: no DRM fdinfo entries found"
            "%s. Check that /proc is readable and the i915/xe driver is loaded",
            f" for pdev {target_pdev}" if target_pdev else "",
        )
        return None
    start = time.monotonic()
@@ -424,6 +429,9 @@ def get_intel_gpu_stats(
    snapshot_b = _read_intel_drm_fdinfo(target_pdev)
    if not snapshot_b or elapsed_ns <= 0:
        logger.warning(
            "Unable to collect Intel GPU stats: second DRM fdinfo sample was empty"
        )
        return None
    def _new_engine_pct() -> dict[str, float]:
@@ -464,6 +472,10 @@ def get_intel_gpu_stats(
        pid_pct[data_b["pid"]] = pid_pct.get(data_b["pid"], 0.0) + client_total
    if not per_pdev_engine_pct:
        logger.warning(
            "Unable to collect Intel GPU stats: no per-engine counters available "
            "(i915 requires kernel >= 5.19)"
        )
        return None
    names = intel_gpu_name_resolver.get_names()