Compare commits


1 Commit

Author SHA1 Message Date
GuoQing Liu
86504dc33a
Merge 33048ebc01 into 8520ade5c4 2025-11-25 15:49:00 +00:00
6 changed files with 20 additions and 297 deletions

View File

@@ -81,5 +81,3 @@ librosa==0.11.*
 soundfile==0.13.*
 # DeGirum detector
 degirum == 0.16.*
-# Memory profiling
-memray == 1.15.*

View File

@@ -1,129 +0,0 @@
---
id: memory
title: Memory Troubleshooting
---
Frigate includes built-in memory profiling using [memray](https://bloomberg.github.io/memray/) to help diagnose memory issues. This feature allows you to profile specific Frigate modules to identify memory leaks, excessive allocations, or other memory-related problems.
## Enabling Memory Profiling
Memory profiling is controlled via the `FRIGATE_MEMRAY_MODULES` environment variable. Set it to a comma-separated list of module names you want to profile:
```bash
export FRIGATE_MEMRAY_MODULES="frigate.review_segment_manager,frigate.capture"
```
### Module Names
Frigate processes are named using a module-based naming scheme. Common module names include:
- `frigate.review_segment_manager` - Review segment processing
- `frigate.recording_manager` - Recording management
- `frigate.capture` - Camera capture processes (all cameras with this module name)
- `frigate.process` - Camera processing/tracking (all cameras with this module name)
- `frigate.output` - Output processing
- `frigate.audio_manager` - Audio processing
- `frigate.embeddings` - Embeddings processing
You can also specify the full process name (including camera-specific identifiers) if you want to profile a specific camera:
```bash
export FRIGATE_MEMRAY_MODULES="frigate.capture:front_door"
```
When you specify a module name (e.g., `frigate.capture`), all processes with that module prefix will be profiled. For example, `frigate.capture` will profile all camera capture processes.
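The selection rule described above can be sketched in a few lines. The helper below is illustrative only, inferred from this description rather than taken from Frigate's actual code:

```python
import os

def memray_enabled(process_name: str) -> bool:
    """Illustrative sketch: a process is profiled if either its module prefix
    (the text before ':') or its full process name appears in the
    comma-separated FRIGATE_MEMRAY_MODULES list."""
    modules = os.environ.get("FRIGATE_MEMRAY_MODULES", "")
    if not modules:
        return False
    enabled = [m.strip() for m in modules.split(",")]
    module_name = process_name.split(":")[0]
    return module_name in enabled or process_name in enabled

os.environ["FRIGATE_MEMRAY_MODULES"] = "frigate.capture,frigate.output:birdseye"
print(memray_enabled("frigate.capture:front_door"))  # True (module prefix matches)
print(memray_enabled("frigate.output:birdseye"))     # True (exact process name matches)
print(memray_enabled("frigate.audio_manager"))       # False (not listed)
```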
## How It Works
1. **Binary File Creation**: When profiling is enabled, memray creates a binary file (`.bin`) in `/config/memray_reports/` that is updated continuously in real-time as the process runs.
2. **Automatic HTML Generation**: On normal process exit, Frigate automatically:
- Stops memray tracking
- Generates an HTML flamegraph report
- Saves it to `/config/memray_reports/<module_name>.html`
3. **Crash Recovery**: If a process crashes (SIGKILL, segfault, etc.), the binary file is preserved with all data up to the crash point. You can manually generate the HTML report from the binary file.
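The report file naming implied by the steps above can be sketched as follows. This is a guess at the sanitization rule (`:` and path separators become underscores) under the `/config/memray_reports/` layout described, not Frigate's actual implementation:

```python
import pathlib

def report_paths(process_name: str) -> tuple[pathlib.Path, pathlib.Path]:
    """Illustrative sketch: derive the .bin and .html report paths for a
    process, replacing characters that are unsafe in filenames."""
    safe = process_name.replace(":", "_").replace("/", "_").replace("\\", "_")
    reports = pathlib.Path("/config/memray_reports")
    return reports / f"{safe}.bin", reports / f"{safe}.html"

bin_file, html_file = report_paths("frigate.capture:front_door")
print(bin_file)   # /config/memray_reports/frigate.capture_front_door.bin
print(html_file)  # /config/memray_reports/frigate.capture_front_door.html
```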
## Viewing Reports
### Automatic Reports
After a process exits normally, you'll find HTML reports in `/config/memray_reports/`. Open these files in a web browser to view interactive flamegraphs showing memory usage patterns.
### Manual Report Generation
If a process crashes or you want to generate a report from an existing binary file, you can manually create the HTML report:
```bash
memray flamegraph /config/memray_reports/<module_name>.bin
```
This will generate an HTML file that you can open in your browser.
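If you prefer to script the manual step, the same invocation can be built programmatically. `--output` is a real flag of memray's `flamegraph` subcommand; the helper function itself is illustrative:

```python
import pathlib

def flamegraph_command(binary_file: str) -> list[str]:
    """Illustrative sketch: build the memray CLI invocation that converts a
    .bin capture into an HTML flamegraph next to it."""
    path = pathlib.Path(binary_file)
    html = path.with_suffix(".html")
    return ["memray", "flamegraph", "--output", str(html), str(path)]

cmd = flamegraph_command("/config/memray_reports/frigate.capture_front_door.bin")
print(" ".join(cmd))
```

Pass the resulting list to `subprocess.run` (ideally with a timeout) to generate the report without a shell.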
## Understanding the Reports
Memray flamegraphs show:
- **Memory allocations over time**: See where memory is being allocated in your code
- **Call stacks**: Understand the full call chain leading to allocations
- **Memory hotspots**: Identify functions or code paths that allocate the most memory
- **Memory leaks**: Spot patterns where memory is allocated but not freed
The interactive HTML reports allow you to:
- Zoom into specific time ranges
- Filter by function names
- View detailed allocation information
- Export data for further analysis
## Best Practices
1. **Profile During Issues**: Enable profiling when you're experiencing memory issues, not all the time, as it adds some overhead.
2. **Profile Specific Modules**: Instead of profiling everything, focus on the modules you suspect are causing issues.
3. **Let Processes Run**: Allow processes to run for a meaningful duration to capture representative memory usage patterns.
4. **Check Binary Files**: If HTML reports aren't generated automatically (e.g., after a crash), check for `.bin` files in `/config/memray_reports/` and generate reports manually.
5. **Compare Reports**: Generate reports at different times to compare memory usage patterns and identify trends.
## Troubleshooting
### No Reports Generated
- Check that the environment variable is set correctly
- Verify the module name matches exactly (case-sensitive)
- Check logs for memray-related errors
- Ensure `/config/memray_reports/` directory exists and is writable
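Parts of this checklist can be automated. The helper below is hypothetical (not part of Frigate) and only encodes the environment-variable and directory checks:

```python
import os
import pathlib

def memray_diagnostics(reports_dir: str = "/config/memray_reports") -> list[str]:
    """Hypothetical helper: return a list of problems that would prevent
    memray reports from being written."""
    problems = []
    if not os.environ.get("FRIGATE_MEMRAY_MODULES"):
        problems.append("FRIGATE_MEMRAY_MODULES is not set")
    d = pathlib.Path(reports_dir)
    if not d.is_dir():
        problems.append(f"{reports_dir} does not exist")
    elif not os.access(d, os.W_OK):
        problems.append(f"{reports_dir} is not writable")
    return problems

for problem in memray_diagnostics("/tmp/does_not_exist_memray_demo"):
    print(problem)
```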
### Process Crashed Before Report Generation
- Look for `.bin` files in `/config/memray_reports/`
- Manually generate HTML reports using: `memray flamegraph <file>.bin`
- The binary file contains all data up to the crash point
### Reports Show No Data
- Ensure the process ran long enough to generate meaningful data
- Check that memray is properly installed (included by default in Frigate)
- Verify the process actually started and ran (check process logs)
## Example Usage
```bash
# Enable profiling for review and capture modules
export FRIGATE_MEMRAY_MODULES="frigate.review_segment_manager,frigate.capture"
# Start Frigate
# ... let it run for a while ...
# Check for reports
ls -lh /config/memray_reports/
# If a process crashed, manually generate report
memray flamegraph /config/memray_reports/frigate.capture_front_door.bin
```
For more information about memray and interpreting reports, see the [official memray documentation](https://bloomberg.github.io/memray/).

View File

@@ -131,7 +131,6 @@ const sidebars: SidebarsConfig = {
         "troubleshooting/recordings",
         "troubleshooting/gpu",
         "troubleshooting/edgetpu",
-        "troubleshooting/memory",
       ],
       Development: [
         "development/contributing",

View File

@@ -2,6 +2,7 @@ import glob
 import logging
 import os
 import shutil
+import time
 import urllib.request
 import zipfile
 from queue import Queue
@@ -54,9 +55,6 @@ class MemryXDetector(DetectionApi):
             )
             return

-        # Initialize stop_event as None, will be set later by set_stop_event()
-        self.stop_event = None
-
        model_cfg = getattr(detector_config, "model", None)

        # Check if model_type was explicitly set by the user
@@ -365,44 +363,27 @@ class MemryXDetector(DetectionApi):
     def process_input(self):
         """Input callback function: wait for frames in the input queue, preprocess, and send to MX3 (return)"""
         while True:
-            # Check if shutdown is requested
-            if self.stop_event and self.stop_event.is_set():
-                logger.debug("[process_input] Stop event detected, returning None")
-                return None
             try:
-                # Wait for a frame from the queue with timeout to check stop_event periodically
-                frame = self.capture_queue.get(block=True, timeout=0.5)
+                # Wait for a frame from the queue (blocking call)
+                frame = self.capture_queue.get(
+                    block=True
+                )  # Blocks until data is available
                 return frame
             except Exception as e:
-                # Silently handle queue.Empty timeouts (expected during normal operation)
-                # Log any other unexpected exceptions
-                if "Empty" not in str(type(e).__name__):
-                    logger.warning(f"[process_input] Unexpected error: {e}")
-                # Loop continues and will check stop_event at the top
+                logger.info(f"[process_input] Error processing input: {e}")
+                time.sleep(0.1)  # Prevent busy waiting in case of error

     def receive_output(self):
         """Retrieve processed results from MemryX output queue + a copy of the original frame"""
-        try:
-            # Get connection ID with timeout
-            connection_id = self.capture_id_queue.get(
-                block=True, timeout=1.0
-            )  # Get the corresponding connection ID
-            detections = self.output_queue.get()  # Get detections from MemryX
-            return connection_id, detections
-        except Exception as e:
-            # On timeout or stop event, return None
-            if self.stop_event and self.stop_event.is_set():
-                logger.debug("[receive_output] Stop event detected, exiting")
-            # Silently handle queue.Empty timeouts, they're expected during normal operation
-            elif "Empty" not in str(type(e).__name__):
-                logger.warning(f"[receive_output] Error receiving output: {e}")
-            return None, None
+        connection_id = (
+            self.capture_id_queue.get()
+        )  # Get the corresponding connection ID
+        detections = self.output_queue.get()  # Get detections from MemryX
+        return connection_id, detections

     def post_process_yolonas(self, output):
         predictions = output[0]
@@ -850,19 +831,6 @@ class MemryXDetector(DetectionApi):
                 f"{self.memx_model_type} is currently not supported for memryx. See the docs for more info on supported models."
             )

-    def set_stop_event(self, stop_event):
-        """Set the stop event for graceful shutdown."""
-        self.stop_event = stop_event
-
-    def shutdown(self):
-        """Gracefully shutdown the MemryX accelerator"""
-        try:
-            if hasattr(self, "accl") and self.accl is not None:
-                self.accl.shutdown()
-                logger.info("MemryX accelerator shutdown complete")
-        except Exception as e:
-            logger.error(f"Error during MemryX shutdown: {e}")
-
     def detect_raw(self, tensor_input: np.ndarray):
         """Removed synchronous detect_raw() function so that we only use async"""
         return 0

View File

@@ -43,7 +43,6 @@ class BaseLocalDetector(ObjectDetector):
         self,
         detector_config: BaseDetectorConfig = None,
         labels: str = None,
-        stop_event: MpEvent = None,
     ):
         self.fps = EventsPerSecond()
         if labels is None:
@@ -61,10 +60,6 @@ class BaseLocalDetector(ObjectDetector):
         self.detect_api = create_detector(detector_config)

-        # If the detector supports stop_event, pass it
-        if hasattr(self.detect_api, "set_stop_event") and stop_event:
-            self.detect_api.set_stop_event(stop_event)
-
     def _transform_input(self, tensor_input: np.ndarray) -> np.ndarray:
         if self.input_transform:
             tensor_input = np.transpose(tensor_input, self.input_transform)
@@ -245,10 +240,6 @@ class AsyncDetectorRunner(FrigateProcess):
         while not self.stop_event.is_set():
             connection_id, detections = self._detector.async_receive_output()

-            # Handle timeout case (queue.Empty) - just continue
-            if connection_id is None:
-                continue
-
             if not self.send_times:
                 # guard; shouldn't happen if send/recv are balanced
                 continue
@@ -275,38 +266,21 @@ class AsyncDetectorRunner(FrigateProcess):
         self._frame_manager = SharedMemoryFrameManager()
         self._publisher = ObjectDetectorPublisher()
-        self._detector = AsyncLocalObjectDetector(
-            detector_config=self.detector_config, stop_event=self.stop_event
-        )
+        self._detector = AsyncLocalObjectDetector(detector_config=self.detector_config)

         for name in self.cameras:
             self.create_output_shm(name)

-        t_detect = threading.Thread(target=self._detect_worker, daemon=False)
-        t_result = threading.Thread(target=self._result_worker, daemon=False)
+        t_detect = threading.Thread(target=self._detect_worker, daemon=True)
+        t_result = threading.Thread(target=self._result_worker, daemon=True)
         t_detect.start()
         t_result.start()

-        try:
-            while not self.stop_event.is_set():
-                time.sleep(0.5)
-
-            logger.info(
-                "Stop event detected, waiting for detector threads to finish..."
-            )
-            # Wait for threads to finish processing
-            t_detect.join(timeout=5)
-            t_result.join(timeout=5)
-
-            # Shutdown the AsyncDetector
-            self._detector.detect_api.shutdown()
-
-            self._publisher.stop()
-        except Exception as e:
-            logger.error(f"Error during async detector shutdown: {e}")
-        finally:
-            logger.info("Exited Async detection process...")
+        while not self.stop_event.is_set():
+            time.sleep(0.5)
+
+        self._publisher.stop()
+        logger.info("Exited async detection process...")


 class ObjectDetectProcess:
@@ -334,7 +308,7 @@ class ObjectDetectProcess:
         # if the process has already exited on its own, just return
         if self.detect_process and self.detect_process.exitcode:
             return
-
+        self.detect_process.terminate()
         logging.info("Waiting for detection process to exit gracefully...")
         self.detect_process.join(timeout=30)
         if self.detect_process.exitcode is None:

View File

@@ -1,10 +1,7 @@
-import atexit
 import faulthandler
 import logging
 import multiprocessing as mp
 import os
-import pathlib
-import subprocess
 import threading
 from logging.handlers import QueueHandler
 from multiprocessing.synchronize import Event as MpEvent
@@ -51,7 +48,6 @@ class FrigateProcess(BaseProcess):
     def before_start(self) -> None:
         self.__log_queue = frigate.log.log_listener.queue
-        self.__memray_tracker = None

     def pre_run_setup(self, logConfig: LoggerConfig | None = None) -> None:
         os.nice(self.priority)
@@ -68,86 +64,3 @@
             frigate.log.apply_log_levels(
                 logConfig.default.value.upper(), logConfig.logs
             )
-        self._setup_memray()
-
-    def _setup_memray(self) -> None:
-        """Setup memray profiling if enabled via environment variable."""
-        memray_modules = os.environ.get("FRIGATE_MEMRAY_MODULES", "")
-
-        if not memray_modules:
-            return
-
-        # Extract module name from process name (e.g., "frigate.capture:camera" -> "frigate.capture")
-        process_name = self.name
-        module_name = (
-            process_name.split(":")[0] if ":" in process_name else process_name
-        )
-
-        enabled_modules = [m.strip() for m in memray_modules.split(",")]
-
-        if module_name not in enabled_modules and process_name not in enabled_modules:
-            return
-
-        try:
-            import memray
-
-            reports_dir = pathlib.Path("/config/memray_reports")
-            reports_dir.mkdir(parents=True, exist_ok=True)
-
-            safe_name = (
-                process_name.replace(":", "_").replace("/", "_").replace("\\", "_")
-            )
-            binary_file = reports_dir / f"{safe_name}.bin"
-
-            self.__memray_tracker = memray.Tracker(str(binary_file))
-            self.__memray_tracker.__enter__()
-
-            # Register cleanup handler to stop tracking and generate HTML report
-            # atexit runs on normal exits and most signal-based terminations (SIGTERM, SIGINT)
-            # For hard kills (SIGKILL) or segfaults, the binary file is preserved for manual generation
-            atexit.register(self._cleanup_memray, safe_name, binary_file)
-
-            self.logger.info(
-                f"Memray profiling enabled for module {module_name} (process: {self.name}). "
-                f"Binary file (updated continuously): {binary_file}. "
-                f"HTML report will be generated on exit: {reports_dir}/{safe_name}.html. "
-                f"If process crashes, manually generate with: memray flamegraph {binary_file}"
-            )
-        except Exception as e:
-            self.logger.error(f"Failed to setup memray profiling: {e}", exc_info=True)
-
-    def _cleanup_memray(self, safe_name: str, binary_file: pathlib.Path) -> None:
-        """Stop memray tracking and generate HTML report."""
-        if self.__memray_tracker is None:
-            return
-
-        try:
-            self.__memray_tracker.__exit__(None, None, None)
-            self.__memray_tracker = None
-
-            reports_dir = pathlib.Path("/config/memray_reports")
-            html_file = reports_dir / f"{safe_name}.html"
-
-            result = subprocess.run(
-                ["memray", "flamegraph", "--output", str(html_file), str(binary_file)],
-                capture_output=True,
-                text=True,
-                timeout=10,
-            )
-
-            if result.returncode == 0:
-                self.logger.info(f"Memray report generated: {html_file}")
-            else:
-                self.logger.error(
-                    f"Failed to generate memray report: {result.stderr}. "
-                    f"Binary file preserved at {binary_file} for manual generation."
-                )
-
-            # Keep the binary file for manual report generation if needed
-            # Users can run: memray flamegraph {binary_file}
-        except subprocess.TimeoutExpired:
-            self.logger.error("Memray report generation timed out")
-        except Exception as e:
-            self.logger.error(f"Failed to cleanup memray profiling: {e}", exc_info=True)