mirror of
https://github.com/blakeblackshear/frigate.git
synced 2025-12-19 11:36:43 +03:00
Compare commits
1 Commits
4f322af577
...
86504dc33a
@ -81,5 +81,3 @@ librosa==0.11.*
|
|||||||
soundfile==0.13.*
|
soundfile==0.13.*
|
||||||
# DeGirum detector
|
# DeGirum detector
|
||||||
degirum == 0.16.*
|
degirum == 0.16.*
|
||||||
# Memory profiling
|
|
||||||
memray == 1.15.*
|
|
||||||
|
|||||||
@ -1,129 +0,0 @@
|
|||||||
---
|
|
||||||
id: memory
|
|
||||||
title: Memory Troubleshooting
|
|
||||||
---
|
|
||||||
|
|
||||||
Frigate includes built-in memory profiling using [memray](https://bloomberg.github.io/memray/) to help diagnose memory issues. This feature allows you to profile specific Frigate modules to identify memory leaks, excessive allocations, or other memory-related problems.
|
|
||||||
|
|
||||||
## Enabling Memory Profiling
|
|
||||||
|
|
||||||
Memory profiling is controlled via the `FRIGATE_MEMRAY_MODULES` environment variable. Set it to a comma-separated list of module names you want to profile:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
export FRIGATE_MEMRAY_MODULES="frigate.review_segment_manager,frigate.capture"
|
|
||||||
```
|
|
||||||
|
|
||||||
### Module Names
|
|
||||||
|
|
||||||
Frigate processes are named using a module-based naming scheme. Common module names include:
|
|
||||||
|
|
||||||
- `frigate.review_segment_manager` - Review segment processing
|
|
||||||
- `frigate.recording_manager` - Recording management
|
|
||||||
- `frigate.capture` - Camera capture processes (all cameras with this module name)
|
|
||||||
- `frigate.process` - Camera processing/tracking (all cameras with this module name)
|
|
||||||
- `frigate.output` - Output processing
|
|
||||||
- `frigate.audio_manager` - Audio processing
|
|
||||||
- `frigate.embeddings` - Embeddings processing
|
|
||||||
|
|
||||||
You can also specify the full process name (including camera-specific identifiers) if you want to profile a specific camera:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
export FRIGATE_MEMRAY_MODULES="frigate.capture:front_door"
|
|
||||||
```
|
|
||||||
|
|
||||||
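When you specify a bare module name (e.g., `frigate.capture`), every process whose name has that module as the part before the `:` is profiled, so `frigate.capture` covers all camera capture processes. A full process name such as `frigate.capture:front_door` matches only that one process.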
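
The matching rule can be sketched in a few lines of Python. This is an illustration only: `should_profile` is a hypothetical helper name, and it assumes process names follow the `module:camera` form used in the examples above.

```python
def should_profile(process_name: str, env_value: str) -> bool:
    """Return True if FRIGATE_MEMRAY_MODULES would select this process."""
    if not env_value:
        return False
    # "frigate.capture:front_door" -> "frigate.capture"
    module_name = process_name.split(":")[0]
    enabled = [m.strip() for m in env_value.split(",")]
    # Either the bare module name or the full process name may be listed.
    return module_name in enabled or process_name in enabled


# A bare module name selects every camera's capture process.
print(should_profile("frigate.capture:front_door", "frigate.capture"))  # True
# A full process name selects only that camera.
print(should_profile("frigate.capture:back_yard", "frigate.capture:front_door"))  # False
```

The comparison is an exact, case-sensitive string match, which is why a misspelled or differently cased module name silently disables profiling.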
|
|
||||||
|
|
||||||
## How It Works
|
|
||||||
|
|
||||||
1. **Binary File Creation**: When profiling is enabled, memray creates a binary file (`.bin`) in `/config/memray_reports/` that is updated continuously in real-time as the process runs.
|
|
||||||
|
|
||||||
2. **Automatic HTML Generation**: On normal process exit, Frigate automatically:
|
|
||||||
|
|
||||||
- Stops memray tracking
|
|
||||||
- Generates an HTML flamegraph report
|
|
||||||
- Saves it to `/config/memray_reports/<process_name>.html`, where `<process_name>` is the process name with `:`, `/`, and `\` replaced by `_`
|
|
||||||
|
|
||||||
3. **Crash Recovery**: If a process is killed or crashes (SIGKILL, segfault, etc.), the exit handler does not run and no HTML report is produced. The binary file is preserved with the data recorded up to that point, and you can generate the HTML report from it manually.
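
As a rough sketch of the naming used in the steps above, the snippet below derives the `.bin` and `.html` paths from a process name and builds the `memray flamegraph` command line. The helper names `report_paths` and `flamegraph_command` are hypothetical and exist only for illustration; the character replacements and the `--output` flag mirror what the profiling code in this change does.

```python
from pathlib import Path

REPORTS_DIR = Path("/config/memray_reports")


def report_paths(process_name: str) -> tuple[Path, Path]:
    """Return (binary_file, html_file) for a given process name."""
    # Characters that are awkward in file names are replaced with "_".
    safe_name = process_name.replace(":", "_").replace("/", "_").replace("\\", "_")
    return REPORTS_DIR / f"{safe_name}.bin", REPORTS_DIR / f"{safe_name}.html"


def flamegraph_command(process_name: str) -> list[str]:
    """Build the command that turns the binary capture into an HTML report."""
    binary_file, html_file = report_paths(process_name)
    return ["memray", "flamegraph", "--output", str(html_file), str(binary_file)]


print(" ".join(flamegraph_command("frigate.capture:front_door")))
```

Only the separator characters are rewritten, so dots in the module name are kept in the resulting file name.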
|
|
||||||
|
|
||||||
## Viewing Reports
|
|
||||||
|
|
||||||
### Automatic Reports
|
|
||||||
|
|
||||||
After a process exits normally, you'll find HTML reports in `/config/memray_reports/`. Open these files in a web browser to view interactive flamegraphs showing memory usage patterns.
|
|
||||||
|
|
||||||
### Manual Report Generation
|
|
||||||
|
|
||||||
If a process crashes or you want to generate a report from an existing binary file, you can manually create the HTML report:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
memray flamegraph /config/memray_reports/<process_name>.bin
|
|
||||||
```
|
|
||||||
|
|
||||||
This will generate an HTML file that you can open in your browser.
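
If several processes were profiled, the same step can be scripted. The following is a minimal sketch, assuming `memray` is on the `PATH` inside the container; the function names `pending_reports` and `generate_reports` are hypothetical.

```python
import subprocess
from pathlib import Path


def pending_reports(reports_dir: Path) -> list[Path]:
    """Return .bin captures that do not yet have a matching .html report."""
    return sorted(
        binary_file
        for binary_file in reports_dir.glob("*.bin")
        if not binary_file.with_suffix(".html").exists()
    )


def generate_reports(reports_dir: Path) -> None:
    """Run `memray flamegraph` once for every capture without a report."""
    for binary_file in pending_reports(reports_dir):
        html_file = binary_file.with_suffix(".html")
        subprocess.run(
            ["memray", "flamegraph", "--output", str(html_file), str(binary_file)],
            check=False,  # keep going even if one capture cannot be rendered
        )


# Example call (not executed here):
# generate_reports(Path("/config/memray_reports"))
```

Existing reports are left untouched, so the script can be re-run safely after each crash.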
|
|
||||||
|
|
||||||
## Understanding the Reports
|
|
||||||
|
|
||||||
Memray flamegraphs show:
|
|
||||||
|
|
||||||
- **Memory allocations over time**: See where memory is being allocated in your code
|
|
||||||
- **Call stacks**: Understand the full call chain leading to allocations
|
|
||||||
- **Memory hotspots**: Identify functions or code paths that allocate the most memory
|
|
||||||
- **Memory leaks**: Spot patterns where memory is allocated but not freed
|
|
||||||
|
|
||||||
The interactive HTML reports allow you to:
|
|
||||||
|
|
||||||
- Zoom into specific time ranges
|
|
||||||
- Filter by function names
|
|
||||||
- View detailed allocation information
|
|
||||||
- Export data for further analysis
|
|
||||||
|
|
||||||
## Best Practices
|
|
||||||
|
|
||||||
1. **Profile During Issues**: Enable profiling when you're experiencing memory issues, not all the time, as it adds some overhead.
|
|
||||||
|
|
||||||
2. **Profile Specific Modules**: Instead of profiling everything, focus on the modules you suspect are causing issues.
|
|
||||||
|
|
||||||
3. **Let Processes Run**: Allow processes to run for a meaningful duration to capture representative memory usage patterns.
|
|
||||||
|
|
||||||
4. **Check Binary Files**: If HTML reports aren't generated automatically (e.g., after a crash), check for `.bin` files in `/config/memray_reports/` and generate reports manually.
|
|
||||||
|
|
||||||
5. **Compare Reports**: Generate reports at different times to compare memory usage patterns and identify trends.
|
|
||||||
|
|
||||||
## Troubleshooting
|
|
||||||
|
|
||||||
### No Reports Generated
|
|
||||||
|
|
||||||
- Check that the environment variable is set correctly
|
|
||||||
- Verify the module name matches exactly (case-sensitive)
|
|
||||||
- Check logs for memray-related errors
|
|
||||||
- Ensure `/config/memray_reports/` directory exists and is writable
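
The environment and directory checks above can be automated with a small script. This is a hedged sketch rather than a Frigate tool: `diagnose` is a hypothetical helper, and it only covers the variable and directory checks, not the log inspection.

```python
import os
from pathlib import Path


def diagnose(reports_dir: Path, env: dict[str, str]) -> list[str]:
    """Return human-readable problems that would prevent reports from appearing."""
    problems = []
    if not env.get("FRIGATE_MEMRAY_MODULES", "").strip():
        problems.append("FRIGATE_MEMRAY_MODULES is unset or empty")
    if not reports_dir.is_dir():
        problems.append(f"{reports_dir} does not exist")
    elif not os.access(reports_dir, os.W_OK):
        problems.append(f"{reports_dir} is not writable")
    return problems


for problem in diagnose(Path("/config/memray_reports"), dict(os.environ)):
    print(problem)
```

Run it in the same container and environment as Frigate, otherwise the variable and the mounted `/config` path may differ from what the Frigate processes see.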
|
|
||||||
|
|
||||||
### Process Crashed Before Report Generation
|
|
||||||
|
|
||||||
- Look for `.bin` files in `/config/memray_reports/`
|
|
||||||
- Manually generate HTML reports using: `memray flamegraph <file>.bin`
|
|
||||||
- The binary file contains all data up to the crash point
|
|
||||||
|
|
||||||
### Reports Show No Data
|
|
||||||
|
|
||||||
- Ensure the process ran long enough to generate meaningful data
|
|
||||||
- Check that memray is properly installed (included by default in Frigate)
|
|
||||||
- Verify the process actually started and ran (check process logs)
|
|
||||||
|
|
||||||
## Example Usage
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Enable profiling for review and capture modules
|
|
||||||
export FRIGATE_MEMRAY_MODULES="frigate.review_segment_manager,frigate.capture"
|
|
||||||
|
|
||||||
# Start Frigate
|
|
||||||
# ... let it run for a while ...
|
|
||||||
|
|
||||||
# Check for reports
|
|
||||||
ls -lh /config/memray_reports/
|
|
||||||
|
|
||||||
# If a process crashed, manually generate report
|
|
||||||
memray flamegraph /config/memray_reports/frigate.capture_front_door.bin
|
|
||||||
```
|
|
||||||
|
|
||||||
For more information about memray and interpreting reports, see the [official memray documentation](https://bloomberg.github.io/memray/).
|
|
||||||
@ -131,7 +131,6 @@ const sidebars: SidebarsConfig = {
|
|||||||
"troubleshooting/recordings",
|
"troubleshooting/recordings",
|
||||||
"troubleshooting/gpu",
|
"troubleshooting/gpu",
|
||||||
"troubleshooting/edgetpu",
|
"troubleshooting/edgetpu",
|
||||||
"troubleshooting/memory",
|
|
||||||
],
|
],
|
||||||
Development: [
|
Development: [
|
||||||
"development/contributing",
|
"development/contributing",
|
||||||
|
|||||||
@ -2,6 +2,7 @@ import glob
|
|||||||
import logging
|
import logging
|
||||||
import os
|
import os
|
||||||
import shutil
|
import shutil
|
||||||
|
import time
|
||||||
import urllib.request
|
import urllib.request
|
||||||
import zipfile
|
import zipfile
|
||||||
from queue import Queue
|
from queue import Queue
|
||||||
@ -54,9 +55,6 @@ class MemryXDetector(DetectionApi):
|
|||||||
)
|
)
|
||||||
return
|
return
|
||||||
|
|
||||||
# Initialize stop_event as None, will be set later by set_stop_event()
|
|
||||||
self.stop_event = None
|
|
||||||
|
|
||||||
model_cfg = getattr(detector_config, "model", None)
|
model_cfg = getattr(detector_config, "model", None)
|
||||||
|
|
||||||
# Check if model_type was explicitly set by the user
|
# Check if model_type was explicitly set by the user
|
||||||
@ -365,44 +363,27 @@ class MemryXDetector(DetectionApi):
|
|||||||
def process_input(self):
|
def process_input(self):
|
||||||
"""Input callback function: wait for frames in the input queue, preprocess, and send to MX3 (return)"""
|
"""Input callback function: wait for frames in the input queue, preprocess, and send to MX3 (return)"""
|
||||||
while True:
|
while True:
|
||||||
# Check if shutdown is requested
|
|
||||||
if self.stop_event and self.stop_event.is_set():
|
|
||||||
logger.debug("[process_input] Stop event detected, returning None")
|
|
||||||
return None
|
|
||||||
try:
|
try:
|
||||||
# Wait for a frame from the queue with timeout to check stop_event periodically
|
# Wait for a frame from the queue (blocking call)
|
||||||
frame = self.capture_queue.get(block=True, timeout=0.5)
|
frame = self.capture_queue.get(
|
||||||
|
block=True
|
||||||
|
) # Blocks until data is available
|
||||||
|
|
||||||
return frame
|
return frame
|
||||||
|
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
# Silently handle queue.Empty timeouts (expected during normal operation)
|
logger.info(f"[process_input] Error processing input: {e}")
|
||||||
# Log any other unexpected exceptions
|
time.sleep(0.1) # Prevent busy waiting in case of error
|
||||||
if "Empty" not in str(type(e).__name__):
|
|
||||||
logger.warning(f"[process_input] Unexpected error: {e}")
|
|
||||||
# Loop continues and will check stop_event at the top
|
|
||||||
|
|
||||||
def receive_output(self):
|
def receive_output(self):
|
||||||
"""Retrieve processed results from MemryX output queue + a copy of the original frame"""
|
"""Retrieve processed results from MemryX output queue + a copy of the original frame"""
|
||||||
try:
|
connection_id = (
|
||||||
# Get connection ID with timeout
|
self.capture_id_queue.get()
|
||||||
connection_id = self.capture_id_queue.get(
|
|
||||||
block=True, timeout=1.0
|
|
||||||
) # Get the corresponding connection ID
|
) # Get the corresponding connection ID
|
||||||
detections = self.output_queue.get() # Get detections from MemryX
|
detections = self.output_queue.get() # Get detections from MemryX
|
||||||
|
|
||||||
return connection_id, detections
|
return connection_id, detections
|
||||||
|
|
||||||
except Exception as e:
|
|
||||||
# On timeout or stop event, return None
|
|
||||||
if self.stop_event and self.stop_event.is_set():
|
|
||||||
logger.debug("[receive_output] Stop event detected, exiting")
|
|
||||||
# Silently handle queue.Empty timeouts, they're expected during normal operation
|
|
||||||
elif "Empty" not in str(type(e).__name__):
|
|
||||||
logger.warning(f"[receive_output] Error receiving output: {e}")
|
|
||||||
|
|
||||||
return None, None
|
|
||||||
|
|
||||||
def post_process_yolonas(self, output):
|
def post_process_yolonas(self, output):
|
||||||
predictions = output[0]
|
predictions = output[0]
|
||||||
|
|
||||||
@ -850,19 +831,6 @@ class MemryXDetector(DetectionApi):
|
|||||||
f"{self.memx_model_type} is currently not supported for memryx. See the docs for more info on supported models."
|
f"{self.memx_model_type} is currently not supported for memryx. See the docs for more info on supported models."
|
||||||
)
|
)
|
||||||
|
|
||||||
def set_stop_event(self, stop_event):
|
|
||||||
"""Set the stop event for graceful shutdown."""
|
|
||||||
self.stop_event = stop_event
|
|
||||||
|
|
||||||
def shutdown(self):
|
|
||||||
"""Gracefully shutdown the MemryX accelerator"""
|
|
||||||
try:
|
|
||||||
if hasattr(self, "accl") and self.accl is not None:
|
|
||||||
self.accl.shutdown()
|
|
||||||
logger.info("MemryX accelerator shutdown complete")
|
|
||||||
except Exception as e:
|
|
||||||
logger.error(f"Error during MemryX shutdown: {e}")
|
|
||||||
|
|
||||||
def detect_raw(self, tensor_input: np.ndarray):
|
def detect_raw(self, tensor_input: np.ndarray):
|
||||||
"""Removed synchronous detect_raw() function so that we only use async"""
|
"""Removed synchronous detect_raw() function so that we only use async"""
|
||||||
return 0
|
return 0
|
||||||
|
|||||||
@ -43,7 +43,6 @@ class BaseLocalDetector(ObjectDetector):
|
|||||||
self,
|
self,
|
||||||
detector_config: BaseDetectorConfig = None,
|
detector_config: BaseDetectorConfig = None,
|
||||||
labels: str = None,
|
labels: str = None,
|
||||||
stop_event: MpEvent = None,
|
|
||||||
):
|
):
|
||||||
self.fps = EventsPerSecond()
|
self.fps = EventsPerSecond()
|
||||||
if labels is None:
|
if labels is None:
|
||||||
@ -61,10 +60,6 @@ class BaseLocalDetector(ObjectDetector):
|
|||||||
|
|
||||||
self.detect_api = create_detector(detector_config)
|
self.detect_api = create_detector(detector_config)
|
||||||
|
|
||||||
# If the detector supports stop_event, pass it
|
|
||||||
if hasattr(self.detect_api, "set_stop_event") and stop_event:
|
|
||||||
self.detect_api.set_stop_event(stop_event)
|
|
||||||
|
|
||||||
def _transform_input(self, tensor_input: np.ndarray) -> np.ndarray:
|
def _transform_input(self, tensor_input: np.ndarray) -> np.ndarray:
|
||||||
if self.input_transform:
|
if self.input_transform:
|
||||||
tensor_input = np.transpose(tensor_input, self.input_transform)
|
tensor_input = np.transpose(tensor_input, self.input_transform)
|
||||||
@ -245,10 +240,6 @@ class AsyncDetectorRunner(FrigateProcess):
|
|||||||
while not self.stop_event.is_set():
|
while not self.stop_event.is_set():
|
||||||
connection_id, detections = self._detector.async_receive_output()
|
connection_id, detections = self._detector.async_receive_output()
|
||||||
|
|
||||||
# Handle timeout case (queue.Empty) - just continue
|
|
||||||
if connection_id is None:
|
|
||||||
continue
|
|
||||||
|
|
||||||
if not self.send_times:
|
if not self.send_times:
|
||||||
# guard; shouldn't happen if send/recv are balanced
|
# guard; shouldn't happen if send/recv are balanced
|
||||||
continue
|
continue
|
||||||
@ -275,38 +266,21 @@ class AsyncDetectorRunner(FrigateProcess):
|
|||||||
|
|
||||||
self._frame_manager = SharedMemoryFrameManager()
|
self._frame_manager = SharedMemoryFrameManager()
|
||||||
self._publisher = ObjectDetectorPublisher()
|
self._publisher = ObjectDetectorPublisher()
|
||||||
self._detector = AsyncLocalObjectDetector(
|
self._detector = AsyncLocalObjectDetector(detector_config=self.detector_config)
|
||||||
detector_config=self.detector_config, stop_event=self.stop_event
|
|
||||||
)
|
|
||||||
|
|
||||||
for name in self.cameras:
|
for name in self.cameras:
|
||||||
self.create_output_shm(name)
|
self.create_output_shm(name)
|
||||||
|
|
||||||
t_detect = threading.Thread(target=self._detect_worker, daemon=False)
|
t_detect = threading.Thread(target=self._detect_worker, daemon=True)
|
||||||
t_result = threading.Thread(target=self._result_worker, daemon=False)
|
t_result = threading.Thread(target=self._result_worker, daemon=True)
|
||||||
t_detect.start()
|
t_detect.start()
|
||||||
t_result.start()
|
t_result.start()
|
||||||
|
|
||||||
try:
|
|
||||||
while not self.stop_event.is_set():
|
while not self.stop_event.is_set():
|
||||||
time.sleep(0.5)
|
time.sleep(0.5)
|
||||||
|
|
||||||
logger.info(
|
|
||||||
"Stop event detected, waiting for detector threads to finish..."
|
|
||||||
)
|
|
||||||
|
|
||||||
# Wait for threads to finish processing
|
|
||||||
t_detect.join(timeout=5)
|
|
||||||
t_result.join(timeout=5)
|
|
||||||
|
|
||||||
# Shutdown the AsyncDetector
|
|
||||||
self._detector.detect_api.shutdown()
|
|
||||||
|
|
||||||
self._publisher.stop()
|
self._publisher.stop()
|
||||||
except Exception as e:
|
logger.info("Exited async detection process...")
|
||||||
logger.error(f"Error during async detector shutdown: {e}")
|
|
||||||
finally:
|
|
||||||
logger.info("Exited Async detection process...")
|
|
||||||
|
|
||||||
|
|
||||||
class ObjectDetectProcess:
|
class ObjectDetectProcess:
|
||||||
@ -334,7 +308,7 @@ class ObjectDetectProcess:
|
|||||||
# if the process has already exited on its own, just return
|
# if the process has already exited on its own, just return
|
||||||
if self.detect_process and self.detect_process.exitcode:
|
if self.detect_process and self.detect_process.exitcode:
|
||||||
return
|
return
|
||||||
|
self.detect_process.terminate()
|
||||||
logging.info("Waiting for detection process to exit gracefully...")
|
logging.info("Waiting for detection process to exit gracefully...")
|
||||||
self.detect_process.join(timeout=30)
|
self.detect_process.join(timeout=30)
|
||||||
if self.detect_process.exitcode is None:
|
if self.detect_process.exitcode is None:
|
||||||
|
|||||||
@ -1,10 +1,7 @@
|
|||||||
import atexit
|
|
||||||
import faulthandler
|
import faulthandler
|
||||||
import logging
|
import logging
|
||||||
import multiprocessing as mp
|
import multiprocessing as mp
|
||||||
import os
|
import os
|
||||||
import pathlib
|
|
||||||
import subprocess
|
|
||||||
import threading
|
import threading
|
||||||
from logging.handlers import QueueHandler
|
from logging.handlers import QueueHandler
|
||||||
from multiprocessing.synchronize import Event as MpEvent
|
from multiprocessing.synchronize import Event as MpEvent
|
||||||
@ -51,7 +48,6 @@ class FrigateProcess(BaseProcess):
|
|||||||
|
|
||||||
def before_start(self) -> None:
|
def before_start(self) -> None:
|
||||||
self.__log_queue = frigate.log.log_listener.queue
|
self.__log_queue = frigate.log.log_listener.queue
|
||||||
self.__memray_tracker = None
|
|
||||||
|
|
||||||
def pre_run_setup(self, logConfig: LoggerConfig | None = None) -> None:
|
def pre_run_setup(self, logConfig: LoggerConfig | None = None) -> None:
|
||||||
os.nice(self.priority)
|
os.nice(self.priority)
|
||||||
@ -68,86 +64,3 @@ class FrigateProcess(BaseProcess):
|
|||||||
frigate.log.apply_log_levels(
|
frigate.log.apply_log_levels(
|
||||||
logConfig.default.value.upper(), logConfig.logs
|
logConfig.default.value.upper(), logConfig.logs
|
||||||
)
|
)
|
||||||
|
|
||||||
self._setup_memray()
|
|
||||||
|
|
||||||
def _setup_memray(self) -> None:
|
|
||||||
"""Setup memray profiling if enabled via environment variable."""
|
|
||||||
memray_modules = os.environ.get("FRIGATE_MEMRAY_MODULES", "")
|
|
||||||
|
|
||||||
if not memray_modules:
|
|
||||||
return
|
|
||||||
|
|
||||||
# Extract module name from process name (e.g., "frigate.capture:camera" -> "frigate.capture")
|
|
||||||
process_name = self.name
|
|
||||||
module_name = (
|
|
||||||
process_name.split(":")[0] if ":" in process_name else process_name
|
|
||||||
)
|
|
||||||
|
|
||||||
enabled_modules = [m.strip() for m in memray_modules.split(",")]
|
|
||||||
|
|
||||||
if module_name not in enabled_modules and process_name not in enabled_modules:
|
|
||||||
return
|
|
||||||
|
|
||||||
try:
|
|
||||||
import memray
|
|
||||||
|
|
||||||
reports_dir = pathlib.Path("/config/memray_reports")
|
|
||||||
reports_dir.mkdir(parents=True, exist_ok=True)
|
|
||||||
safe_name = (
|
|
||||||
process_name.replace(":", "_").replace("/", "_").replace("\\", "_")
|
|
||||||
)
|
|
||||||
|
|
||||||
binary_file = reports_dir / f"{safe_name}.bin"
|
|
||||||
|
|
||||||
self.__memray_tracker = memray.Tracker(str(binary_file))
|
|
||||||
self.__memray_tracker.__enter__()
|
|
||||||
|
|
||||||
# Register cleanup handler to stop tracking and generate HTML report
|
|
||||||
# atexit runs on normal interpreter exit and on signals that Python handles (e.g. SIGINT, or SIGTERM when a handler is installed)
|
|
||||||
# For hard kills (SIGKILL) or segfaults, the binary file is preserved for manual generation
|
|
||||||
atexit.register(self._cleanup_memray, safe_name, binary_file)
|
|
||||||
|
|
||||||
self.logger.info(
|
|
||||||
f"Memray profiling enabled for module {module_name} (process: {self.name}). "
|
|
||||||
f"Binary file (updated continuously): {binary_file}. "
|
|
||||||
f"HTML report will be generated on exit: {reports_dir}/{safe_name}.html. "
|
|
||||||
f"If process crashes, manually generate with: memray flamegraph {binary_file}"
|
|
||||||
)
|
|
||||||
except Exception as e:
|
|
||||||
self.logger.error(f"Failed to setup memray profiling: {e}", exc_info=True)
|
|
||||||
|
|
||||||
def _cleanup_memray(self, safe_name: str, binary_file: pathlib.Path) -> None:
|
|
||||||
"""Stop memray tracking and generate HTML report."""
|
|
||||||
if self.__memray_tracker is None:
|
|
||||||
return
|
|
||||||
|
|
||||||
try:
|
|
||||||
self.__memray_tracker.__exit__(None, None, None)
|
|
||||||
self.__memray_tracker = None
|
|
||||||
|
|
||||||
reports_dir = pathlib.Path("/config/memray_reports")
|
|
||||||
html_file = reports_dir / f"{safe_name}.html"
|
|
||||||
|
|
||||||
result = subprocess.run(
|
|
||||||
["memray", "flamegraph", "--output", str(html_file), str(binary_file)],
|
|
||||||
capture_output=True,
|
|
||||||
text=True,
|
|
||||||
timeout=10,
|
|
||||||
)
|
|
||||||
|
|
||||||
if result.returncode == 0:
|
|
||||||
self.logger.info(f"Memray report generated: {html_file}")
|
|
||||||
else:
|
|
||||||
self.logger.error(
|
|
||||||
f"Failed to generate memray report: {result.stderr}. "
|
|
||||||
f"Binary file preserved at {binary_file} for manual generation."
|
|
||||||
)
|
|
||||||
|
|
||||||
# Keep the binary file for manual report generation if needed
|
|
||||||
# Users can run: memray flamegraph {binary_file}
|
|
||||||
|
|
||||||
except subprocess.TimeoutExpired:
|
|
||||||
self.logger.error("Memray report generation timed out")
|
|
||||||
except Exception as e:
|
|
||||||
self.logger.error(f"Failed to cleanup memray profiling: {e}", exc_info=True)
|
|
||||||
|
|||||||