From da1cac5e81272cd76edee39ee0b8d19c572d6bd8 Mon Sep 17 00:00:00 2001
From: cat101
Date: Sun, 17 Sep 2023 10:29:07 -0300
Subject: [PATCH] Update docs/docs/guides/video_pipeline.md

Co-authored-by: Nicolas Mowen
---
 docs/docs/guides/video_pipeline.md | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/docs/docs/guides/video_pipeline.md b/docs/docs/guides/video_pipeline.md
index 90b1af6a3..4dcaa28e5 100644
--- a/docs/docs/guides/video_pipeline.md
+++ b/docs/docs/guides/video_pipeline.md
@@ -21,8 +21,13 @@ flowchart LR
     Motion --> Recording
     Object --> Recording
 ```
-As the diagram shows, all feeds first need to be acquired. Depending on the data source, it may be as simple as using FFmpeg to connect to an RTSP source via TCP or something more involved like connecting to an Apple Homekit camera using go2rtc. A single camera can produce a main (i.e. high quality) and a sub (i.e. low quality) video feed. Typically, the sub-feed will be decoded to produce full-frame images. As part of this process, the resolution may be downscaled and an image sampling frequency may be imposed (e.g. keep 5 frames per second). These frames will then be compared over time to detect movement areas (a.k.a. motion boxes). Once a box reaches a "significant size" to contain an object, it will be analyzed by a machine learning model to detect known objects. Finally, depending on the configuration, we will decide what video clips and events should be saved, what alarms should be triggered, etc.
-### Detailed view of the video pipeline
+As the diagram shows, all feeds first need to be acquired. Depending on the data source, it may be as simple as using FFmpeg to connect to an RTSP source via TCP or something more involved like connecting to an Apple Homekit camera using go2rtc. A single camera can produce a main (i.e. high resolution) and a sub (i.e. lower resolution) video feed.
+
+Typically, the sub-feed will be decoded to produce full-frame images. As part of this process, the resolution may be downscaled and an image sampling frequency may be imposed (e.g. keep 5 frames per second).
+
+These frames will then be compared over time to detect movement areas (a.k.a. motion boxes). These motion boxes are combined into motion regions and are analyzed by a machine learning model to detect known objects. Finally, the snapshot and recording retention config will decide what video clips and events should be saved.
+
+## Detailed view of the video pipeline
 
 The following diagram adds a lot more detail than the simple view explained before. The goal is to show the detailed data paths between the processing steps.
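The frame-comparison step the patched paragraph describes (detecting changed pixels between consecutive decoded frames and deriving a motion box) can be sketched roughly as below. This is a minimal illustration using NumPy on synthetic grayscale frames, not Frigate's actual motion detector; the function names and the single-box simplification are my own, and a real pipeline would cluster changed pixels into multiple boxes and combine them into regions.

```python
import numpy as np

def motion_mask(prev_frame: np.ndarray, curr_frame: np.ndarray,
                threshold: int = 25) -> np.ndarray:
    # Per-pixel absolute intensity difference between consecutive frames;
    # pixels that changed by more than `threshold` are flagged as motion.
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold

def motion_box(mask: np.ndarray):
    # Bounding box (x_min, y_min, x_max, y_max) around all changed pixels;
    # returns None when nothing moved. A real detector would produce
    # one box per connected cluster of changed pixels, not a single box.
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Two synthetic 10x10 grayscale frames that differ in one rectangle.
prev = np.zeros((10, 10), dtype=np.uint8)
curr = prev.copy()
curr[2:5, 3:7] = 200  # simulated moving object

print(motion_box(motion_mask(prev, curr)))  # → (3, 2, 6, 4)
```

In practice the thresholded mask is also denoised (e.g. with blurring and contour filtering) before boxes are extracted, which is why downscaling and frame sampling in the earlier step matter for CPU cost.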