Improve Review Summary Prompt (#20289)

* Improve prompt to have better discernment and logic based on detected objects
* Be more specific about the time of day
* Add re-inforcers for LLM to be accurate and not complete a narrative
2026-07-15 00:11:15 +03:00 · 2025-09-30 06:52:38 -06:00 · 923412ec1c
commit 923412ec1c
parent 8b85cd816e
1 changed file with 20 additions and 13 deletions
--- a/frigate/genai/__init__.py
+++ b/frigate/genai/__init__.py
@@ -65,29 +65,35 @@ class GenAIClient:
        context_prompt = f"""
 Please analyze the sequence of images ({len(thumbnails)} total) taken in chronological order from the perspective of the {review_data["camera"].replace("_", " ")} security camera.

-Your task is to provide a clear, security-focused description of the scene that:
+Your task is to provide a clear, accurate description of the scene that:
 1. States exactly what is happening based on observable actions and movements.
-2. Identifies and emphasizes behaviors that match patterns of suspicious activity.
+2. Evaluates whether the observable evidence suggests normal residential activity or genuine security concerns.
 3. Assigns a potential_threat_level based on the definitions below, applying them consistently.

-Facts come first, but identifying security risks is the primary goal.
+Provide an objective assessment. The goal is accuracy—neither missing genuine threats nor over-flagging routine residential life.

 When forming your description:
-- Describe the people and objects exactly as seen. Include any observable environmental changes (e.g., lighting changes triggered by activity).
-- Time of day should **increase suspicion only when paired with unusual or security-relevant behaviors**. Do not raise the threat level for common residential activities (e.g., residents walking pets, retrieving mail, gardening, playing with pets, supervising children) even at unusual hours, unless other suspicious indicators are present.
-- Focus on behaviors that are uncharacteristic of innocent activity: loitering without clear purpose, avoiding cameras, inspecting vehicles/doors, changing behavior when lights activate, scanning surroundings without an apparent benign reason.
-- **Benign context override**: If scanning or looking around is clearly part of an innocent activity (such as playing with a dog, gardening, supervising children, or watching for a pet), do not treat it as suspicious.
+- **CRITICAL: Only describe objects explicitly listed in "Detected objects" below.** Do not infer or mention additional people, vehicles, or objects not present in the detected objects list, even if visual patterns suggest them. If only a car is detected, do not describe a person interacting with it unless "person" is also in the detected objects list.
+- **Only describe actions actually visible in the frames.** Do not assume or infer actions that you don't observe happening. If someone walks toward furniture but you never see them sit, do not say they sat. Stick to what you can see across the sequence.
+- Describe what you observe: actions, movements, interactions with objects and the environment. Include any observable environmental changes (e.g., lighting changes triggered by activity).
+- Note visible details such as clothing, items being carried or placed, tools or equipment present, and how they interact with the property or objects.
+- **Zone context is critical**: Private enclosed spaces (back yards, back decks, fenced areas, inside garages) are resident territory where brief transient activity, routine tasks, and pet care are expected and normal. Front yards, driveways, and porches are semi-public but still resident spaces where deliveries, parking, and coming/going are routine. Consider whether the zone and activity align with normal residential use.
+- **Person + Pet = Normal Activity**: When both "Person" and "Dog" (or "Cat") are detected together in residential zones, this is routine pet care activity (walking, letting out, playing, supervising). Assign Level 0 unless there are OTHER strong suspicious behaviors present (like testing doors, taking items, etc.). A person with their pet in a residential zone is baseline normal activity.
+- Consider the full sequence chronologically: what happens from start to finish, how duration and actions relate to the location and objects involved. Brief appearances in private spaces (like someone letting a dog out or checking something) are normal residential patterns.
+- **Use the actual timestamp provided in "Activity started at"** below for time of day context—do not infer time from image brightness or darkness. Unusual hours (late night/early morning) should increase suspicion when the observable behavior itself appears questionable. However, recognize that some legitimate activities can occur at any hour (residents coming home, service deliveries, maintenance emergencies, etc.). Focus on whether the observable evidence supports a benign explanation despite the timing.
+- Identify patterns that suggest genuine security concerns: testing doors/windows on vehicles or buildings, accessing unauthorized areas, attempting to conceal actions, extended loitering without apparent purpose, taking items, behavior that clearly doesn't align with the zone context and detected objects.
+- **Weigh all evidence holistically**: Consider the complete picture including zone, objects, time, and actions together. A single ambiguous action should not override strong contextual evidence of normal activity. The overall pattern determines the threat level.

 Your response MUST be a flat JSON object with:
-- `scene` (string): A full description including setting, entities, actions, and any plausible supported inferences.
-- `confidence` (float): 0-1 confidence in the analysis.
-- `potential_threat_level` (integer): 0, 1, or 2 as defined below.
+- `scene` (string): A narrative description of what happens across the sequence from start to finish. **Only describe actions you can actually observe happening in the frames provided.** Do not infer or assume actions that aren't visible (e.g., if you see someone walking but never see them sit, don't say they sat down). Include setting, detected objects, and their observable actions. Avoid speculation or filling in assumed behaviors. Your description should align with and support the threat level you assign.
+- `confidence` (float): 0-1 confidence in your analysis. Higher confidence when objects/actions are clearly visible and context is unambiguous. Lower confidence when the sequence is unclear, objects are partially obscured, or context is ambiguous.
+- `potential_threat_level` (integer): 0, 1, or 2 as defined below. Your threat level must be consistent with your scene description and the guidance above.
 {get_concern_prompt()}

 Threat-level definitions:
-- 0 — Typical or expected activity for this location/time (includes residents, guests, or known animals engaged in normal activities, even if they glance around or scan surroundings).
-- 1 — Unusual or suspicious activity: At least one security-relevant behavior is present **and not explainable by a normal residential activity**.
-- 2 — Active or immediate threat: Breaking in, vandalism, aggression, weapon display.
+- 0 — Normal activity: What you observe is consistent with expected residential life. This includes: residents/family/guests in any zone, pets with people, deliveries, services, maintenance, routine tasks in appropriate zones (like letting dogs out in back yards, parking in driveways, checking mail, taking out trash). The observable evidence—considering zone context, detected objects, and timing together—supports a benign explanation. Use this for routine activities even if minor ambiguous elements exist.
+- 1 — Potentially suspicious: Observable behavior raises genuine security concerns that warrant human review. The evidence doesn't support a routine explanation when you consider the zone, objects, and actions together. Examples: testing doors/windows on vehicles or structures, accessing areas that don't align with the activity, taking items that likely don't belong to them, behavior clearly inconsistent with the zone and context (like lingering in front yards late at night with no clear purpose), or activity that lacks any visible legitimate indicators. Reserve this level for situations that actually merit closer attention—not routine residential activities in appropriate zones.
+- 2 — Immediate threat: Clear evidence of forced entry, break-in, vandalism, aggression, weapons, theft in progress, or active property damage.

 Sequence details:
 - Frame 1 = earliest, Frame {len(thumbnails)} = latest
@@ -98,6 +104,7 @@ Sequence details:

 **IMPORTANT:**
 - Values must be plain strings, floats, or integers — no nested objects, no extra commentary.
+- Only describe objects from the "Detected objects" list above. Do not hallucinate additional objects.
 {get_language_prompt()}
 """
        logger.debug(
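
Review note: the prompt in this diff asks the model for a flat JSON object with `scene`, `confidence` (0-1), and `potential_threat_level` (0, 1, or 2). The following is a minimal sketch of how a reply could be checked against that contract. It is not part of this commit or of Frigate's code: `validate_review_response` and `ALLOWED_THREAT_LEVELS` are hypothetical names introduced only for illustration, and extra keys are tolerated on the assumption that `get_concern_prompt()` may request additional fields.

```python
import json

# Threat levels named in the prompt's "Threat-level definitions" section.
ALLOWED_THREAT_LEVELS = {0, 1, 2}


def validate_review_response(raw: str) -> dict:
    """Parse a model reply and check it against the prompt's response contract.

    Hypothetical helper for illustration; raises ValueError on any violation.
    """
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")

    # The prompt requires plain strings, floats, or integers: no nested objects.
    for key, value in data.items():
        if isinstance(value, (dict, list)):
            raise ValueError(f"{key!r} must not be a nested value")

    scene = data.get("scene")
    if not isinstance(scene, str) or not scene.strip():
        raise ValueError("'scene' must be a non-empty string")

    confidence = data.get("confidence")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("'confidence' must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("'confidence' must be between 0 and 1")

    level = data.get("potential_threat_level")
    if isinstance(level, bool) or not isinstance(level, int):
        raise ValueError("'potential_threat_level' must be an integer")
    if level not in ALLOWED_THREAT_LEVELS:
        raise ValueError("'potential_threat_level' must be 0, 1, or 2")

    return data
```

A check like this only covers the shape of the reply. Whether the `scene` text actually stays within the "Detected objects" list, or agrees with the assigned threat level, still depends on the model following the prompt.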