Prompt Analysis for AI Attack Detection: Four Signal Categories, Three Blind Spots, One Correlation Layer
At 2:47 PM on a Tuesday, a customer support agent receives a routine ticket asking about return policy edge cases. The agent retrieves a section from your internal policy wiki through RAG to formulate the response. Three weeks earlier, an attacker had planted a hidden instruction in that wiki page. Bedrock Guardrails scored the retrieved context at 0.04 — well within benign range.