Introduction

For the last few years, security teams have been told to “trust but verify” large language models (LLMs) that they couldn’t meaningfully inspect. We’ve had prompt logs, guardrails, model cards, and red‑team reports, but very little visibility into how these systems actually make decisions internally. From a defender’s perspective, most models have been a…