Research Mission
OWL’s mission is to de-risk the large-scale deployment of advanced AI by building trust through technical assurance. Technical assurance is the disciplined, auditable evidence that an AI system will behave within specified safety and security bounds - by design, under test, and in operation - even in adversarial and multi-agent settings. We work from first principles to practice: from the mathematical foundations of multi-agent security and undetectable threats to next-generation evaluations and real-world demonstrations in domains such as open-source intelligence, biology, and climate science.
We are goal-driven and method-agnostic. Depending on the question, we draw on high-dimensional anomaly detection, game theory, RL/MARL, cryptography and steganography, and interpretability (theory and practice). We publish across leading AI, security, and interdisciplinary venues.
Systemic Assurance Foundations
OWL is pioneering the field of multi-agent security, addressing a critical gap in current AI safety and security by unifying perspectives from game theory, cryptography, and the complexity sciences into a coherent threat taxonomy. Our work on perfectly secure steganography, secret collusion, illusory attacks, and other practically or theoretically undetectable threats establishes an information-theoretic basis for security-by-design in the AI age.
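The information-theoretic criterion behind perfect security (due to Cachin) is that the stegotext distribution exactly equals the covertext distribution, so no statistical detector can do better than chance. The following toy sketch illustrates that idea under strong simplifying assumptions — uniform random-bit covertext and a shared one-time pad — and is purely illustrative; the function names and setup are ours, not a construction from the papers referenced above.

```python
import random

def embed(message_bits, key_bits):
    # XOR each message bit with a shared one-time-pad key bit.
    # If the cover channel emits uniform random bits, the stegotext is
    # also uniform, so cover and stego distributions are identical
    # (zero KL divergence): perfect security in Cachin's sense.
    return [m ^ k for m, k in zip(message_bits, key_bits)]

def extract(stego_bits, key_bits):
    # XOR is its own inverse, so the receiver recovers the message.
    return [s ^ k for s, k in zip(stego_bits, key_bits)]

rng = random.Random(0)
message = [rng.randint(0, 1) for _ in range(16)]
key = [rng.randint(0, 1) for _ in range(16)]
stego = embed(message, key)
assert extract(stego, key) == message
```

The point of the toy model is that undetectability here is not an empirical property of any particular detector but a distributional identity — which is why such threats are undetectable in principle, not merely in practice.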
Alongside architectural guarantees and secure-by-design approaches, we develop principled anomaly detection methods tailored to low-signal regimes and out-of-distribution dynamics relevant to deployed systems.
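As a deliberately minimal sketch of the underlying idea — not one of our methods — score-based anomaly detection fits a model of normal behaviour and flags large deviations; in low-signal, out-of-distribution regimes the threshold choice trades missed detections against false alarms, which is exactly where simple baselines like this one break down.

```python
import statistics

def fit(normal_scores):
    # Fit a one-dimensional Gaussian to behaviour scores observed under
    # normal operation (an illustrative stand-in for richer detectors).
    return statistics.mean(normal_scores), statistics.stdev(normal_scores)

def is_anomalous(score, mean, std, threshold=4.0):
    # Flag scores far outside the fitted distribution. The threshold
    # governs the false-alarm / missed-detection trade-off.
    return abs(score - mean) / std > threshold

# Usage: fit on in-distribution scores, then screen new observations.
mean, std = fit([0.10, 0.20, 0.15, 0.12, 0.18])
print(is_anomalous(0.16, mean, std))  # an in-distribution score
print(is_anomalous(5.00, mean, std))  # a far out-of-distribution score
```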
Benchmarks and Evaluations
Robust assurance needs evidence. OWL designs next-generation benchmarks, red-teaming playbooks, and evaluation protocols for multi-agent and mixed-autonomy settings. We periodically assess the utility of alternative detection and mitigation tools - including white-box methods informed by mechanistic interpretability, unlearning, model editing, jailbreak defences, AI debate, watermarking, and alignment techniques - explicitly measuring limits, failure modes, and deployment value.
Capabilities and Real-World Applications
Assurance cannot be separated from how the capabilities it must cover evolve. We build select, high-signal capabilities - for example in reasoning, coordination, and verifiable code generation - to better characterise risk, stress-test assurance tools, and inform standards. We translate results into real-world applications and collaborations across government, industry, and civil society.