index

Hello! I’m Jacob Merizian. I work at the UK AI Security Institute. In the past, I’ve done research in high-performance computing, language model pretraining, interpretability, and hardware-enabled governance.

writing

Dec 2025 - Auditing games for sandbagging detection (arXiv)
Nov 2025 - Evaluating Opus 4.5 for misalignment
- Including the “external testing” section of the Opus 4.5 system card
Jul 2025 - UK AISI Whitebox Control Progress Update (AF, LW)
Jul 2025 - Establishing Best Practices for Building Rigorous Agentic Benchmarks (HN, NeurIPS)
Nov 2022 - Interpreting Neural Networks through the Polytope Lens (LW)
Dec 2021 - GPT-3 knows about its evals

projects

Mar 2025 - SWE-Bench Visualizer
Sep 2021 - City Circuits: a GPT-2 “neuron explorer”