AI safety researcher and engineer. Empirical, threat-model-driven research. Göteborg, Sweden — open to relocation.
Looking for: AI safety roles (research/engineering) · short contracts on dangerous capability evals, ML engineering, DevOps
Contact: accounts@reinthal.me · Book a meeting · LinkedIn · reinthal.me · Mastodon · CV
- Studied how data composition and inoculation prompting cause emergent misalignment
- Found that current model organisms show large capability degradations, which argues for more realistic model-organism training
- about-emergent-misalignment
- Found deception probes detect when Chinese models present CCP talking points
- deception-detection-in-chinese-models · uses a fork of Apollo Research's deception-detection eval suite
- Demonstrated that cyber-attacks can bypass safeguards when the attack is split into individually benign-looking pieces
- Project page · hackerFinder9000 (infra) · Red-APT (red-team agent harness)
- Continued at SPAR 2026 (without me) with researchers from MILA and ERA
- PR to ARENA materials: ARENA_3.0 #279
- The Changing North Star of AI Control — LessWrong
- Casually Jailbreaking Gemini 2.5 Flash — reinthal.me
- Raffaello Fornasiere (LASR research fellow) — Detecting Deception in Chinese Models
- Allison Zhuang (ARENA / Goodfire SPAR fellow) — Detecting Deception in Chinese Models
- David Williams-King (ERA) — Piecewise Cyber Espionage
| Repo | What it is |
|---|---|
| rl-moms-of-scheming | Investigating model organisms of scheming under RL (Ongoing) |
| do-llms-prefer-philosophy | Why do LLMs gravitate toward philosophy in free-form conversation? Compared AI 1-on-1s to agents browsing Wikipedia |
| cost-to-detection | Modeling attacker cost-to-detection tradeoffs. Blog post: The Changing North Star of AI Control |
Methodology paper adopted by Recorded Future. Their Vulnerability Intelligence customers see 86% less unplanned downtime, 11 hours/week saved on triage, 73% more threat visibility.
"We typically see 5–10 CVEs a month escalated automatically, saving the team roughly 3–5 hours gathering information manually." — Senior Engineer/Threat Analyst
- Paper: Data Modeling for Predicting Software Exploits
- Recorded Future's Vulnerability Management solution
Dotfiles, infra, and older work: github.com/reinthal?tab=repositories




