About
Serre AI is an independent research lab focused on the formal foundations of AI reasoning. We prove theorems about what language models can and cannot do.
Our work spans three areas: the computational complexity of LLM reasoning, the theory of verifying model outputs, and empirical evaluation at scale. We publish at top venues and release our benchmarks, data, and analysis code.
Researcher
Oddur Sigurdsson
Based in Reykjavik. Background in software engineering and computational theory. Building Serre AI as a solo research operation with autonomous infrastructure.
oddur@serre.ai
Methodology
We run research using autonomous AI agents built on Claude Code. Our platform orchestrates literature surveys, experiment design and execution, paper drafting, and internal peer review — all as structured agent sessions with quality gates, budget controls, and formal verification.
This isn't AI-assisted writing. The agents operate autonomously within defined research protocols: pre-registered experiments, knowledge graph consistency checks, and multi-agent review cycles. Human oversight is strategic, not operational.
The platform itself — an autonomous research orchestrator — is a research artifact. We study what works and what breaks when AI agents do science.
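The structure described above can be sketched in code. This is a minimal illustrative sketch, not the platform's actual API: the `Session` class, stage names, gate predicates, and costs are all hypothetical, standing in for how a pipeline of agent stages might be guarded by quality gates and a token budget.

```python
from dataclasses import dataclass, field

@dataclass
class Session:
    """Hypothetical structured agent session: ordered stages, each
    guarded by a quality gate and charged against a token budget."""
    budget: int                      # total token budget for the session
    spent: int = 0
    log: list = field(default_factory=list)

    def run_stage(self, name, work, gate, cost):
        # Budget control: refuse to start a stage the budget cannot cover.
        if self.spent + cost > self.budget:
            raise RuntimeError(f"budget exhausted before stage '{name}'")
        result = work()
        self.spent += cost
        # Quality gate: a stage's output must pass its check to proceed.
        if not gate(result):
            raise ValueError(f"quality gate failed at stage '{name}'")
        self.log.append((name, result))
        return result

session = Session(budget=100)
survey = session.run_stage(
    "literature_survey",
    work=lambda: ["paper A", "paper B"],
    gate=lambda r: len(r) > 0,      # gate: the survey must find sources
    cost=40,
)
draft = session.run_stage(
    "draft",
    work=lambda: f"Draft citing {len(survey)} sources",
    gate=lambda r: "citing" in r,   # gate: the draft must use the survey
    cost=50,
)
```

The point of the shape is that a failed gate halts the session rather than letting a weak intermediate result propagate downstream, and the budget check bounds total spend regardless of how stages behave.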
Research Areas
Computational Complexity of LLM Reasoning
Formal characterization of when and why language models fail at reasoning tasks, mapped to complexity classes. What is the relationship between a task's computational difficulty and a model's accuracy?
Verification Theory
When can we efficiently check whether a model's output is correct? Complexity-theoretic bounds on verification, with applications to scalable oversight and cross-model consistency.
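The asymmetry at the heart of this question can be shown with a classic example (an illustration of the general idea, not one of our benchmarks; the function name is hypothetical): finding a subset-sum certificate is NP-hard in general, but checking a claimed one takes linear time.

```python
def verify_subset_sum(numbers, target, claimed_indices):
    """Check a model's claimed certificate for a subset-sum instance.
    Finding a valid subset is NP-hard in general; verifying one is linear."""
    # Every claimed index must point into the instance, without repeats.
    if len(set(claimed_indices)) != len(claimed_indices):
        return False
    if any(i < 0 or i >= len(numbers) for i in claimed_indices):
        return False
    # The certificate is valid iff the selected numbers hit the target.
    return sum(numbers[i] for i in claimed_indices) == target

# A model claims that indices [0, 2] of [3, 4, 5] sum to 8.
ok = verify_subset_sum([3, 4, 5], 8, [0, 2])    # valid certificate
bad = verify_subset_sum([3, 4, 5], 8, [1])      # invalid certificate
```

When a task admits this kind of cheap certificate check, model outputs can be verified without trusting, or re-running, the model that produced them.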
Benchmark Design
Building evaluation frameworks with known computational structure. Tasks with provable ground truth, controlled difficulty scaling, and formal guarantees about what they measure.
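A toy version of such a task generator, sketched under stated assumptions (the `make_task` function and its parameters are hypothetical, not an actual benchmark of ours): each item is a chain of modular additions, the ground truth is computed exactly rather than labeled by humans, and a single `depth` parameter scales difficulty.

```python
import random

def make_task(depth, seed, mod=97):
    """Hypothetical benchmark item: a chain of modular additions.
    Ground truth is computed exactly; `depth` controls difficulty;
    a fixed seed makes every instance reproducible."""
    rng = random.Random(seed)
    terms = [rng.randrange(mod) for _ in range(depth)]
    prompt = " + ".join(map(str, terms)) + f" mod {mod} = ?"
    answer = sum(terms) % mod        # provable ground truth, no labeling
    return prompt, answer

prompt, answer = make_task(depth=5, seed=0)
```

Because the generator, not an annotator, defines correctness, the benchmark carries a formal guarantee about what each item measures, and difficulty can be swept by varying `depth` alone.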