Methods before claims.
Token optimization is only worth doing if it preserves output quality. We are preparing the benchmarks, methods, and failure cases so teams can inspect how each technique behaves before adopting it.
Benchmark (In progress)
Quality parity under aggressive context trimming
How trimmed prompts hold up against full-context baselines across real coding tasks.
Method (In progress)
When routing to a lighter model is safe
A classifier for the cases where a smaller model matches the frontier one, and for the cases where it must not be used.
Report (In progress)
Where the tokens actually go in a coding agent
A breakdown of a long bug hunt: re-reads, repeated history, and heavy-model overuse.
Methods and datasets will ship alongside each post.