Post-hoc reasoning in chain of thought

Steering results across models and tasks

I started this project during the training phase of Neel Nanda's MATS 6.0 stream. Later, with the help of two collaborators (Darius Kianersi and Adrià Garriga-Alonso), I extended it with a more comprehensive evaluation suite and some additional experiments. Over time this has produced three artifacts for the same project: a blog post on this site, a LessWrong post, and a paper. I had GPT 5.5 Deep Research aggregate citations across all three, and I list those citations below.

In the future, if you reference this project, I recommend you cite the paper on arXiv! I've provided a BibTeX citation below.

Cited by

10 works · as of June 2026

Papers

Other

Cite this project

@misc{cox2026decoding,
  title         = {Decoding Answers Before Chain-of-Thought: Evidence from Pre-CoT Probes and Activation Steering},
  author        = {Cox, Kyle and Kianersi, Darius and Garriga-Alonso, Adrià},
  year          = {2026},
  eprint        = {2603.01437},
  archivePrefix = {arXiv},
}