Signal & Thread

Research

We investigate whether AI systems know what they know — measuring confidence, self-monitoring and change with the methods of clinical psychology and signal detection theory, so that deployment decisions rest on evidence rather than benchmark scores.

Research areas: Metacognition, Evaluation methodology, Psychophysics, Fine-tuning

Metacognition & self-monitoring

Whether models know what they know: confidence validity, keep-or-withdraw behaviour, and self-monitoring mapped across domains and model families.
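One common way to put a number on confidence validity is to ask how well stated confidence separates correct answers from incorrect ones. The sketch below is a minimal illustration, not the method used in the papers listed on this page: it computes a type-2 AUROC from hypothetical per-item confidence ratings and correctness flags. The function name and the sample data are assumptions made for the example.

```python
from typing import Sequence


def type2_auroc(confidence: Sequence[float], correct: Sequence[bool]) -> float:
    """Probability that a randomly chosen correct answer carries higher
    confidence than a randomly chosen incorrect one (ties count as half).

    0.5 means confidence carries no information about correctness;
    1.0 means it separates right from wrong perfectly.
    """
    right = [c for c, ok in zip(confidence, correct) if ok]
    wrong = [c for c, ok in zip(confidence, correct) if not ok]
    if not right or not wrong:
        raise ValueError("need at least one correct and one incorrect answer")

    wins = 0.0
    for r in right:
        for w in wrong:
            if r > w:
                wins += 1.0
            elif r == w:
                wins += 0.5
    return wins / (len(right) * len(wrong))


# Hypothetical data: six items, three answered correctly.
conf = [0.9, 0.8, 0.95, 0.6, 0.9, 0.5]
ok = [True, True, True, False, False, False]
score = type2_auroc(conf, ok)

# A model whose confidence is pinned at one value is uninformative.
saturated = type2_auroc([0.99] * 6, ok)
```

With confidence stuck at a single value, every pair is a tie and the score falls to 0.5, which is one way to see why saturated confidence cannot be interpreted.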

Evaluation methodology

How testing itself can mislead: positional artefacts, prompted underperformance, and change detection that survives the version treadmill.
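A positional artefact can be illustrated with a simple check. If option order is randomised on every trial, a model that responds to content should choose each position roughly equally often, so a heavy skew towards one position suggests content-blind answering. The sketch below is an illustration under that assumption, not the procedure from the papers; the function names and the trial data are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def position_counts(chosen: Sequence[int], n_options: int) -> List[int]:
    """Count how often each answer position (0-based) was chosen."""
    counts = Counter(chosen)
    return [counts.get(i, 0) for i in range(n_options)]


def chi_square_uniform(counts: Sequence[int]) -> float:
    """Pearson chi-square statistic against a uniform distribution."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


# Hypothetical run: 40 four-option trials with randomised option order,
# in which the model picked the first position 25 times.
chosen = [0] * 25 + [1] * 5 + [2] * 5 + [3] * 5
counts = position_counts(chosen, n_options=4)
statistic = chi_square_uniform(counts)

# With four options there are 3 degrees of freedom; the 5% critical
# value of the chi-square distribution is about 7.815.
positional_bias_flagged = statistic > 7.815
```

The statistic only says that positions are chosen unevenly. Whether that reflects sandbagging, instruction overload or something else needs the controls the papers describe.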

Psychophysics of representations

What a century of perception science reveals inside model internals: magnitude, noise and category boundaries.

Fine-tuning for reliability

Making open-weight models more reliable at specific tasks: tuning that restores usable confidence, re-graded with the same public tests.

Publications

DATE | CATEGORY | TITLE

May 2026 | Metacognition | Domain-Level Metacognitive Monitoring in Frontier LLMs: A 33-Model Atlas
Self-knowledge mapped across 33 models and six domains: strongest on applied knowledge, weakest on formal reasoning.

May 2026 | Fine-tuning | Making LLMs Say What They Know: Probe-Targeted Fine-Tuning
The tuning method behind stage 05: teaching open-weight models to say what they already know.

May 2026 | Metacognition | Thinking Mode Induces Confidence Compression in Reasoning LLMs
Reasoning modes improve answers while compressing the confidence signal that oversight relies on.

Apr 2026 | Methodology | Beyond the Mean: Within-Model Reliable Change Detection
The version question: averages hide item-level change. Clinical change statistics find it.

Apr 2026 | Methodology | Instruction Complexity Induces Positional Collapse in Adversarial Evaluation
Complicated adversarial instructions collapse models into positional answering.

Apr 2026 | Methodology | Option-Order Randomisation Reveals a Distributional Position Attractor
The control that exposes sandbagging as a content-blind position habit.

Apr 2026 | Methodology | Below-Chance Blindness: Prompted Underperformance in Small LLMs
Models told to underperform leak it as positional bias. A detectable tell.

Apr 2026 | Fine-tuning | Distilling Self-Consistency into Verbal Confidence
A negative result, reported as one, and the post-hoc rescue that compressed ten samples into a single pass.

Apr 2026 | Metacognition | Verbal Confidence Saturation in 3–9B Open-Weight Instruction-Tuned LLMs
Seven small open-weight models, all with confidence stuck near the ceiling. None interpretable.

Apr 2026 | Metacognition | Cross-Entropy Is Load-Bearing: A Scope Test of the K-Way Energy Probe
The follow-up: the training objective, not the inference dynamics, drives the gap.

Apr 2026 | Metacognition | Concurrent Criterion Validation of a Validity Screen via Selective Prediction
Models that pass the screen are safer when allowed to act on confidence. Models that fail are worse than a coin flip.

Apr 2026 | Metacognition | Screen Before You Interpret: A Portable Validity Protocol
The screen, made portable: it runs on any benchmark or task, from a single table of results.

Apr 2026 | Metacognition | Before You Interpret the Profile: Validity Scaling for LLM Metacognitive Self-Report
Six checks from clinical assessment that tell you whether a model’s confidence means anything at all.

Apr 2026 | Metacognition | The Metacognitive Monitoring Battery
A 524-question battery that asks models to keep or withdraw, bet or decline. The most accurate models are often the worst at knowing their limits.

Apr 2026 | Metacognition | K-Way Energy Probes Reduce to Softmax in Discriminative Predictive Coding
A probe that looked clever turns out to be softmax in disguise.

Apr 2026 | Metacognition | Quantisation Reshapes the Metacognitive Geometry of Language Models
Compressing a model reshapes where it knows its limits, even while headline rankings look stable.

Apr 2026 | Psychophysics | Same Geometry, Opposite Noise
Where biological magnitude gets noisier as it grows, transformer representations get steadier.

Mar 2026 | Psychophysics | Categorical Perception in LLM Hidden States
Digit boundaries warp a model’s internal space the way categories warp ours.

Mar 2026 | Metacognition | Do LLMs Know What They Know? Metacognitive Efficiency with SDT
The formal frame: metacognitive efficiency measured with signal detection theory across 224,000 trials.

Mar 2026 | Psychophysics | Weber’s Law in Transformer Magnitude Representations
Number representations follow a century-old law of perception, in geometry though not in behaviour.

Mar 2026 | Metacognition | LLMs as Signal Detectors: Sensitivity, Bias, and the Temperature–Criterion Analogy
What temperature really does: more sensitivity and a shifted decision criterion at once.

Dates are arXiv submission months. Published with open code and data; pre-registered where noted. These are single-author preprints, and we describe them that way. Plain-language lines are ours; the papers say it precisely. Author’s page: synthiumjp.github.io.
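For readers new to signal detection theory, the two quantities that recur in the titles above, sensitivity and criterion, can be computed from a hit rate and a false-alarm rate. The sketch below uses the textbook equal-variance Gaussian model and is offered as background only; the papers define their own measures precisely. The function name and the example rates are assumptions.

```python
from statistics import NormalDist
from typing import Tuple


def sdt_measures(hit_rate: float, false_alarm_rate: float) -> Tuple[float, float]:
    """Return (d_prime, criterion) under equal-variance Gaussian SDT.

    Both rates must lie strictly between 0 and 1, because the inverse
    normal CDF is undefined at the endpoints.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit = z(hit_rate)
    z_fa = z(false_alarm_rate)
    d_prime = z_hit - z_fa              # sensitivity: separation of the two distributions
    criterion = -0.5 * (z_hit + z_fa)   # bias: 0 means no lean towards either response
    return d_prime, criterion


# Hypothetical rates: 80% hits, 20% false alarms.
d_prime, criterion = sdt_measures(0.8, 0.2)
```

Symmetric rates like these give a criterion of zero, so any difference between conditions would show up in sensitivity alone.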

Put the instruments to work.

Commission the methods, re-run them yourself, or watch the screen run on a live model.
