Research papers

Notable arXiv drops, replication notes, and what the field is actually learning vs. claiming.

DEEPCOGITO.COM·

Deep Cogito v2: open-source models that internalize their own reasoning

San Francisco startup founded by ex-Googlers ships four open-source hybrid reasoning models — 70B, 109B, 405B, 671B — using a technique called Iterated Distillation and Amplification (IDA) to distill search-time reasoning back into model weights.

open-source · research
ARXIV 2504 / VLM RESEARCH·2026-05-22

Chain-of-Modality prompting — Vision-Language Models progressively integrate modalities to refine manipulation plans from human demonstration video

An arXiv paper titled 'Chain-of-Modality: Learning Manipulation Programs from Multimodal Human Videos with Vision-Language-Models' (arXiv 2504.13351) introduces a prompting strategy where Vision Language Models progressively integrate information from each modality to refine task plans for robotic manipulation. The structural innovation is that the methodology works without retraining — it's a prompting protocol that elicits multimodal reasoning from existing VLMs.

research-papers · multimodal
ANTHROPIC / OPENAI / INDUSTRY·2026-05-22

DPO has supplanted RLHF as the default frontier alignment method — the 2026 safety-research stack moves from preference modeling to direct optimization

Industry consensus by May 2026 places Direct Preference Optimization (DPO) as the default alignment training method across frontier labs, replacing the more complex RLHF pipeline that dominated through 2025. The shift is structural: DPO requires less compute and fewer human-in-the-loop annotations, and it produces more interpretable preference gradients. Together with the rise of process-reward models and constitutional self-critique loops, this has materially simplified frontier alignment.

alignment · research
MEDRXIV / BIASMEDQA·2026-05-22

LLM reasoning does not protect against clinical cognitive biases — BiasMedQA shows reasoning chains carry the same anchoring failures as direct answers

A medical-AI evaluation paper using the BiasMedQA benchmark finds that LLM reasoning chains do not protect models from clinical cognitive biases (anchoring, availability, confirmation). Reasoning-tier models fall into the same diagnostic-bias patterns as direct-answer models — sometimes more confidently, because the reasoning chain provides surface-level justification for the biased outcome.

research-papers · safety
ARXIV / MECHANISTIC INTERPRETABILITY REVIEW·2026-05-22

Mechanistic Interpretability for AI Safety — the field-defining review consolidates 2024-2026 methodology into a single reference text

An updated 'Mechanistic Interpretability for AI Safety — A Review' (arXiv 2404.14082) consolidates the 2024-2026 methodology pipeline — circuit identification, feature differentials, sparse autoencoder methods, and behavioral attribution — into the field's reference text. The review's publication this week, during the postponed-EO ambiguity, gives both AISI and lab-internal teams a single citation surface for methodology discussions.

interpretability · research
ARXIV 2510 / MIT CSAIL·2026-05-22

MultiModal Action Conditioned Video Generation — MIT CSAIL paper opens fine-grained multimodal control beyond text-to-video

An MIT CSAIL paper by Yichen Li and Antonio Torralba (arXiv 2510.02287) introduces a multimodal action-conditioned video generation approach that captures proprioception, kinesthesia, force haptics, and muscle activation as control signals. The architecture lets users condition video generation on fine-grained physical interaction signals rather than just text prompts — a meaningful step beyond the Sora/Veo/Kling text-to-video pattern.

multimodal · research-papers
ARXIV 2605·2026-05-22

Lifting Traces to Logic — programmatic skill induction with neuro-symbolic learning targets long-horizon agentic tasks

A new arXiv paper titled 'Lifting Traces to Logic: Programmatic Skill Induction with Neuro-Symbolic Learning for Long-Horizon Agentic Tasks' proposes a methodology for extracting program-like skills from neural reasoning traces and reusing them across agentic workflows. The result is a step toward closing the gap between transformer-style reasoning (broad but expensive) and symbolic planning (narrow but cheap).

research-papers · architecture
ARXIV 2511 / ROBOT PLANNING·2026-05-22

Long-context Q-Former integrated with Multimodal LLM — robot confirmation and action planning gets a context-spanning attention pattern

An arXiv paper titled 'Robot Confirmation Generation and Action Planning Using Long-context Q-Former Integrated with Multimodal LLM' (arXiv 2511.17335) proposes a long-context Q-former architecture incorporating left-right context dependency in full videos, plus a text-conditioning approach that feeds text embeddings directly into the LLM decoder. The combination produces more reliable confirmation generation and action planning for long-horizon manipulation tasks.

research-papers · robotics
SOURCE·2026-05-22

The VLM-robotics stack emerges — Chain-of-Modality, long-context Q-former, and action-conditioned video sketch the 2027 architecture

Three papers, one trajectory. Chain-of-Modality elicits multimodal reasoning from existing VLMs without retraining. Long-context Q-Former retains temporal coherence across long-horizon tasks. Action-conditioned video extends conditioning to physical control signals. The 2026 H1 research trajectory points at a coherent 2027 robotics-AI architecture.

analysis · research-papers
DEEP COGITO·2026-05-21

Deep Cogito v2 ships 70B/109B/405B/671B open-weight family with Iterated Distillation & Amplification self-improvement loop

Deep Cogito's v2 release ships four open-weight sizes (70B, 109B, 405B, 671B) wired into an Iterated Distillation & Amplification (IDA) self-improvement loop. The release positions IDA as a deployable architecture rather than a research curiosity — the first open-weight family where the "model improves itself between checkpoints" methodology is shipped as the default training recipe.

open-source · frontier-models · research
ALIGNMENT RESEARCH·2026-05-21

Direct Preference Optimization quietly replaces RLHF at the frontier — simpler pipeline, equivalent capability, cheaper to iterate

Direct Preference Optimization (DPO) has now displaced RLHF at the frontier across multiple labs. The shift is methodological rather than headline-grabbing: DPO removes the separate reward-model training stage, treats the preference data directly as the optimization signal, and produces comparable alignment outcomes with roughly half the engineering complexity.
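
To make "treats the preference data directly as the optimization signal" concrete, here is a minimal sketch of the per-pair DPO loss from the original DPO paper (Rafailov et al., 2023), in plain Python. The function names, argument names, and the beta default of 0.1 are illustrative assumptions, not any lab's pipeline. The inputs are assumed to be summed token log-probabilities of the chosen and rejected responses under the policy being trained and under a frozen reference model.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed default; controls how far the policy may drift
) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen_shift - rejected_shift)))."""
    # How much the policy raised each response's log-probability vs. the reference.
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_shift - rejected_shift)
    # -log(sigmoid(m)) equals softplus(-m); no separate reward model is involved.
    return softplus(-margin)
```

When the policy matches the reference, the margin is zero and the loss is log 2; it falls as the policy shifts probability toward the chosen response and away from the rejected one.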

alignment · research
ARXIV / INTERPRETABILITY·2026-05-21

New arXiv work on decoding encrypted chain-of-thought reasoning — latent-reasoning models pose new monitorability challenge

Recent arXiv work (Dec 2025–May 2026) introduces a model organism for opaque internal reasoning and proposes unsupervised decoding of encrypted chain-of-thought. The research direction responds to a frontier-safety problem: as more frontier labs explore latent-reasoning models that don't externalize CoT in human language, the standard CoT-monitorability assumption breaks.

interpretability · alignment · research
OPENAI / SAWIN·2026-05-21

The Erdős unit-distance proof becomes a methodology case study — Princeton's Sawin refinement opens the door for auditing AI math

OpenAI's Erdős unit-distance result, paired with a refinement by Princeton's Will Sawin showing δ ≥ 0.014, has become a methodology test case for how AI-generated mathematics gets audited and refined by human mathematicians. The collaboration model, in which the AI produces the construction and proof and a human researcher tightens the bound, is the first concrete demonstration of the human-plus-AI mathematics workflow at research-frontier scale.

research-papers · math
ARXIV / ROBOTICS RESEARCH·2026-05-21

Interleaved vision-language reasoning traces unlock long-horizon robot manipulation in unseen environments

A new arXiv paper, "Thinking in Text and Images: Interleaved Vision-Language Reasoning Traces for Long-Horizon Robot Manipulation," shows that interleaving language and image tokens in the reasoning trace produces materially better generalization on long-horizon manipulation tasks in unseen environments. The technique scales to the kind of task class that home-robot deployment requires.

research-papers · robotics
MIT TECHNOLOGY REVIEW·2026-05-21

Mechanistic interpretability named one of MIT Tech Review's 10 Breakthrough Technologies of 2026

Mechanistic interpretability, the program of reverse-engineering neural-network computations into human-understandable algorithms, has been named one of MIT Technology Review's 10 Breakthrough Technologies of 2026. The recognition formalizes what frontier labs have been signaling for two years: interpretability is no longer a research niche but a structural safety pillar.

interpretability · research
OPENAI·2026-05-21

OpenAI's general-purpose reasoning model autonomously disproves an 80-year-old Erdős conjecture in discrete geometry

OpenAI announced that one of its general-purpose reasoning models autonomously disproved a central conjecture in discrete geometry — the planar unit-distance problem posed by Paul Erdős in 1946. The model found a new family of point configurations beating the square-grid arrangement and produced a mathematical proof. A subsequent refinement by Princeton's Will Sawin showed δ ≥ 0.014 is achievable from the construction.

frontier-models · research · math
AI DAILY POST / PAPER TREND ANALYSIS·2026-05-21

Top 2026 LLM papers continue Pass@k efficiency theme — solving problems with fewer attempts is the year's dominant research direction

A trend analysis of the top-cited 2026 LLM papers confirms Pass@k efficiency as the year's dominant research direction. Where 2024–2025 emphasized capability ceilings (can the model solve the problem at all?), 2026 papers are converging on efficiency frontiers (can the model solve it on the first or second attempt?). The shift reflects inference-cost reality across the deployed frontier.

research-papers · architecture
ARXIV 2605.06241·2026-05-21

New arXiv work argues RL for LLM reasoning is sparse policy selection, not capability learning — only 1-3% of tokens shift

An arXiv paper out this month — 'Rethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learning' — finds that RL fine-tuning of frontier reasoning models affects only 1-3% of token positions, and that the promoted tokens nearly always lie within the base model's top-5 alternatives. The result reframes 'reasoning models' as base models with sparsely-modified token-selection policies, not as models with new reasoning capability.
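
The two statistics in this summary (the share of token positions that change, and whether the promoted token was already among the base model's top-5) can be pictured with the sketch below. It is an illustration under assumed inputs, not the paper's measurement procedure: `base_topk` and `tuned_tokens` are hypothetical precomputed per-position data, and the function name is invented for this example.

```python
from typing import Optional, Sequence


def token_shift_stats(
    base_topk: Sequence[Sequence[str]],
    tuned_tokens: Sequence[str],
    k: int = 5,
) -> dict[str, Optional[float]]:
    """Compare an RL-tuned model's token choices with a base model's ranking.

    base_topk[i]   : base model's candidate tokens at position i, most likely first
                     (hypothetical input; each list must be non-empty).
    tuned_tokens[i]: token the tuned model selects at position i.
    """
    if len(base_topk) != len(tuned_tokens) or not tuned_tokens:
        raise ValueError("inputs must be non-empty and the same length")

    # A position counts as "changed" when the tuned token differs from the base top-1.
    changed = [
        (ranked, token)
        for ranked, token in zip(base_topk, tuned_tokens)
        if ranked[0] != token
    ]
    # Of the changed positions, how many picks were already in the base top-k?
    within = sum(1 for ranked, token in changed if token in ranked[:k])

    return {
        "changed_fraction": len(changed) / len(tuned_tokens),
        "changed_within_topk": within / len(changed) if changed else None,
    }
```

Under the paper's reported numbers, `changed_fraction` would land around 0.01 to 0.03 and `changed_within_topk` close to 1.0 for `k=5`.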

alignment · research
ARXIV 2605.02073·2026-05-21

Search-driven reward-function optimization paper shows GRPO can be improved by treating the reward spec itself as the optimization target

A May arXiv paper, 'Enhanced LLM Reasoning by Optimizing Reward Functions with Search-Driven Reinforcement Learning,' shows that treating the reward function as an optimization object — generating candidate rewards with a frontier LLM, validating them automatically, and screening through GRPO training runs — produces materially better reasoning gains than fixed-reward training. The pipeline is roughly 30% more sample-efficient than baseline GRPO.

research-papers · architecture
ARXIV·2026-05-20

"Attention as Binding" paper formalizes transformer reasoning as approximate Vector Symbolic Architecture

A new arXiv paper, "Attention as Binding: A Vector-Symbolic Perspective on Transformer Reasoning," interprets self-attention and residual streams as implementing an approximate Vector Symbolic Architecture (VSA). The framing provides a unified theoretical account for why transformers can do compositional reasoning — and predicts where they should fail.

research-papers · interpretability
INTERPRETABILITY RESEARCH·2026-05-20

Complete Replacement Models combine transcoders + Lorsas to fully sparsify language models

A new class of interpretability methods — Complete Replacement Models (CRMs) — combines transcoder MLP replacements with localized SAE variants (Lorsas) to fully sparsify a transformer's representation. Where SAEs alone left residual dense pathways, CRMs aim to decompose the entire forward pass into named, sparse circuits.

interpretability · research
MIT TECH REVIEW·2026-05-20

MIT Technology Review names mechanistic interpretability a 2026 Breakthrough Technology

MIT Technology Review's annual 10 Breakthrough Technologies list for 2026 names mechanistic interpretability, the field of reverse-engineering neural networks to understand how they compute, as one of the year's most consequential research directions. The recognition follows Anthropic's circuit-tracing work on Claude 3.5 Haiku and the company's stated goal of reliably detecting most AI model problems by 2027 using interpretability tools.

interpretability · research
ARXIV·2026-05-20

Recursive latent-space reasoning unlocks out-of-distribution generalization without chain-of-thought tokens

A new architectural approach for transformers performs reasoning recursively in latent space rather than externalizing it as chain-of-thought tokens. The method achieves robust algorithmic generalization on out-of-distribution tasks where standard transformers fail — and provides mechanistic interpretability analysis to characterize where the reasoning happens internally.

research-papers · architecture
CLAUDE5 HUB / ALIGNMENT·2026-05-20

RLHF 2.0 methodology cuts alignment-tax performance penalty by 60% vs first-generation RLHF

Recent results show RLHF 2.0 — the iteration that combines preference modeling with constitutional self-play and process supervision — reduces the alignment-tax penalty by approximately 60% compared to first-generation methods. The structural implication: safety training no longer requires substantial capability concessions.

alignment · research
SOURCE·2026-05-18

Why Pass@k efficiency is the real 2026 story

The most-cited 2026 LLM papers aren't about new capabilities — they're about getting the same accuracy with fewer attempts. That changes the inference economics of agents more than any model release this year.

analysis · research
ANTHROPIC / ARXIV·2026-05-16

Constitutional Classifiers cut jailbreak success from 86% to 4.4%

An Anthropic paper formalizes Constitutional Classifiers — small purpose-trained models that screen LLM inputs and outputs against a constitution. The headline result: jailbreak success rate on standard red-team suites drops from 86% to 4.4% with negligible helpfulness cost.

research · alignment · safety
ZYLOS RESEARCH·2026-05-15

Zylos Research publishes 2026 mech interp landscape survey

Zylos Research released a comprehensive survey of mechanistic interpretability progress through Q2 2026. Headline finding: sparse autoencoders are now reliably extracting interpretable circuits at the scale of frontier models, but downstream uses in alignment remain mostly speculative.

research · interpretability
AI DAILY POST / ARXIV SURVEY·2026-05-14

Pass@k efficiency emerges as the dominant LLM research theme of 2026

A May 2026 survey of the most-cited 2026 LLM papers identifies a clear shift: instead of pushing peak Pass@1, the field is targeting Pass@k efficiency — solving problems with fewer parallel attempts. The downstream implication is cheaper inference at fixed capability.
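
For readers unfamiliar with the metric, here is a minimal sketch of the standard unbiased pass@k estimator introduced with the HumanEval benchmark (Chen et al., 2021). Given n sampled attempts of which c are correct, it estimates the probability that at least one of k attempts succeeds. The function name and the numbers in the note below are illustrative and do not come from the survey.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k).

    n: total attempts sampled for a problem
    c: number of those attempts that were correct
    k: attempt budget being evaluated (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # C(n - c, k) counts the size-k subsets that contain no correct attempt;
    # math.comb returns 0 when k > n - c, so the estimate is then exactly 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

As an illustration, with 20 attempts and 5 correct, `pass_at_k(20, 5, 1)` is 0.25 while `pass_at_k(20, 5, 5)` is roughly 0.81. Raising the solve rate at small k is what cuts the number of parallel attempts needed.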

research · research-papers
ANTHROPIC / CLAUDE5 HUB·2026-05-08

Constitutional self-play matures — 40% fewer harmful outputs than pure RLHF

The 2026 evolution of Constitutional AI introduces "constitutional self-play": the model generates its own training examples by critiquing and refining responses against the constitution. Reported result: CAI-trained models produce 40% fewer harmful outputs than pure RLHF baselines while preserving helpfulness.

alignment · research
SUBQUADRATIC / LLM-STATS·2026-05-05

SubQ 1M-Preview — first commercial subquadratic LLM, 12M token native context

Subquadratic's May 5 launch is the first generally-available large language model that drops standard transformer attention entirely. Claimed: ~5x lower cost than frontier transformers, up to 52x faster attention at scale, and a native 12 million token context window — not a sliding-window trick.

models · frontier-models · research
AMILABS.XYZ·2026-03-09

Yann LeCun's AMI Labs raises $1.03B seed to build "world models"

Paris-headquartered Advanced Machine Intelligence (AMI Labs) closed one of the largest seed rounds on record at $3.5B pre-money. LeCun's contrarian thesis: LLMs are wrong-headed, world models are the path.

industry · research