Research papers

Research note on rethinking the cursor as an input modality when the system on the other side of the screen isn't a passive document but an active agent.

research→

DEEPCOGITO.COM·

Deep Cogito v2: open-source models that internalize their own reasoning

San Francisco startup founded by ex-Googlers ships four open-source hybrid reasoning models — 70B, 109B, 405B, 671B — using a technique called Iterated Distillation and Amplification (IDA) to distill search-time reasoning back into model weights.

open-source · research→

ARXIV 2504 / VLM RESEARCH·2026-05-22

Chain-of-Modality prompting — Vision-Language Models progressively integrate modalities to refine manipulation plans from human demonstration video

An arXiv paper titled 'Chain-of-Modality: Learning Manipulation Programs from Multimodal Human Videos with Vision-Language-Models' (arXiv 2504.13351) introduces a prompting strategy where Vision Language Models progressively integrate information from each modality to refine task plans for robotic manipulation. The structural innovation is that the methodology works without retraining — it's a prompting protocol that elicits multimodal reasoning from existing VLMs.

research-papers · multimodal→

ANTHROPIC / OPENAI / INDUSTRY·2026-05-22

DPO has supplanted RLHF as the default frontier alignment method — the 2026 safety-research stack moves from preference modeling to direct optimization

Industry consensus by May 2026 places Direct Preference Optimization (DPO) as the default alignment training method across frontier labs, replacing the more complex RLHF pipeline that dominated through 2025. The shift is structural: DPO requires less compute, fewer human-in-the-loop annotations, and produces more interpretable preference gradients. Combined with the rise of process-reward models and constitutional self-critique loops, frontier alignment has materially simplified.

alignment · research→

MEDRXIV / BIASMEDQA·2026-05-22

LLM reasoning does not protect against clinical cognitive biases — BiasMedQA shows reasoning chains carry the same anchoring failures as direct answers

A medical-AI evaluation paper using the BiasMedQA benchmark finds that LLM reasoning chains do not protect models from clinical cognitive biases (anchoring, availability, confirmation). Reasoning-tier models fall into the same diagnostic-bias patterns as direct-answer models — sometimes more confidently, because the reasoning chain provides surface-level justification for the biased outcome.

research-papers · safety→

ARXIV / MECHANISTIC INTERPRETABILITY REVIEW·2026-05-22

Mechanistic Interpretability for AI Safety — the field-defining review consolidates 2024-2026 methodology into a single reference text

An updated 'Mechanistic Interpretability for AI Safety — A Review' (arXiv 2404.14082) consolidates the 2024-2026 methodology pipeline — circuit identification, feature differentials, sparse autoencoder methods, and behavioral attribution — into the field's reference text. The review's publication this week, during the postponed-EO ambiguity, gives both AISI and lab-internal teams a single citation surface for methodology discussions.

interpretability · research→

ARXIV 2510 / MIT CSAIL·2026-05-22

MultiModal Action Conditioned Video Generation — MIT CSAIL paper opens fine-grained multimodal control beyond text-to-video

An MIT CSAIL paper by Yichen Li and Antonio Torralba (arXiv 2510.02287) introduces a multimodal action-conditioned video generation approach that captures proprioception, kinesthesia, force haptics, and muscle activation as control signals. The architecture lets users condition video generation on fine-grained physical interaction signals rather than just text prompts — a meaningful step beyond the Sora/Veo/Kling text-to-video pattern.

multimodal · research-papers→

ARXIV 2605·2026-05-22

Lifting Traces to Logic — programmatic skill induction with neuro-symbolic learning targets long-horizon agentic tasks

A new arXiv paper titled 'Lifting Traces to Logic: Programmatic Skill Induction with Neuro-Symbolic Learning for Long-Horizon Agentic Tasks' proposes a methodology for extracting reusable program-like skills from neural reasoning traces and re-using them across agentic workflows. The result is a step toward closing the gap between transformer-style reasoning (broad but expensive) and symbolic planning (narrow but cheap).

research-papers · architecture→

ARXIV 2511 / ROBOT PLANNING·2026-05-22

Long-context Q-Former integrated with Multimodal LLM — robot confirmation and action planning gets a context-spanning attention pattern

An arXiv paper titled 'Robot Confirmation Generation and Action Planning Using Long-context Q-Former Integrated with Multimodal LLM' (arXiv 2511.17335) proposes a long-context Q-former architecture incorporating left-right context dependency in full videos, plus a text-conditioning approach that feeds text embeddings directly into the LLM decoder. The combination produces more reliable confirmation generation and action planning for long-horizon manipulation tasks.

research-papers · robotics→

SOURCE·2026-05-22

The skill-library architecture — neuro-symbolic skill induction may be the 2027 reasoning-model design pattern

A new arXiv paper lifts neural reasoning traces into reusable logical skill predicates. Combined with this month's sparse-policy-selection finding, the picture clarifies: 2027 reasoning models likely look less like 'bigger transformer' and more like 'transformer plus skill library plus retrieval.'

analysis · research-papers→

SOURCE·2026-05-22

The VLM-robotics stack emerges — Chain-of-Modality, long-context Q-former, and action-conditioned video sketch the 2027 architecture

Three papers, one trajectory. Chain-of-Modality elicits multimodal reasoning from existing VLMs without retraining. Long-context Q-Former retains temporal coherence across long-horizon tasks. Action-conditioned video extends conditioning to physical control signals. The 2026 H1 research trajectory points at a coherent 2027 robotics-AI architecture.

analysis · research-papers→

DEEP COGITO·2026-05-21

Deep Cogito v2 ships 70B/109B/405B/671B open-weight family with Iterated Distillation & Amplification self-improvement loop

Deep Cogito's v2 release ships four open-weight sizes (70B, 109B, 405B, 671B) wired into an Iterated Distillation & Amplification (IDA) self-improvement loop. The release positions IDA as a deployable architecture rather than a research curiosity: it is the first open-weight family in which the "model improves itself between checkpoints" methodology is shipped as the default training recipe.

open-source · frontier-models · research→

ALIGNMENT RESEARCH·2026-05-21

Direct Preference Optimization quietly replaces RLHF at the frontier — simpler pipeline, equivalent capability, cheaper to iterate

Direct Preference Optimization (DPO) has now displaced RLHF at the frontier across multiple labs. The shift is methodological rather than headline-grabbing: DPO removes the separate reward-model training stage, treats the preference data directly as the optimization signal, and produces comparable alignment outcomes with roughly half the engineering complexity.
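
To make "treats the preference data directly as the optimization signal" concrete, here is a minimal sketch of the per-pair DPO objective as it is commonly written, -log σ(β[(log π(y_w) - log π_ref(y_w)) - (log π(y_l) - log π_ref(y_l))]). The function name, argument names, and the β = 0.1 default are illustrative assumptions, not any lab's pipeline. Inputs are assumed to be summed log-probabilities of whole responses.

```python
import math


def dpo_pair_loss(policy_logp_chosen, policy_logp_rejected,
                  ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss (illustrative sketch; names and beta are assumptions)."""
    # Implicit rewards: how much more likely the policy makes each
    # response compared with the frozen reference model.
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    x = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)), computed stably as softplus(-x).
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))
```

No separate reward model appears anywhere in the function: the only inputs are log-probabilities from the policy and a reference model on a preferred and a rejected response.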

alignment · research→

ARXIV / INTERPRETABILITY·2026-05-21

New arXiv work on decoding encrypted chain-of-thought reasoning — latent-reasoning models pose new monitorability challenge

Recent arXiv work (Dec 2025–May 2026) introduces a model organism for opaque internal reasoning and proposes unsupervised decoding of encrypted chain-of-thought. The research direction responds to a frontier-safety problem: as more frontier labs explore latent-reasoning models that don't externalize CoT in human language, the standard CoT-monitorability assumption breaks.

interpretability · alignment · research→

OPENAI / SAWIN·2026-05-21

The Erdős unit-distance proof becomes a methodology case study — Princeton's Sawin refinement opens the door for auditing AI math

OpenAI's Erdős unit-distance result, paired with Princeton's Will Sawin refinement showing δ ≥ 0.014, has become a methodology test-case for how AI-generated mathematics gets audited and refined by human mathematicians. The collaboration model — AI produces the construction and proof, human researcher tightens the bound — is the first concrete demonstration of the human-plus-AI mathematics workflow at research-frontier scale.

research-papers · math→

ARXIV / ROBOTICS RESEARCH·2026-05-21

Interleaved vision-language reasoning traces unlock long-horizon robot manipulation in unseen environments

A new arXiv paper, "Thinking in Text and Images: Interleaved Vision-Language Reasoning Traces for Long-Horizon Robot Manipulation," shows that interleaving language and image tokens in the reasoning trace produces materially better generalization on long-horizon manipulation tasks in unseen environments. The technique scales to the kind of task class that home-robot deployment requires.

research-papers · robotics→

MIT TECHNOLOGY REVIEW·2026-05-21

Mechanistic interpretability named one of MIT Tech Review's 10 Breakthrough Technologies of 2026

Mechanistic interpretability, the program of reverse-engineering neural-network computations into human-understandable algorithms, has been named one of MIT Technology Review's 10 Breakthrough Technologies of 2026. The recognition formalizes what frontier labs have been signaling for two years: interpretability is no longer a research niche but a structural safety pillar.

interpretability · research→

OPENAI·2026-05-21

OpenAI's general-purpose reasoning model autonomously disproves an 80-year-old Erdős conjecture in discrete geometry

OpenAI announced that one of its general-purpose reasoning models autonomously disproved a central conjecture in discrete geometry — the planar unit-distance problem posed by Paul Erdős in 1946. The model found a new family of point configurations beating the square-grid arrangement and produced a mathematical proof. A subsequent refinement by Princeton's Will Sawin showed δ ≥ 0.014 is achievable from the construction.

frontier-models · research · math→

AI DAILY POST / PAPER TREND ANALYSIS·2026-05-21

Top 2026 LLM papers continue Pass@k efficiency theme — solving problems with fewer attempts is the year's dominant research direction

A trend analysis of the top-cited 2026 LLM papers confirms Pass@k efficiency as the year's dominant research direction. Where 2024–2025 emphasized capability ceilings (can the model solve the problem at all?), 2026 papers are converging on efficiency frontiers (can the model solve it on the first or second attempt?). The shift reflects inference-cost reality across the deployed frontier.
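
For readers unfamiliar with the metric, the sketch below shows the estimator widely used in code-generation evaluation: with n samples drawn and c of them correct, pass@k is estimated as 1 - C(n-c, k) / C(n, k). Whether the surveyed papers use exactly this form is an assumption, and the function name is illustrative.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate the chance that at least one of k samples is correct,
    given c correct answers among n drawn samples."""
    if not 1 <= k <= n or not 0 <= c <= n:
        raise ValueError("need 1 <= k <= n and 0 <= c <= n")
    # comb(n - c, k) is 0 when fewer than k samples are incorrect,
    # in which case every size-k subset contains a correct answer.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With 3 correct answers out of 10 samples, `pass_at_k(10, 3, 1)` is 0.3 while `pass_at_k(10, 3, 5)` is about 0.92. The efficiency framing asks how small k can be made while keeping that number high.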

research-papers · architecture→

ARXIV 2605.06241·2026-05-21

New arXiv work argues RL for LLM reasoning is sparse policy selection, not capability learning — only 1-3% of tokens shift

An arXiv paper out this month — 'Rethinking RL for LLM Reasoning: It's Sparse Policy Selection, Not Capability Learning' — finds that RL fine-tuning of frontier reasoning models affects only 1-3% of token positions, and that the promoted tokens nearly always lie within the base model's top-5 alternatives. The result reframes 'reasoning models' as base models with sparsely-modified token-selection policies, not as models with new reasoning capability.
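
The two quantities quoted above (the share of positions where the preferred token changes, and how often the new token was already among the base model's top alternatives) can be illustrated with a small helper. This is a hypothetical sketch, not the paper's measurement protocol: the function name, the input layout, and the toy tokens are all assumptions.

```python
def sparse_shift_stats(base_ranked, tuned_top1, top_n=5):
    """base_ranked: per-position list of base-model tokens, most likely first.
    tuned_top1: per-position token that the RL-tuned model ranks first.
    Returns (fraction of positions shifted, fraction of shifts inside base top_n)."""
    if len(base_ranked) != len(tuned_top1):
        raise ValueError("inputs must cover the same positions")
    # A position counts as shifted when the tuned model's top token
    # differs from the base model's top token.
    shifted = [(ranked, tok) for ranked, tok in zip(base_ranked, tuned_top1)
               if ranked[0] != tok]
    within = sum(1 for ranked, tok in shifted if tok in ranked[:top_n])
    frac_shifted = len(shifted) / len(tuned_top1) if tuned_top1 else 0.0
    frac_within = within / len(shifted) if shifted else 0.0
    return frac_shifted, frac_within
```

Under the paper's framing, the first number would be small and the second close to 1.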

alignment · research→

ARXIV 2605.02073·2026-05-21

Search-driven reward-function optimization paper shows GRPO can be improved by treating the reward spec itself as the optimization target

A May arXiv paper, 'Enhanced LLM Reasoning by Optimizing Reward Functions with Search-Driven Reinforcement Learning,' shows that treating the reward function as an optimization object — generating candidate rewards with a frontier LLM, validating them automatically, and screening through GRPO training runs — produces materially better reasoning gains than fixed-reward training. The pipeline is roughly 30% more sample-efficient than baseline GRPO.

research-papers · architecture→

SOURCE·2026-05-21

The Erdős precedent — what changes when AI-authored mathematics enters peer review

OpenAI disproves a 1946 Erdős conjecture. Will Sawin refines the bound. The collaboration shape (AI produces, human reviewer refines) is the first concrete answer to 'who audits AI-generated mathematics.' That answer matters more than the specific theorem.

analysis · research-papers→

SOURCE·2026-05-21

The Pass@k pivot becomes canonical — 2026 research has rotated to efficiency, not ceilings

The 2026 paper trend analysis confirms what production teams knew six months ago: capability ceilings are stable, and the frontier of useful research is now first-attempt accuracy.

analysis · research-papers→

ARXIV·2026-05-20

"Attention as Binding" paper formalizes transformer reasoning as approximate Vector Symbolic Architecture

A new arXiv paper, "Attention as Binding: A Vector-Symbolic Perspective on Transformer Reasoning," interprets self-attention and residual streams as implementing an approximate Vector Symbolic Architecture (VSA). The framing provides a unified theoretical account for why transformers can do compositional reasoning and predicts where they should fail.

research-papers · interpretability→

INTERPRETABILITY RESEARCH·2026-05-20

Complete Replacement Models combine transcoders + Lorsas to fully sparsify language models

A new class of interpretability methods, Complete Replacement Models (CRMs), combines transcoder MLP replacements with Low-Rank Sparse Attention modules (Lorsas) that stand in for attention layers, with the aim of fully sparsifying a transformer's computation. Where SAEs alone left residual dense pathways, CRMs aim to decompose the entire forward pass into named, sparse circuits.

interpretability · research→

MIT TECH REVIEW·2026-05-20

MIT Technology Review names mechanistic interpretability a 2026 Breakthrough Technology

MIT Technology Review's annual 10 Breakthrough Technologies list for 2026 names mechanistic interpretability, the field of reverse-engineering neural networks to understand how they compute, as one of the year's most consequential research directions. The recognition follows Anthropic's circuit-tracing work on Claude 3.5 Haiku and the company's stated goal of reliably detecting most AI model problems by 2027 using interpretability tools.

interpretability · research→

ARXIV·2026-05-20

Recursive latent-space reasoning unlocks out-of-distribution generalization without chain-of-thought tokens

A new architectural approach for transformers performs reasoning recursively in latent space rather than externalizing it as chain-of-thought tokens. The method achieves robust algorithmic generalization on out-of-distribution tasks where standard transformers fail — and provides mechanistic interpretability analysis to characterize where the reasoning happens internally.

research-papers · architecture→

CLAUDE5 HUB / ALIGNMENT·2026-05-20

RLHF 2.0 methodology cuts alignment-tax performance penalty by 60% vs first-generation RLHF

Recent results show RLHF 2.0 — the iteration that combines preference modeling with constitutional self-play and process supervision — reduces the alignment-tax penalty by approximately 60% compared to first-generation methods. The structural implication: safety training no longer requires substantial capability concessions.

alignment · research→

ARXIV / INTERPRETABILITY·2026-05-20

Sparse feature circuit-finding scales to 30× larger models — in-context learning circuits now traceable

Recent work scaled sparse feature circuit-finding methodology to models with 30 times more parameters than prior demonstrations. The scaled method successfully identifies the circuits that drive in-context learning — one of the previously opaque emergent behaviors of large transformers.

interpretability · research→

ARXIV·2026-05-20

State Stream Transformer surfaces emergent metacognitive behaviors via latent state persistence

A January 2026 arXiv paper introduces the State Stream Transformer (SST) architecture, a transformer variant that persists latent state across inference calls. The paper claims emergent metacognitive-like higher-order processing: the model can reason about its own previous reasoning in a way standard transformers cannot.

research-papers · reasoning→

ARXIV·2026-05-20

"Transformers are Bayesian Networks" — every sigmoid transformer implements weighted loopy belief propagation

A March 2026 arXiv paper proves that every sigmoid transformer architecture, with any weights, implements weighted loopy belief propagation on its implicit factor graph. The paper provides a precise answer to the long-standing question of why transformers work: they are doing approximate Bayesian inference, by construction.

research-papers · theory→

ARXIV 2510.06261·2026-05-19

AlphaApollo: deep agentic reasoning system decomposes complex tasks via foundation-model interleaving

AlphaApollo, described in a new arXiv preprint, presents a deep agentic reasoning architecture in which foundation models interleave explicit reasoning steps, tool queries, and tool outputs in a single unified loop. Initial benchmarks suggest substantial gains on long-horizon scientific reasoning tasks.

research · agents · reasoning→

ARXIV 2605.13930·2026-05-19

TopK Sparse Autoencoders extract interpretable clinical features from EEG foundation models

An arXiv preprint (2605.13930, submitted May 13) applies TopK Sparse Autoencoders to three EEG foundation models — SleepFM, REVE, LaBraM — and successfully extracts sparse feature dictionaries that align with clinical taxonomies including abnormality, age, sex, and medication state.
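
The "TopK" in TopK Sparse Autoencoders refers to the activation rule: only the k largest encoder pre-activations survive and the rest are zeroed, which enforces a fixed sparsity level per input. The sketch below shows just that rule on a plain list. The encoder and decoder matrices are omitted, and the function name is an illustrative assumption rather than anything from the preprint.

```python
def topk_activation(pre_activations, k):
    """Keep the k largest pre-activations and zero the rest.
    Ties are broken in favor of the lower index."""
    if not 0 <= k <= len(pre_activations):
        raise ValueError("k must be between 0 and the number of features")
    # sorted() is stable even with reverse=True, so equal values keep index order.
    order = sorted(range(len(pre_activations)),
                   key=lambda i: pre_activations[i], reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(pre_activations)]
```

Each surviving entry corresponds to one dictionary feature. Those are the features that a study of this kind would then compare against clinical labels.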

research · interpretability→

ARXIV 2509.03738 / ICLR 2026·2026-05-19

SAE Neural Operators paper accepted to ICLR 2026 — generalizing SAEs across model scales

Mechanistic Interpretability with Sparse Autoencoder Neural Operators (arXiv 2509.03738), accepted at ICLR 2026, generalizes the SAE methodology to operate as a neural operator that transfers learned dictionaries across models of different scales without retraining.

interpretability · research→

ARXIV 2512.05534·2026-05-19

Unified Theory of Sparse Dictionary Learning paper formalizes spurious minima in mech interp

An arXiv preprint (2512.05534, last updated May 2) proposes a unified theoretical framework for sparse dictionary learning in mechanistic interpretability. It characterizes the piecewise biconvex optimization landscape, proves that spurious local minima exist, and characterizes them.

interpretability · research · theory→

ARXIV·2026-05-18

'Alignment Waste' paper formalizes why safety doesn't transfer between architectures

A new arXiv preprint formalizes a phenomenon researchers had observed informally: alignment artifacts (RLHF policies, constitutional rules, refusal heuristics) are neither transferable to new model architectures nor correctable without expensive retraining.

research · alignment · theory→

OPENAI / DEEPMIND / ANTHROPIC·2026-05-18

Multi-dimensional human feedback is supplanting thumbs-up/down across major labs

OpenAI, DeepMind, and Anthropic have all published versions of multi-dimensional RLHF in 2026 — where annotators score helpfulness, harmlessness, honesty, and task-specific quality separately rather than as a single preference signal.

research · alignment→

SOURCE·2026-05-18

Why Pass@k efficiency is the real 2026 story

The most-cited 2026 LLM papers aren't about new capabilities — they're about getting the same accuracy with fewer attempts. That changes the inference economics of agents more than any model release this year.

analysis · research→

ANTHROPIC / ARXIV·2026-05-16

Constitutional Classifiers cut jailbreak success from 86% to 4.4%

An Anthropic paper formalizes Constitutional Classifiers — small purpose-trained models that screen LLM inputs and outputs against a constitution. The headline result: jailbreak success rate on standard red-team suites drops from 86% to 4.4% with negligible helpfulness cost.

research · alignment · safety→

ZYLOS RESEARCH·2026-05-15

Zylos Research publishes 2026 mech interp landscape survey

Zylos Research released a comprehensive survey of mechanistic interpretability progress through Q2 2026. Headline finding: sparse autoencoders are now reliably extracting interpretable circuits at the scale of frontier models, but downstream uses in alignment remain mostly speculative.

research · interpretability→

AI DAILY POST / ARXIV SURVEY·2026-05-14

Pass@k efficiency emerges as the dominant LLM research theme of 2026

A May 2026 survey of the most-cited 2026 LLM papers identifies a clear shift: instead of pushing peak Pass@1, the field is targeting Pass@k efficiency — solving problems with fewer parallel attempts. The downstream implication is cheaper inference at fixed capability.

research · research-papers→

ARXIV 2512.14474·2026-05-12

Model-First Reasoning — explicit problem modeling cuts hallucinations in LLM agents

An arXiv preprint (2512.14474) introduces Model-First Reasoning (MFR): a paradigm where an LLM agent is required to construct an explicit problem model before proposing a solution. The reported effect is a sharp drop in hallucinated steps and a more inspectable trace.

research · research-papers · agents→

THINKINGMACHINES.AI·2026-05-11

Mira Murati's Thinking Machines unveils "interaction models" — 0.4-second full-duplex AI

Former OpenAI CTO's startup announces TML-Interaction-Small: a model designed to handle voice, video, and text simultaneously, respond in 0.40 seconds, and interrupt mid-sentence rather than waiting for turns.

models · UX→

ANTHROPIC / CLAUDE5 HUB·2026-05-08

Constitutional self-play matures — 40% fewer harmful outputs than pure RLHF

The 2026 evolution of Constitutional AI introduces "constitutional self-play": the model generates its own training examples by critiquing and refining responses against the constitution. Reported result: CAI-trained models produce 40% fewer harmful outputs than pure RLHF baselines while preserving helpfulness.

alignment · research→

CLAUDE5 HUB / OPENAI / DEEPMIND·2026-05-06

Multi-dimensional RLHF: feedback along helpfulness, harmlessness, honesty, task-specific axes

OpenAI, DeepMind, and others have moved past single-dimension preference learning. The 2026 standard is multi-dimensional feedback: human raters score outputs separately on helpfulness, harmlessness, honesty, and task-specific axes, and reward models combine these into a richer signal.
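
As a rough illustration of combining per-axis ratings into one signal, the sketch below takes a weighted average over the four axes named above. The summary does not say how labs actually combine the axes, so the weighted average, the axis keys, and the weights are all assumptions. A learned reward model would replace this fixed formula.

```python
AXES = ("helpfulness", "harmlessness", "honesty", "task_quality")


def combine_scores(scores, weights=None):
    """Weighted average of per-axis rater scores (hypothetical combination rule)."""
    if weights is None:
        weights = {axis: 1.0 for axis in AXES}
    missing = [axis for axis in AXES if axis not in scores]
    if missing:
        raise KeyError(f"missing axis scores: {missing}")
    total_weight = sum(weights[axis] for axis in AXES)
    return sum(weights[axis] * scores[axis] for axis in AXES) / total_weight
```

Keeping the axes separate until this final step means a response that is helpful but not harmless can be told apart from one that is merely mediocre on every axis, which a single thumbs-up/down signal cannot do.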

alignment · research→

SUBQUADRATIC / LLM-STATS·2026-05-05

SubQ 1M-Preview — first commercial subquadratic LLM, 12M token native context

Subquadratic's May 5 launch is the first generally-available large language model that drops standard transformer attention entirely. Claimed: ~5x lower cost than frontier transformers, up to 52x faster attention at scale, and a native 12 million token context window — not a sliding-window trick.

models · frontier-models · research→

AMILABS.XYZ·2026-03-09

Yann LeCun's AMI Labs raises $1.03B seed to build "world models"

Paris-headquartered Advanced Machine Intelligence (AMI Labs) closed one of the largest seed rounds on record at $3.5B pre-money. LeCun's contrarian thesis: LLMs are wrong-headed, world models are the path.

industry · research→

MIT TECHNOLOGY REVIEW·2026-01-12

MIT Tech Review names mechanistic interpretability a 2026 Breakthrough Technology

The annual "10 Breakthrough Technologies" list put mechanistic interpretability on the field's official map this year. The framing matters because it shifts mech interp from a research curiosity to a fundable infrastructure problem.

interpretability · research→
