Kimi just published a paper replacing residual connections in transformers. Results look legit
Simple_Response8041 — author • 1 mention
r/LocalLLaMA — source • 1 mention
AI-Generated Claims
Generated from linked receipts.
Kimi (Moonshot AI) dropped a paper on something called "attention residuals," which replaces the standard residual connection that every transformer has used since the original architecture — a design transformers inherited from ResNet (2015).
Supported by 1 story
In a standard transformer, layer 40 gets the accumulated outputs of layers 1-39 all piled up into a single sum.
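To make the "piled up" part concrete, here's a minimal sketch of how a standard residual stream accumulates layer outputs (a generic illustration, not code from the paper; the toy identity "layers" are purely hypothetical):

```python
import numpy as np

def residual_stream(x, layers):
    """Standard transformer residual stream (generic sketch).

    Each layer's output is *added* onto the running stream, so by the
    time layer k runs, its input is x + f_1(.) + f_2(.) + ... + f_{k-1}(.):
    every earlier layer's contribution is summed into one vector, with
    no way for layer k to pick out any single earlier layer.
    """
    for f in layers:
        x = x + f(x)
    return x

# Toy illustration with identity "layers" f_k(h) = h:
# each step doubles the stream, so 1 -> 2 -> 4 -> 8 after 3 layers.
layers = [lambda h: h] * 3
out = residual_stream(np.ones(4), layers)  # array of 8.0
```

The point of the toy: by layer 40, the stream is one undifferentiated sum of everything before it, which is what the paper's "dilution" framing refers to.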
Kimi calls this the "dilution problem." Their fix is to let each layer selectively attend to the outputs of all previous layers instead of just taking the sum.
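The selective part can be sketched roughly like this: instead of summing, the current layer scores each previous layer's output and takes a weighted mix. This is a hedged illustration of the general idea only — `attention_residual`, `query_w`, and the single-vector setup are made up here, not the paper's actual formulation:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention_residual(history, query_w):
    """Hypothetical sketch of attending over previous layer outputs.

    history : list of (d,) arrays, the outputs of layers 1..k
    query_w : (d,) array standing in for whatever the current layer
              uses to score past layers (an assumption, not the paper's
              parameterization)

    Rather than returning sum(history), score each past layer's output,
    softmax the scores, and return the weighted combination — so the
    current layer can emphasize some earlier layers and ignore others.
    """
    H = np.stack(history)        # (k, d): one row per previous layer
    scores = H @ query_w         # (k,): relevance score per layer
    w = softmax(scores)          # convex weights over past layers
    return w @ H                 # (d,): selective mix, not a plain sum
```

If one past layer scores much higher than the rest, its output dominates the mix, which is the behavior a plain residual sum can't express.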
Results on their benchmarks:
- 3-7.5 point improvements on grad-level exams, math reasoning, code gen, and long-context tasks
- saves ~1.25x compute with their block version
- training overhead under 4%, inference latency increase under 2%
- scales well; bigger models benefit more

They also did a "block attention...