PulseAugur
research · [4 sources]

New architectures combat catastrophic forgetting in LLMs

Researchers have proposed new architectures to address catastrophic forgetting in large language models during continual pre-training and fine-tuning. One method, TFGN, adds an overlay that enables parameter-efficient updates without altering the core transformer, and reports significant retention of prior knowledge across diverse domains and model scales. Another approach, UAM, is inspired by biological vision: it uses a dual-stream architecture to separate semantic understanding from action control, preserving multimodal capabilities during vision-language-action (VLA) model training. Both aim to let models learn continuously without degrading performance on previously acquired knowledge.

Summary written by gemini-2.5-flash-lite from 4 sources.

IMPACT New architectural designs for LLMs and VLA models promise improved continual learning capabilities, reducing knowledge degradation during fine-tuning and pre-training.

RANK_REASON The cluster contains two research papers introducing novel architectures for continual learning in LLMs and VLA models, addressing the problem of catastrophic forgetting.



COVERAGE [4]

  1. arXiv cs.AI TIER_1 · Anurup Ganguli ·

    TFGN: Task-Free, Replay-Free Continual Pre-Training Without Catastrophic Forgetting at LLM Scale

    Continually pre-training a large language model on heterogeneous text domains, without replay or task labels, has remained an unsolved architectural problem at LLM scale. Existing methods rely on replay buffers, task identifiers, regularization penalties that scale poorly, or sen…

  2. arXiv cs.CV TIER_1 · Jianyu Chen ·

    UAM: A Dual-Stream Perspective on Forgetting in VLA Training

    Vision-language-action (VLA) models are typically built by fine-tuning a pretrained vision-language model (VLM) on action data. However, we show that this standard recipe systematically erodes the VLM's multimodal competence, a side effect we call the embodiment tax. But do VL…

  3. Medium — fine-tuning tag TIER_1 · L.J. ·

    Stop Messing with the Loss Function: The Smarter Way to Fix Catastrophic Forgetting in LLMs

    https://medium.com/@zljdanceholic/stop-messing-with-the-loss-function-the-smarter-way-to-fix-catastrophic-forgetting-in-llms-423ea65eef25?source=rss------fine_tuning-5

  4. Medium — fine-tuning tag TIER_1 · Shashi Jagtap ·

    Learning, Fast and Slow: What’s Next in LLM Fine-Tuning and Plastic Continual Learning with GEPA

    https://medium.com/superagentic-ai/learning-fast-and-slow-whats-next-in-llm-fine-tuning-and-plastic-continual-learning-with-gepa-6ae53907d95e?source=rss------fine_tuning-5