Contextual Agentic Memory Is a Memo, Not True Memory
By PulseAugur Editorial ·
Summary from 19 sources
Researchers are exploring advanced memory systems for LLM agents to improve their reasoning and learning capabilities. One approach, E-mem, uses a hierarchical architecture with multiple agents to reconstruct episodic contexts without losing crucial information. Another method, ViLoMem, focuses on a dual-stream memory framework to separately encode visual and logical information, enabling agents to learn from both successes and failures. Additionally, a paper argues that current agentic memory systems are merely lookups and not true memory, proposing a neuroscience-inspired approach for better generalization and security.
AI
IMPACT
These research papers explore methods to enhance LLM agent reasoning, learning, and memory, potentially leading to more robust and capable AI systems.
RANK_REASON
Multiple arXiv papers present novel research on improving LLM agent capabilities through advanced memory systems and learning techniques.
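As a rough illustration of the dual-stream idea from the summary above, the sketch below keeps visual and logical traces in separate indexes and tags each episode with its outcome, so failures are retrievable alongside successes. The class name, embedding inputs, and retrieval details are our own illustrative assumptions, not ViLoMem's actual design.

```python
import numpy as np

class DualStreamMemory:
    """Hypothetical sketch of a dual-stream agent memory.

    Visual and logical traces are embedded and indexed separately and only
    merged at read time. Assumes at least one write before any read.
    """
    def __init__(self):
        self.visual_keys, self.logical_keys = [], []
        self.visual_notes, self.logical_notes = [], []

    def write(self, visual_emb, logical_emb, note, succeeded):
        # Tag each episode with its outcome so the agent can later draw on
        # failures as well as successes.
        tagged = ("SUCCESS: " if succeeded else "FAILURE: ") + note
        self.visual_keys.append(visual_emb / np.linalg.norm(visual_emb))
        self.visual_notes.append(tagged)
        self.logical_keys.append(logical_emb / np.linalg.norm(logical_emb))
        self.logical_notes.append(tagged)

    def read(self, visual_q, logical_q, k=3):
        # Each stream is queried with its own embedding, keeping the two
        # encodings separate until the results are combined.
        def topk(keys, notes, q):
            q = q / np.linalg.norm(q)
            sims = np.stack(keys) @ q  # cosine similarity per stored episode
            return [notes[i] for i in np.argsort(-sims)[:k]]
        return {"visual": topk(self.visual_keys, self.visual_notes, visual_q),
                "logical": topk(self.logical_keys, self.logical_notes, logical_q)}
```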
On-policy distillation (OPD) has emerged as an efficient post-training paradigm for large language models. However, existing studies largely attribute this advantage to denser and more stable supervision, while the parameter-level mechanisms underlying OPD's efficiency remain poo…
On-policy distillation offers dense, per-token supervision for training reasoning models; however, it remains unclear under which conditions this signal is beneficial and under which it is detrimental. Which teacher model should be used, and in the case of self-distillation, whic…
On-policy self-distillation (self-OPD) densifies reinforcement learning with verifiable rewards (RLVR) by letting a policy teach itself under privileged context. We find that when this guidance spans the full response, all-token KL spends gradients on mostly redundant positions a…
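For readers new to the setup these abstracts share, here is a minimal sketch of the on-policy distillation signal: the student samples its own response, both models score those same tokens, and the loss is a per-token reverse KL against the teacher. The optional masking to high-divergence positions gestures at the redundancy problem raised above; the `top_fraction` threshold and selection rule are assumptions, not any paper's recipe.

```python
import torch
import torch.nn.functional as F

def opd_loss(student_logits, teacher_logits, top_fraction=1.0):
    """Per-token reverse KL(student || teacher) on a student-sampled response.

    student_logits, teacher_logits: [seq_len, vocab] logits scored on the
    SAME student-generated tokens (the on-policy part). top_fraction < 1.0
    keeps only the highest-divergence positions, one guess at how to avoid
    spending gradient on mostly redundant tokens.
    """
    s_logp = F.log_softmax(student_logits, dim=-1)
    t_logp = F.log_softmax(teacher_logits, dim=-1)
    # Reverse KL per position: sum_v p_s(v) * (log p_s(v) - log p_t(v))
    per_token_kl = (s_logp.exp() * (s_logp - t_logp)).sum(dim=-1)  # [seq_len]
    if top_fraction < 1.0:
        k = max(1, int(top_fraction * per_token_kl.numel()))
        # Keep gradients only where student and teacher disagree most.
        keep = torch.topk(per_token_kl.detach(), k).indices
        return per_token_kl[keep].mean()
    return per_token_kl.mean()
```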
On-policy distillation (OPD) is a standard tool for transferring teacher behavior to a smaller student, but it implicitly assumes that teacher and student predictions are comparable token by token, an assumption that fails whenever the two models tokenize the same text differentl…
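The tokenization caveat above is easy to see concretely. The toy below segments the same string with two deliberately simplistic stand-in tokenizers; since position i in one sequence no longer corresponds to position i in the other, token-by-token comparison of teacher and student predictions is ill-defined without an alignment step.

```python
def whitespace_tokenize(text):
    return text.split()

def char_pair_tokenize(text):
    # A crude stand-in for a subword tokenizer with a different vocabulary.
    return [text[i:i + 2] for i in range(0, len(text), 2)]

text = "on-policy distillation"
student_tokens = whitespace_tokenize(text)  # ['on-policy', 'distillation']
teacher_tokens = char_pair_tokenize(text)   # ['on', '-p', 'ol', ...]

# Same text, different segmentations: the sequences differ in length and
# boundaries, so per-token teacher/student distributions cannot simply be
# compared index by index.
print(len(student_tokens), len(teacher_tokens))  # 2 11
```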
arXiv:2605.06387v1 Announce Type: new Abstract: On-policy distillation (OPD) trains a student on its own trajectories with token-level teacher feedback and often outperforms off-policy distillation and standard reinforcement learning. However, we find that its standard advantage weighted policy gradient suffers from three stru…
arXiv cs.LG
TIER_1·Anastasis Kratsios, A. Martina Neuman, Philipp Petersen·
arXiv:2605.04995v1 Announce Type: new Abstract: We compare in-context learning with fixed queries and agentic learning with adaptive queries for uniform approximation of task families. We consider two settings: an unrestricted regime, where querying and approximation are arbitrary functions, and a realizable regime, where we r…
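In the language of that abstract, the goal can be stated as a worst-case guarantee over a task family. The formalization below is our plausible reading of "uniform approximation" with fixed versus adaptive queries, filled in from standard approximation-theory conventions rather than taken from the paper.

```latex
% A plausible formalization of the two regimes (our reading, not the paper's):
\begin{align*}
  \text{fixed (in-context) queries:}\quad
    & \hat f = A\bigl(f(x_1),\dots,f(x_n)\bigr),
      \quad x_1,\dots,x_n \text{ chosen in advance},\\
  \text{adaptive (agentic) queries:}\quad
    & x_{k+1} = q_k\bigl(f(x_1),\dots,f(x_k)\bigr)
      \quad \text{(each site may depend on earlier answers)},\\
  \text{uniform-approximation goal:}\quad
    & \sup_{f\in\mathcal F}\,\bigl\|f-\hat f\bigr\|_\infty \le \varepsilon
      \quad \text{for every task } f \text{ in the family } \mathcal F.
\end{align*}
```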
arXiv:2605.03677v1 Announce Type: new Abstract: On-policy distillation (OPD) has recently emerged as an effective post-training paradigm for consolidating the capabilities of specialized expert models into a single student model. Despite its empirical success, the conditions under which OPD yields reliable improvement remain p…
arXiv cs.AI
TIER_1·Kaixiang Wang, Yidan Lin, Jiong Lou, Zhaojiacheng Zhou, Bunyod Suvonov, Jie Li·
arXiv:2601.21714v2 Announce Type: replace Abstract: The evolution of Large Language Model (LLM) agents towards System 2 reasoning, characterized by deliberative, high-precision problem-solving, requires maintaining rigorous logical integrity over extended horizons. However, preva…
arXiv:2511.21678v2 Announce Type: replace-cross Abstract: MLLMs exhibit strong reasoning on isolated queries, yet they operate de novo -- solving each problem independently and often repeating the same mistakes. Existing memory-augmented agents mainly store past trajectories for …
arXiv cs.LG
TIER_1·Jackson Hassell, Dan Zhang, Hannah Kim, Tom Mitchell, Estevam Hruschka·
arXiv:2510.19897v2 Announce Type: replace-cross Abstract: We investigate how agents built on pretrained large language models (LLMs) can learn target classification functions from labeled examples without parameter updates. While conventional approaches like fine-tuning are often…
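The "without parameter updates" setting here is the familiar in-context recipe: labeled examples are serialized into the prompt and a frozen model classifies the query from context alone. The sketch below is a generic version of that recipe; the template and label format are arbitrary choices, not the paper's.

```python
def build_classification_prompt(examples, query):
    """Serialize labeled examples into a few-shot prompt for a frozen LLM.

    examples: list of (text, label) pairs. No gradient step ever happens;
    the 'learning' lives entirely in the context. Template is illustrative.
    """
    lines = ["Classify each input with one of the given labels.", ""]
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Label:")  # the model completes this line with its prediction
    return "\n".join(lines)

prompt = build_classification_prompt(
    [("the movie was wonderful", "positive"),
     ("a dull, lifeless script", "negative")],
    "I could not stop smiling",
)
print(prompt)
```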
arXiv:2601.10702v2 Announce Type: replace-cross Abstract: Deploying large language models in long-horizon, goal-oriented interactions remains challenging because similar entities and facts recur under different latent goals and constraints, causing memory systems to retrieve cont…
arXiv:2604.27707v1 Announce Type: new Abstract: Current agentic memory systems (vector stores, retrieval-augmented generation, scratchpads, and context-window management) do not implement memory: they implement lookup. We argue that treating lookup as memory is a category error with provable consequences for agent capability, …
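To make the lookup-versus-memory distinction concrete: what a typical vector-store "memory" does is essentially the nearest-neighbor search below. Note what is absent, no consolidation, no forgetting, no generalization beyond what was stored, which is the sense in which the abstract calls this lookup. The code is a generic illustration, not the paper's formal argument.

```python
import numpy as np

class VectorStoreLookup:
    """A typical agentic 'memory': store embeddings, return nearest neighbors.

    The store can only return (approximately) what was put in; nothing is
    abstracted, reorganized, or forgotten between write and read.
    """
    def __init__(self):
        self.keys, self.values = [], []

    def write(self, embedding, text):
        self.keys.append(embedding / np.linalg.norm(embedding))
        self.values.append(text)

    def read(self, query, k=3):
        q = query / np.linalg.norm(query)
        sims = np.stack(self.keys) @ q  # cosine similarity to every stored key
        return [self.values[i] for i in np.argsort(-sims)[:k]]
```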
arXiv cs.CV
TIER_1·Dengyang Jiang, Xin Jin, Dongyang Liu, Zanyi Wang, Mingzhe Zheng, Ruoyi Du, Xiangpeng Yang, Qilong Wu, Zhen Li, Peng Gao, Harry Yang, Steven Hoi·
arXiv:2605.05204v1 Announce Type: new Abstract: The landscape of high-performance image generation models is currently shifting from the inefficient multi-step ones to the efficient few-step counterparts (e.g., Z-Image-Turbo and FLUX.2-klein). However, these models present significant challenges for directly continuous supervis…