Contextual Agentic Memory Is a Memo, Not True Memory
By PulseAugur Editorial ·
Summary from 19 sources
Researchers are exploring advanced memory systems for LLM agents to improve their reasoning and learning capabilities. One approach, E-mem, uses a hierarchical architecture with multiple agents to reconstruct episodic contexts without losing crucial information. Another method, ViLoMem, focuses on a dual-stream memory framework to separately encode visual and logical information, enabling agents to learn from both successes and failures. Additionally, a paper argues that current agentic memory systems are merely lookups and not true memory, proposing a neuroscience-inspired approach for better generalization and security.
AI
IMPACT
These research papers explore methods to enhance LLM agent reasoning, learning, and memory, potentially leading to more robust and capable AI systems.
RANK_REASON
Multiple arXiv papers present novel research on improving LLM agent capabilities through advanced memory systems and learning techniques.
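As a rough illustration of the dual-stream idea from the summary above, the sketch below keeps visual and logical traces in separate indexes and tags each episode with its outcome, so failures are retrievable alongside successes. The class name, embedding inputs, and retrieval details are our own illustrative assumptions, not ViLoMem's actual design.

```python
import numpy as np

class DualStreamMemory:
    """Hypothetical sketch of a dual-stream agent memory.

    Visual and logical traces are embedded and indexed separately and only
    merged at read time. Assumes at least one write before any read.
    """
    def __init__(self):
        self.visual_keys, self.logical_keys = [], []
        self.visual_notes, self.logical_notes = [], []

    def write(self, visual_emb, logical_emb, note, succeeded):
        # Tag each episode with its outcome so the agent can later draw on
        # failures as well as successes.
        tagged = ("SUCCESS: " if succeeded else "FAILURE: ") + note
        self.visual_keys.append(visual_emb / np.linalg.norm(visual_emb))
        self.visual_notes.append(tagged)
        self.logical_keys.append(logical_emb / np.linalg.norm(logical_emb))
        self.logical_notes.append(tagged)

    def read(self, visual_q, logical_q, k=3):
        # Each stream is queried with its own embedding, keeping the two
        # encodings separate until the results are combined.
        def topk(keys, notes, q):
            q = q / np.linalg.norm(q)
            sims = np.stack(keys) @ q  # cosine similarity per stored episode
            return [notes[i] for i in np.argsort(-sims)[:k]]
        return {"visual": topk(self.visual_keys, self.visual_notes, visual_q),
                "logical": topk(self.logical_keys, self.logical_notes, logical_q)}
```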
On-policy distillation (OPD) has emerged as an efficient post-training paradigm for large language models. However, existing studies largely attribute this advantage to denser and more stable supervision, while the parameter-level mechanisms underlying OPD's efficiency remain poo…
On-policy distillation offers dense, per-token supervision for training reasoning models; however, it remains unclear under which conditions this signal is beneficial and under which it is detrimental. Which teacher model should be used, and in the case of self-distillation, whic…
On-policy self-distillation (self-OPD) densifies reinforcement learning with verifiable rewards (RLVR) by letting a policy teach itself under privileged context. We find that when this guidance spans the full response, all-token KL spends gradients on mostly redundant positions a…
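For readers new to the setup these abstracts share, here is a minimal sketch of the on-policy distillation signal: the student samples its own response, both models score those same tokens, and the loss is a per-token reverse KL against the teacher. The optional masking to high-divergence positions gestures at the redundancy problem raised above; the `top_fraction` threshold and selection rule are assumptions, not any paper's recipe.

```python
import torch
import torch.nn.functional as F

def opd_loss(student_logits, teacher_logits, top_fraction=1.0):
    """Per-token reverse KL(student || teacher) on a student-sampled response.

    student_logits, teacher_logits: [seq_len, vocab] logits scored on the
    SAME student-generated tokens (the on-policy part). top_fraction < 1.0
    keeps only the highest-divergence positions, one guess at how to avoid
    spending gradient on mostly redundant tokens.
    """
    s_logp = F.log_softmax(student_logits, dim=-1)
    t_logp = F.log_softmax(teacher_logits, dim=-1)
    # Reverse KL per position: sum_v p_s(v) * (log p_s(v) - log p_t(v))
    per_token_kl = (s_logp.exp() * (s_logp - t_logp)).sum(dim=-1)  # [seq_len]
    if top_fraction < 1.0:
        k = max(1, int(top_fraction * per_token_kl.numel()))
        # Keep gradients only where student and teacher disagree most.
        keep = torch.topk(per_token_kl.detach(), k).indices
        return per_token_kl[keep].mean()
    return per_token_kl.mean()
```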
On-policy distillation (OPD) is a standard tool for transferring teacher behavior to a smaller student, but it implicitly assumes that teacher and student predictions are comparable token by token, an assumption that fails whenever the two models tokenize the same text differentl…
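The tokenization caveat above is easy to see concretely. The toy below segments the same string with two deliberately simplistic stand-in tokenizers; since position i in one sequence no longer corresponds to position i in the other, token-by-token comparison of teacher and student predictions is ill-defined without an alignment step.

```python
def whitespace_tokenize(text):
    return text.split()

def char_pair_tokenize(text):
    # A crude stand-in for a subword tokenizer with a different vocabulary.
    return [text[i:i + 2] for i in range(0, len(text), 2)]

text = "on-policy distillation"
student_tokens = whitespace_tokenize(text)  # ['on-policy', 'distillation']
teacher_tokens = char_pair_tokenize(text)   # ['on', '-p', 'ol', ...]

# Same text, different segmentations: the sequences differ in length and
# boundaries, so per-token teacher/student distributions cannot simply be
# compared index by index.
print(len(student_tokens), len(teacher_tokens))  # 2 11
```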
arXiv:2605.06387v1 Announce Type: new Abstract: On-policy distillation (OPD) trains a student on its own trajectories with token-level teacher feedback and often outperforms off-policy distillation and standard reinforcement learning. However, we find that its standard advantage weighted policy gradient suffers from three stru…
arXiv cs.LG
TIER_1·Anastasis Kratsios, A. Martina Neuman, Philipp Petersen·
arXiv:2605.04995v1 Announce Type: new Abstract: We compare in-context learning with fixed queries and agentic learning with adaptive queries for uniform approximation of task families. We consider two settings: an unrestricted regime, where querying and approximation are arbitrary functions, and a realizable regime, where we r…
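In the language of that abstract, the goal can be stated as a worst-case guarantee over a task family. The formalization below is our plausible reading of "uniform approximation" with fixed versus adaptive queries, filled in from standard approximation-theory conventions rather than taken from the paper.

```latex
% A plausible formalization of the two regimes (our reading, not the paper's):
\begin{align*}
  \text{fixed (in-context) queries:}\quad
    & \hat f = A\bigl(f(x_1),\dots,f(x_n)\bigr),
      \quad x_1,\dots,x_n \text{ chosen in advance},\\
  \text{adaptive (agentic) queries:}\quad
    & x_{k+1} = q_k\bigl(f(x_1),\dots,f(x_k)\bigr)
      \quad \text{(each site may depend on earlier answers)},\\
  \text{uniform-approximation goal:}\quad
    & \sup_{f\in\mathcal F}\,\bigl\|f-\hat f\bigr\|_\infty \le \varepsilon
      \quad \text{for every task } f \text{ in the family } \mathcal F.
\end{align*}
```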
arXiv:2605.03677v1 Announce Type: new Abstract: On-policy distillation (OPD) has recently emerged as an effective post-training paradigm for consolidating the capabilities of specialized expert models into a single student model. Despite its empirical success, the conditions under which OPD yields reliable improvement remain p…
arXiv cs.AI
TIER_1·Kaixiang Wang, Yidan Lin, Jiong Lou, Zhaojiacheng Zhou, Bunyod Suvonov, Jie Li·
arXiv:2601.21714v2 Announce Type: replace Abstract: The evolution of Large Language Model (LLM) agents towards System 2 reasoning, characterized by deliberative, high-precision problem-solving, requires maintaining rigorous logical integrity over extended horizons. However, preva…
arXiv:2511.21678v2 Announce Type: replace-cross Abstract: MLLMs exhibit strong reasoning on isolated queries, yet they operate de novo -- solving each problem independently and often repeating the same mistakes. Existing memory-augmented agents mainly store past trajectories for …
arXiv cs.LG
TIER_1·Jackson Hassell, Dan Zhang, Hannah Kim, Tom Mitchell, Estevam Hruschka·
arXiv:2510.19897v2 Announce Type: replace-cross Abstract: We investigate how agents built on pretrained large language models (LLMs) can learn target classification functions from labeled examples without parameter updates. While conventional approaches like fine-tuning are often…
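The "without parameter updates" setting here is the familiar in-context recipe: labeled examples are serialized into the prompt and a frozen model classifies the query from context alone. The sketch below is a generic version of that recipe; the template and label format are arbitrary choices, not the paper's.

```python
def build_classification_prompt(examples, query):
    """Serialize labeled examples into a few-shot prompt for a frozen LLM.

    examples: list of (text, label) pairs. No gradient step ever happens;
    the 'learning' lives entirely in the context. Template is illustrative.
    """
    lines = ["Classify each input with one of the given labels.", ""]
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Label:")  # the model completes this line with its prediction
    return "\n".join(lines)

prompt = build_classification_prompt(
    [("the movie was wonderful", "positive"),
     ("a dull, lifeless script", "negative")],
    "I could not stop smiling",
)
print(prompt)
```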
arXiv:2601.10702v2 Announce Type: replace-cross Abstract: Deploying large language models in long-horizon, goal-oriented interactions remains challenging because similar entities and facts recur under different latent goals and constraints, causing memory systems to retrieve cont…
arXiv:2604.27707v1 Announce Type: new Abstract: Current agentic memory systems (vector stores, retrieval-augmented generation, scratchpads, and context-window management) do not implement memory: they implement lookup. We argue that treating lookup as memory is a category error with provable consequences for agent capability, …
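To make the lookup-versus-memory distinction concrete: what a typical vector-store "memory" does is essentially the nearest-neighbor search below. Note what is absent, no consolidation, no forgetting, no generalization beyond what was stored, which is the sense in which the abstract calls this lookup. The code is a generic illustration, not the paper's formal argument.

```python
import numpy as np

class VectorStoreLookup:
    """A typical agentic 'memory': store embeddings, return nearest neighbors.

    The store can only return (approximately) what was put in; nothing is
    abstracted, reorganized, or forgotten between write and read.
    """
    def __init__(self):
        self.keys, self.values = [], []

    def write(self, embedding, text):
        self.keys.append(embedding / np.linalg.norm(embedding))
        self.values.append(text)

    def read(self, query, k=3):
        q = query / np.linalg.norm(query)
        sims = np.stack(self.keys) @ q  # cosine similarity to every stored key
        return [self.values[i] for i in np.argsort(-sims)[:k]]
```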
arXiv cs.CV
TIER_1·Dengyang Jiang, Xin Jin, Dongyang Liu, Zanyi Wang, Mingzhe Zheng, Ruoyi Du, Xiangpeng Yang, Qilong Wu, Zhen Li, Peng Gao, Harry Yang, Steven Hoi·
arXiv:2605.05204v1 Announce Type: new Abstract: The landscape of high-performance image generation models is currently shifting from the inefficient multi-step ones to the efficient few-step counterparts (e.g., Z-Image-Turbo and FLUX.2-klein). However, these models present significant challenges for directly continuous supervis…