A curated taxonomy of agent memory systems, organized along four axes: (1) memory system architectures — from flat sequential context, to structural topological graphs/trees, to multi-paradigm hybrid containers; (2) reference baselines for comparison; (3) benchmarks for evaluation; and (4) surveys on agent memory.
Systems that model memory as flat, one-dimensional sequences lacking explicit structural abstractions.
-
MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation Junru Lu, Siyu An, Mingbao Lin, et al. arXiv 2023. [Paper]
-
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory Prateek Chhikara, Dev Khant, Saket Aryan, et al. arXiv 2025. [Paper] [Github]
-
MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents Zijian Zhou, Ao Qu, Zhaoxuan Wu, et al. arXiv 2025. [Paper]
-
MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent Hongli Yu, Tinghong Chen, Jiangtao Feng, et al. arXiv 2025. [Paper]
Systems that abstract memory into structured graph and tree topologies with interconnected nodes and edges.
-
MemTree: From Isolated Conversations to Hierarchical Schemas: Dynamic Tree Memory Representation for LLMs Alireza Rezazadeh, Zichao Li, Wei Wei, Yujia Bao. ICLR 2025. [Paper]
-
Zep: A Temporal Knowledge Graph Architecture for Agent Memory Preston Rasmussen, Pavlo Paliychuk, Travis Beauvais, Jack Ryan, et al. arXiv 2025. [Paper] [Github]
-
Mem0 (Graph Mode, Mem0^g): Graph variant of Mem0 that formalizes memory as a directed labeled graph of entity-relation triplets, backed by a heterogeneous multi-engine store (vector DB plus graph DB). arXiv 2025. [Paper] [Github]
-
Cognee: Optimizing the Interface Between Knowledge Graphs and LLMs for Complex Reasoning Vasilije Markovic, Lazar Obradovic, Laszlo Hajdu, Jovan Pavlovic. arXiv 2025. [Paper] [Github]
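To make the "directed labeled graph with entity-relation triplets" idea from the Mem0^g entry concrete, here is a minimal in-memory sketch. It is an illustration only, not the implementation of Mem0, Zep, or Cognee: the `TripletGraph` class, its method names, and the sample facts are hypothetical, and the vector and graph database backends those systems use are omitted.

```python
from collections import defaultdict


class TripletGraph:
    """Directed labeled graph stored as (subject, relation, object) triplets.

    Hypothetical illustration; not the API of any system listed here.
    """

    def __init__(self):
        # subject -> list of (relation, object) outgoing edges
        self.out_edges = defaultdict(list)

    def add(self, subject, relation, obj):
        """Insert one triplet, ignoring exact duplicates."""
        edge = (relation, obj)
        if edge not in self.out_edges[subject]:
            self.out_edges[subject].append(edge)

    def neighbors(self, subject, relation=None):
        """Objects one hop from subject, optionally filtered by edge label."""
        return [o for r, o in self.out_edges.get(subject, [])
                if relation is None or r == relation]

    def reachable(self, start, max_hops=2):
        """Breadth-first traversal: entities within max_hops of start."""
        seen, frontier = {start}, [start]
        for _ in range(max_hops):
            next_frontier = []
            for node in frontier:
                for obj in self.neighbors(node):
                    if obj not in seen:
                        seen.add(obj)
                        next_frontier.append(obj)
            frontier = next_frontier
        return seen - {start}


# Made-up facts extracted from a conversation.
graph = TripletGraph()
graph.add("Alice", "owns", "Miso")
graph.add("Miso", "likes", "tuna")
graph.add("Alice", "lives_in", "Berlin")
print(graph.neighbors("Alice", relation="owns"))
print(sorted(graph.reachable("Alice", max_hops=2)))
```

The multi-hop traversal is what separates this layout from a flat sequence: a question about what Alice's pet likes can follow `owns` and then `likes`, without both facts having to appear in the same retrieved text chunk.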
Systems that package memory into complex, multi-part data containers combining unstructured text with structured metadata; some also route memory across heterogeneous backends.
-
LightMem: Lightweight and Efficient Memory-Augmented Generation Jizhan Fang, Xinle Deng, Haoming Xu, et al. arXiv 2025. [Paper]
-
SimpleMem: Efficient Lifelong Memory for LLM Agents Jiaqi Liu, Yaofeng Su, Peng Xia, et al. arXiv 2026. [Paper]
-
MemOS: A Memory OS for AI System Zhiyu Li, Shichao Song, Chenyang Xi, et al. arXiv 2025. [Paper] [Github]
-
MemoryOS: Memory OS of AI Agent Jiazheng Kang, Mingming Ji, Zhe Zhao, Ting Bai. EMNLP 2025. [Paper] [Github]
-
A-MEM: Agentic Memory for LLM Agents Wujiang Xu, et al. arXiv 2025. [Paper] [Github]
-
Letta (MemGPT): Towards LLMs as Operating Systems Charles Packer, Vivian Fang, Shishir G. Patil, et al. arXiv 2023. [Paper] [Github]
Reference baselines that use no dedicated memory system, included for comparison.
-
Long Context: A naive baseline that passes the full conversation history directly into the LLM context window without any external memory system. While effective on some tasks, its latency and token cost grow with the length of the history, which makes it impractical for most production deployments.
-
Embedding RAG: A standard dense retrieval baseline that embeds past interactions into vectors and performs top-k similarity search, without any memory-specific extraction, maintenance, or routing logic.
-
BM25: A sparse lexical retrieval baseline using Okapi BM25 scoring for full-text search, evaluated in the appendix experiments.
-
Contriever: An unsupervised dense retrieval model used as a retrieval baseline in the appendix experiments. TMLR 2022. [Paper]
-
GraphRAG: A graph-based retrieval-augmented generation approach that constructs a knowledge graph from documents and retrieves via graph traversal. arXiv 2024. [Paper]
-
HippoRAG: A retrieval-augmented generation method inspired by the hippocampal memory indexing theory, using knowledge graph triples and dense vector embeddings. NeurIPS 2024. [Paper]
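The two simplest retrieval baselines above, BM25 and Embedding RAG, can be sketched in a few lines of standard-library Python. This is a hedged illustration, not the code used in any of the listed papers: the function names, the whitespace tokenizer, the `k1` and `b` defaults, and the toy vectors are all assumptions, and a real Embedding RAG baseline would obtain its vectors from an embedding model rather than take them as plain lists.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (assumption; real systems use analyzers).
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    # Fall back to 1.0 so all-empty documents do not divide by zero.
    avg_len = sum(len(t) for t in tokenized) / n_docs or 1.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k(query_vec, memory_vecs, k=2):
    """Indices of the k stored vectors most similar to the query vector."""
    ranked = sorted(range(len(memory_vecs)),
                    key=lambda i: cosine(query_vec, memory_vecs[i]),
                    reverse=True)
    return ranked[:k]


# Made-up conversation history for illustration.
history = [
    "Alice adopted a cat named Miso",
    "Bob moved to Berlin last spring",
    "Alice said Miso likes tuna",
]
lexical = bm25_scores("Miso tuna", history)
print(max(range(len(history)), key=lexical.__getitem__))
print(top_k([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]], k=2))
```

Neither function extracts, updates, or routes memories. Both only rank stored text against a query, which is what distinguishes these baselines from the memory systems listed earlier.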
The following benchmarks are used to evaluate agent memory systems, covering task effectiveness, retrieval fidelity, update robustness, long-horizon stability, and operational cost.
-
LoCoMo: Evaluating Very Long-Term Conversational Memory of LLM Agents Adyasha Maharana, Dong-Ho Lee, Sergey Tulyakov, et al. ACL 2024. [Paper]
- Long-conversation QA benchmark testing episodic, temporal, open-domain, and single-hop memory over multi-turn interactions (50 multi-modal chats; 9,209 tokens and 304 turns avg.)
-
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory Di Wu, Hongwei Wang, Wenhao Yu, et al. ICLR 2025. [Paper]
- Multi-session long-memory benchmark evaluating cross-session QA and temporal knowledge updates (500 QA pairs; up to 1.5M tokens)
-
LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks Yushi Bai, Shangqing Tu, Jiajie Zhang, et al. ACL 2025 / arXiv 2024. [Paper]
- Extreme long-context benchmark with 503 multiple-choice questions spanning 8K to 2M-word contexts, used for short, medium, and long context-length stability analysis
-
LifelongAgentBench: Evaluating LLM Agents as Lifelong Learners Junhao Zheng, Xidi Cai, Qiuke Li, et al. arXiv 2025. [Paper]
- Evaluates sequential procedural skill transfer across structurally related database, operating system, and knowledge graph tasks (1,396 tasks sharing atomic skills)
-
MemBench: Towards More Comprehensive Evaluation on the Memory of LLM-based Agents Haoran Tan, Zeyu Zhang, Chen Ma, et al. ACL 2025 (Findings). [Paper]
- Measures memory capabilities across different abstraction levels (factual vs. reflective) and noise conditions, with stress tests up to 100K sessions
-
MemoryAgentBench: Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions Yuanzhe Hu, Yu Wang, Julian McAuley. arXiv 2025. [Paper]
- Tests four core competencies: accurate retrieval, test-time learning, long-range understanding, and selective forgetting across 14 datasets with context lengths ranging from 103K to 1.44M tokens
Surveys on agent memory, covering mechanisms, architectures, and evaluation.
-
A Survey on the Memory Mechanism of Large Language Model based Agents Zeyu Zhang, Xiaohe Bo, Chen Ma, et al. ACM Transactions on Information Systems 2025. [Paper]
-
Memory in the Age of AI Agents Yuyang Hu, Shichun Liu, Yanwei Yue, et al. arXiv 2025. [Paper]
-
Memory for Autonomous LLM Agents: Mechanisms, Evaluation, and Emerging Frontiers Pengfei Du. arXiv 2026. [Paper]
-
Memory in the LLM Era: Modular Architectures and Strategies in a Unified Framework Yanchen Wu, Tenghui Lin, Yingli Zhou, et al. Proceedings of the VLDB Endowment 2026. [Paper]
-
Graph-based Agent Memory: Taxonomy, Techniques, and Applications Chang Yang, Chuang Zhou, Yilin Xiao, et al. arXiv 2026. [Paper]
-
Lifelong Learning of Large Language Model based Agents: A Roadmap Junhao Zheng, Chengming Shi, Xidi Cai, et al. IEEE Transactions on Pattern Analysis and Machine Intelligence 2025. [Paper]
@article{memoryasdata,
title={Are We Ready For An Agent-Native Memory System?},
author={Wei Zhou and Xuanhe Zhou and Shaokun Han and Hongming Xu and Guoliang Li and Zhiyu Li and Feiyu Xiong and Fan Wu},
year={2026},
journal={arXiv preprint arXiv:2606.24775},
url={https://arxiv.org/abs/2606.24775}
}