MemoTime Logo

MemoTime: Memory-Augmented Temporal Knowledge Graph Enhanced Large Language Model Reasoning

Accepted at The ACM Web Conference 2026 (WWW 2026)

Xingyu Tan1,2 · Xiaoyang Wang1 · Qing Liu2 · Xiwei Xu2 ·
Xin Yuan2 · Liming Zhu2 · Wenjie Zhang1

1University of New South Wales    2CSIRO's Data61

Abstract

How can LLMs stay temporally faithful in a complex, evolving world?

Large Language Models often fail to maintain temporal consistency when dealing with multi-hop questions and evolving event sequences. MemoTime addresses these challenges by decomposing complex temporal questions into a hierarchical Tree of Time. It enables operator-aware reasoning with dynamic evidence retrieval and a self-evolving experience memory, allowing the model to reuse prior reasoning for efficiency and stability.

MemoTime Framework

Figure 1: Overview of MemoTime's Memory-Augmented Reasoning Workflow.

Core Innovations

🌲 Tree of Time Hierarchical Reasoning

A hierarchical controller that decomposes complex temporal questions into a recursive reasoning tree. It enforces monotonic timestamp progression and synchronizes multiple entities under unified temporal bounds to ensure global reasoning coherence.
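As a rough illustration of this idea, the sketch below models a Tree-of-Time node that carries a sub-question and a time window, rejects children whose windows fall outside the parent's bounds (monotonic timestamp progression), and resolves sub-questions bottom-up. The class and function names are hypothetical, not MemoTime's actual API.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch: a Tree-of-Time node holding a sub-question and
# the temporal bounds under which it must be answered.
@dataclass
class TimeNode:
    question: str
    start: int                      # earliest admissible timestamp
    end: int                        # latest admissible timestamp
    children: List["TimeNode"] = field(default_factory=list)

    def add_child(self, question: str, start: int, end: int) -> "TimeNode":
        # Enforce temporal coherence: a child's window must lie
        # inside its parent's bounds.
        if not (self.start <= start <= end <= self.end):
            raise ValueError("child window violates parent's temporal bounds")
        child = TimeNode(question, start, end)
        self.children.append(child)
        return child

def resolve_order(root: TimeNode) -> List[str]:
    """Post-order traversal: sub-questions resolve before their parent."""
    out: List[str] = []
    for child in root.children:
        out.extend(resolve_order(child))
    out.append(root.question)
    return out
```

In this toy version the controller would walk `resolve_order` and answer each sub-question in turn; the real system additionally synchronizes multiple entities under the same bounds.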

🧠 Self-evolving Experience Memory

A dynamic memory store that records successful reasoning traces and toolkit decisions. It forms a closed feedback loop, allowing the model to recall, reuse, and refine prior trajectories to progressively improve performance across inference cycles.
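A minimal sketch of such a memory, assuming a store keyed by question text: only successful traces are recorded, and similar past questions are recalled to seed new reasoning. Lexical similarity stands in here for the dense retrieval a real system would use; all names are illustrative.

```python
from difflib import SequenceMatcher

# Illustrative experience memory: record successful reasoning traces,
# recall the most similar past ones for a new question.
class ExperienceMemory:
    def __init__(self):
        self._store = []  # list of (question, trace, score) tuples

    def record(self, question: str, trace: list, score: float) -> None:
        """Closed feedback loop: keep only traces judged successful."""
        if score >= 0.5:
            self._store.append((question, trace, score))

    def recall(self, question: str, k: int = 1) -> list:
        """Return up to k traces whose questions best match the query."""
        ranked = sorted(
            self._store,
            key=lambda item: SequenceMatcher(None, question, item[0]).ratio(),
            reverse=True,
        )
        return [trace for _, trace, _ in ranked[:k]]
```

Across inference cycles, each solved question feeds `record`, so later calls to `recall` start from progressively richer experience.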

🔍 Hybrid Retrieval and Pruning

An operator-aware retrieval layer that unifies symbolic graph expansion with dense embedding search. It adaptively selects strategies for diverse temporal operators, ensuring precise evidence grounding while significantly reducing retrieval noise.
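The sketch below shows the two halves of such a layer in miniature: symbolic one-hop expansion over a TKG adjacency list, and dense top-k search over precomputed fact embeddings, merged into one candidate set. The graph format, embeddings, and merge rule are assumptions for illustration; an operator-aware system would route each temporal operator to the appropriate strategy rather than always taking the union.

```python
import math

def one_hop(graph: dict, entity: str) -> set:
    """Symbolic expansion: facts incident to the query entity."""
    return set(graph.get(entity, []))

def dense_top_k(query_vec, fact_vecs: dict, k: int) -> set:
    """Dense search: rank facts by cosine similarity to the query."""
    def cos(a, b):
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return num / den if den else 0.0
    ranked = sorted(fact_vecs, key=lambda f: cos(query_vec, fact_vecs[f]),
                    reverse=True)
    return set(ranked[:k])

def hybrid_retrieve(graph, entity, query_vec, fact_vecs, k=2):
    """Toy merge: union the symbolic and dense candidate sets."""
    return one_hop(graph, entity) | dense_top_k(query_vec, fact_vecs, k)
```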

⚖️ Temporal Faithfulness

Ensures strict chronological validity by applying a temporal-first pruning policy. This mechanism filters out semantically relevant but temporally inconsistent paths, guaranteeing that all generated answers are factually grounded in the TKG structure.
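A temporal-first policy can be sketched in a few lines: candidates are filtered on the time constraint before any semantic scoring, so a semantically strong but chronologically invalid path can never survive into the answer. The fact encoding and scoring function below are illustrative assumptions, not the paper's actual pipeline.

```python
def temporal_first_prune(facts, lo, hi):
    """Step 1: drop every fact whose timestamp falls outside [lo, hi],
    regardless of how semantically relevant it is."""
    return [f for f in facts if lo <= f["t"] <= hi]

def answer(facts, lo, hi, score):
    """Step 2: rank only the temporally valid survivors."""
    valid = temporal_first_prune(facts, lo, hi)
    return max(valid, key=score) if valid else None
```

Note the ordering: filtering before ranking is what guarantees chronological validity, whereas ranking first and filtering later could let a high-scoring but out-of-window fact dominate intermediate steps.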

Key Results

+24.0%

Improvement over strong baselines on MultiTQ.

4B ≈ GPT-4

Small models (e.g., Qwen3-4B) with MemoTime achieve GPT-4-Turbo level reasoning.

SOTA

Achieved state-of-the-art results on all tested TKGQA datasets.

Comprehensive experiments on MultiTQ and TimeQuestions confirm that MemoTime compensates for the static, parametric knowledge of LLMs, enabling robust temporal reasoning even with smaller backbones.

Quick Start

1. Installation

git clone https://github.com/SteveTANTAN/MemoTime
cd MemoTime
bash setup_dependencies.sh

2. Run Experiments (e.g., MultiTQ Dataset)

python run.py --dataset MultiTQ --questions 50 --hybrid --unified-knowledge

BibTeX

@article{tan2025memotime,
  title={{MemoTime}: Memory-augmented temporal knowledge graph enhanced large language model reasoning},
  author={Tan, Xingyu and Wang, Xiaoyang and Liu, Qing and Xu, Xiwei and Yuan, Xin and Zhu, Liming and Zhang, Wenjie},
  journal={arXiv preprint arXiv:2510.13614},
  year={2025}
}