Trace Only What You Need: Structure-Aware On-Demand Hypergraph Memory for Long-Document Question Answering

Abstract

How can we reason over long documents efficiently — tracing only the structure that matters, exactly when it matters?

Long-document question answering (QA) requires large language models (LLMs) to reason over evidence scattered across lengthy documents, where answers often depend on event order, section-level context, and cross-part evidence connections. Although retrieval-augmented generation (RAG) reduces the input context by retrieving relevant evidence, existing structured RAG methods still face three limitations: costly query-agnostic knowledge organization, insufficient use of original document structure, and no reuse of historical reasoning experience.

To address these limitations, we propose DocTrace, a multi-agent RAG framework for long-document QA that supports query-triggered knowledge organization and document-structure-aware, experience-guided reasoning. DocTrace preserves document hierarchy with a lightweight document structural tree index, constructs an agent-shared, hypergraph-structured working memory on demand during reasoning, and stores successful reasoning plans in a graph-structured experience memory for future reuse, enabling adaptive exploration across related long-document questions.

Figure 1: Overview of the DocTrace framework with on-demand hypergraph memory and experience-guided reasoning.

Addressed Limitations & Solutions

📄 Query-Agnostic Overhead

Limitation: Existing methods index the entire document before any query, introducing heavy upfront token and latency costs regardless of the question.

DocTrace Solution: On-demand hypergraph construction — knowledge is organized only when and where reasoning needs it, triggered by the query itself.
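As a rough illustration of what "on demand" can mean here, the sketch below builds hyperedges for a section only the first time reasoning asks for that section. It is a toy under stated assumptions, not DocTrace's implementation: `HypergraphMemory`, `ensure_section`, `evidence_for`, and `toy_extract` are hypothetical names, and the capitalized-word extractor merely stands in for whatever extraction the agents actually perform.

```python
from dataclasses import dataclass, field


@dataclass
class HypergraphMemory:
    """Hypothetical working memory: one hyperedge links several entities to one evidence passage."""
    hyperedges: list = field(default_factory=list)     # (frozenset of entities, evidence text)
    built_sections: set = field(default_factory=set)   # sections that were already organized

    def ensure_section(self, section_id, text, extract):
        """Organize a section only on first request; return True if work was done."""
        if section_id in self.built_sections:
            return False
        for entities, evidence in extract(text):
            self.hyperedges.append((frozenset(entities), evidence))
        self.built_sections.add(section_id)
        return True

    def evidence_for(self, entity):
        """Return every evidence passage whose hyperedge contains the entity."""
        return [ev for ents, ev in self.hyperedges if entity in ents]


def toy_extract(text):
    """Assumed stand-in extractor: one hyperedge per sentence, entities = capitalized words."""
    for sentence in filter(None, (s.strip() for s in text.split("."))):
        entities = {w.strip(",;") for w in sentence.split() if w[0].isupper()}
        if len(entities) >= 2:
            yield entities, sentence


sections = {"s1": "Alice met Bob in Paris. it rained", "s2": "Carol called Dave."}
memory = HypergraphMemory()
first_call = memory.ensure_section("s1", sections["s1"], toy_extract)   # does the work
second_call = memory.ensure_section("s1", sections["s1"], toy_extract)  # skipped, already built
alice_evidence = memory.evidence_for("Alice")
carol_evidence = memory.evidence_for("Carol")  # section s2 was never requested
```

Section `s2` is never touched, so no extraction cost is paid for it; that is the property the on-demand design is meant to provide.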

🏗️ Lost Document Structure

Limitation: Flat chunking discards the structural position of passages — event order, section context, and narrative flow are lost.

DocTrace Solution: A lightweight document structural tree index preserves document hierarchy, enabling section-level and cross-part evidence connections during reasoning.
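One way to picture a structural tree index is a section tree flattened into a lookup that records each section's ancestor path and its reading-order position. The sketch below is only an assumption about what such an index might hold: `SectionNode` and `index_paths` are hypothetical names, and section titles are assumed to be unique.

```python
from dataclasses import dataclass, field


@dataclass
class SectionNode:
    """Hypothetical tree node for one document section."""
    title: str
    text: str = ""
    children: list = field(default_factory=list)


def index_paths(root):
    """Map each section title to (ancestor path, reading-order position)."""
    index = {}
    stack = [(root, ())]
    while stack:
        node, ancestors = stack.pop()
        path = ancestors + (node.title,)
        index[node.title] = (path, len(index))
        # Push children in reverse so they are popped in document order.
        for child in reversed(node.children):
            stack.append((child, path))
    return index


doc = SectionNode("Report", children=[
    SectionNode("1 Background", children=[SectionNode("1.1 Timeline")]),
    SectionNode("2 Findings"),
])
idx = index_paths(doc)
timeline_path, timeline_pos = idx["1.1 Timeline"]
findings_pos = idx["2 Findings"][1]
```

With the path and position available, a retrieved passage can be placed in its section context and ordered relative to other evidence, which flat chunking cannot do.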

🧠 No Experience Reuse

Limitation: Each query is processed from scratch — successful reasoning plans are never stored or reused for similar future questions.

DocTrace Solution: A graph-structured experience memory stores successful reasoning plans, enabling adaptive exploration across related long-document questions.
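The idea of reusing plans can be sketched as a small graph in which question nodes point to plan nodes, with a new question borrowing the plan of its most similar past question. Everything here is illustrative: `ExperienceMemory`, `store`, and `suggest` are hypothetical names, and the token-overlap (Jaccard) similarity with a 0.3 threshold is a simplification chosen for the example, not the matching DocTrace uses.

```python
from dataclasses import dataclass, field


def _tokens(text):
    """Lowercased word set; a deliberately crude tokenizer for the example."""
    return set(text.lower().replace("?", " ").split())


@dataclass
class ExperienceMemory:
    """Hypothetical experience graph: question nodes linked to plan nodes."""
    plans: dict = field(default_factory=dict)   # plan_id -> list of steps
    edges: dict = field(default_factory=dict)   # question -> plan_id

    def store(self, question, steps):
        """Record a successful plan; identical plans share a single plan node."""
        plan_id = " > ".join(steps)
        self.plans[plan_id] = list(steps)
        self.edges[question] = plan_id

    def suggest(self, question, min_overlap=0.3):
        """Return the plan of the most similar past question, or None if none is close enough."""
        query = _tokens(question)
        best, best_score = None, min_overlap
        for past, plan_id in self.edges.items():
            seen = _tokens(past)
            union = query | seen
            score = len(query & seen) / len(union) if union else 0.0
            if score >= best_score:
                best, best_score = self.plans[plan_id], score
        return best


experience = ExperienceMemory()
experience.store(
    "Who founded the company before the merger?",
    ["locate section", "order events", "answer"],
)
related_plan = experience.suggest("Who led the company before the merger?")
unrelated_plan = experience.suggest("What is the weather?")
```

A related question retrieves the stored plan as a starting point, while an unrelated one returns nothing and would be handled from scratch.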

Key Results

+8.85%

Improvement in F1 over the strongest baseline (ComoRAG) on long-document QA benchmarks.

+4.40%

Improvement in Exact Match (EM) over the strongest baseline, indicating more precise answers.

−53.32%

Reduction in overall computational cost through on-demand structure construction.

Evaluated on four long-document QA datasets. DocTrace achieves the best performance on three of the four datasets.

BibTeX

@article{zai2026doctrace,
  title={Trace Only What You Need: Structure-Aware On-Demand Hypergraph Memory for Long-Document Question Answering},
  author={Zai, Xiangjun and Tan, Xingyu and Chen, Chen and Wang, Xiaoyang and Zhang, Wenjie},
  year={2026}
}