Abstract
How can we reason over long documents efficiently — tracing only the structure that matters, exactly when it matters?
Long-document question answering (QA) requires large language models (LLMs) to reason over evidence scattered across lengthy documents, where answers often depend on event order, section-level context, and cross-part evidence connections. Although retrieval-augmented generation (RAG) reduces the input context by retrieving relevant evidence, existing structured RAG methods still face three limitations: costly query-agnostic knowledge organization, insufficient use of original document structure, and no reuse of historical reasoning experience.
To address these limitations, we propose DocTrace, a multi-agent RAG framework for long-document QA that supports query-triggered knowledge organization and document-structure-aware, experience-guided reasoning. DocTrace preserves document hierarchy with a lightweight document structural tree index, constructs agent-shared hypergraph-structured working memory on demand during reasoning, and stores successful reasoning plans in graph-structured experience memory for future reuse, enabling adaptive exploration across related long-document questions.
Figure 1: Overview of the DocTrace framework with on-demand hypergraph memory and experience-guided reasoning.
Addressed Limitations & Solutions
📄 Query-Agnostic Overhead
Limitation: Existing methods index the entire document before any query, introducing heavy upfront token and latency costs regardless of the question.
DocTrace Solution: On-demand hypergraph construction — knowledge is organized only when and where reasoning needs it, triggered by the query itself.
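To make the on-demand idea concrete, here is a minimal single-process sketch, not the DocTrace implementation. It assumes a hyperedge is simply the set of entities that co-occur in one passage, and it uses a toy capitalized-word extractor in place of whatever extraction the framework actually performs. The names `HypergraphMemory`, `organize`, `neighbors`, and `extract_entities` are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, Set


def extract_entities(text: str) -> FrozenSet[str]:
    """Toy stand-in for an entity extractor (assumption): capitalized words only."""
    return frozenset(re.findall(r"\b[A-Z][a-z]+\b", text))


@dataclass
class HypergraphMemory:
    # passage id -> hyperedge, i.e. the set of entities that passage connects
    hyperedges: Dict[int, FrozenSet[str]] = field(default_factory=dict)
    # entity -> ids of the hyperedges that contain it
    incidence: Dict[str, Set[int]] = field(default_factory=dict)

    def organize(self, passages: Dict[int, str], focus: Iterable[str]) -> int:
        """Add hyperedges only for unseen passages that mention a focus entity."""
        focus_set = set(focus)
        added = 0
        for pid, text in passages.items():
            if pid in self.hyperedges:
                continue  # already organized by an earlier reasoning step
            if not any(term in text for term in focus_set):
                continue  # cheap filter: skip passages the query never touches
            entities = extract_entities(text)
            self.hyperedges[pid] = entities
            for entity in entities:
                self.incidence.setdefault(entity, set()).add(pid)
            added += 1
        return added

    def neighbors(self, entity: str) -> Set[str]:
        """Entities that share at least one hyperedge with `entity`."""
        linked: Set[str] = set()
        for pid in self.incidence.get(entity, set()):
            linked |= self.hyperedges[pid]
        linked.discard(entity)
        return linked


passages = {
    0: "Alice met Bob in Paris.",
    1: "Carol moved to Berlin.",
    2: "Bob later visited Berlin.",
}
working_memory = HypergraphMemory()
first_pass = working_memory.organize(passages, {"Bob"})   # query mentions Bob
frontier = working_memory.neighbors("Bob")                # where reasoning goes next
second_pass = working_memory.organize(passages, frontier) # expand only as needed
```

In this toy run the first pass organizes only the two passages that mention the query entity; the third passage is added later, and only because reasoning reached an entity it contains.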
🏗️ Lost Document Structure
Limitation: Flat chunking discards the structural position of passages — event order, section context, and narrative flow are lost.
DocTrace Solution: A lightweight document structural tree index preserves document hierarchy, enabling section-level and cross-part evidence connections during reasoning.
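As an illustration of what a structural tree index can preserve, the sketch below models a document as nested sections. It is a simplified assumption, not the index format DocTrace uses; `SectionNode` and its methods are hypothetical names. A node's path gives its section-level context, and a pre-order walk recovers the original reading order.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)  # identity comparison avoids recursing through parent links
class SectionNode:
    title: str
    text: str = ""
    children: List["SectionNode"] = field(default_factory=list)
    parent: Optional["SectionNode"] = field(default=None, repr=False)

    def add(self, child: "SectionNode") -> "SectionNode":
        """Attach `child` as the next subsection and return it."""
        child.parent = self
        self.children.append(child)
        return child

    def path(self) -> List[str]:
        """Titles from the root down to this node (section-level context)."""
        node: Optional["SectionNode"] = self
        titles: List[str] = []
        while node is not None:
            titles.append(node.title)
            node = node.parent
        return titles[::-1]

    def walk(self) -> Iterator["SectionNode"]:
        """Pre-order traversal, which matches the document's reading order."""
        yield self
        for child in self.children:
            yield from child.walk()


root = SectionNode("Report")
background = root.add(SectionNode("1 Background"))
timeline = background.add(SectionNode("1.1 Timeline", "Event A precedes event B."))
findings = root.add(SectionNode("2 Findings", "Event B explains the outcome."))

context = timeline.path()
reading_order = [node.title for node in root.walk()]
```

A flat chunk store would keep the two text snippets but lose both the path and the ordering that this structure retains.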
🧠 No Experience Reuse
Limitation: Each query is processed from scratch — successful reasoning plans are never stored or reused for similar future questions.
DocTrace Solution: A graph-structured experience memory stores successful reasoning plans, enabling adaptive exploration across related long-document questions.
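One way to picture experience reuse is a small graph whose nodes are previously answered questions with their plans and whose edges link similar questions. The sketch below assumes token-level Jaccard similarity and a fixed threshold, both chosen here purely for illustration; the source does not specify how DocTrace measures relatedness. `ExperienceMemory`, `store`, and `suggest` are hypothetical names.

```python
import re
from typing import Dict, List, Optional, Set


def tokens(question: str) -> Set[str]:
    return set(re.findall(r"\w+", question.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity (an assumption made for this sketch)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


class ExperienceMemory:
    def __init__(self, threshold: float = 0.3) -> None:
        self.threshold = threshold
        self.plans: Dict[str, List[str]] = {}  # question -> successful plan
        self.edges: Dict[str, Set[str]] = {}   # question -> similar questions

    def store(self, question: str, plan: List[str]) -> None:
        """Record a successful plan and link it to similar stored questions."""
        self.edges.setdefault(question, set())
        for other in self.plans:
            if other != question and jaccard(question, other) >= self.threshold:
                self.edges[question].add(other)
                self.edges[other].add(question)
        self.plans[question] = list(plan)

    def suggest(self, question: str) -> Optional[List[str]]:
        """Return the plan of the most similar stored question, if similar enough."""
        best = max(self.plans, key=lambda q: jaccard(question, q), default=None)
        if best is None or jaccard(question, best) < self.threshold:
            return None
        return self.plans[best]


experience = ExperienceMemory()
q1 = "When did Alice move to Paris?"
q2 = "Who founded the company?"
experience.store(q1, ["open timeline section", "find the move event"])
experience.store(q2, ["open history section", "find the founder"])

q3 = "When did Bob move to Berlin?"
reused = experience.suggest(q3)                        # borrows the plan stored for q1
unseen = experience.suggest("What color is the sky?")  # nothing similar enough
experience.store(q3, reused)                           # successful reuse is stored too
```

When no stored question is similar enough, `suggest` returns `None` and the query would be handled from scratch, as in a system without experience memory.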
Key Results
Improvement in F1 over the strongest baseline (ComoRAG) on long-document QA benchmarks.
Improvement in Exact Match (EM) over the strongest baseline, demonstrating more precise answer extraction.
Reduction in overall computational cost through on-demand structure construction.
Evaluated on four long-document QA datasets, DocTrace achieves the best performance on three of them.
BibTeX
@article{zai2026doctrace,
title={Trace Only What You Need: Structure-Aware On-Demand Hypergraph Memory for Long-Document Question Answering},
author={Zai, Xiangjun and Tan, Xingyu and Chen, Chen and Wang, Xiaoyang and Zhang, Wenjie},
year={2026}
}