
Hydra: Structured Cross-Source Enhanced Large Language Model Reasoning

Xingyu Tan 1,2 · Xiaoyang Wang 1 · Qing Liu 2 · Xiwei Xu 2 · Xin Yuan 2 · Liming Zhu 2 · Wenjie Zhang 1

1University of New South Wales · 2Data61, CSIRO

EMNLP 2025 (Main Conference)

Abstract

Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating external knowledge. Current hybrid RAG systems retrieve evidence from both knowledge graphs (KGs) and text documents to support LLM reasoning, but they still struggle with multi-hop reasoning, multi-entity questions, multi-source verification, and effective use of graph structure. Hydra is a training-free framework that unifies graph topology, document semantics, and source reliability to support deep, faithful reasoning. An agent explores both structured and unstructured sources, increasing the diversity and precision of the gathered evidence, and applies tri-factor verification (trustworthiness, corroboration, entity-path alignment) to reduce noise and improve faithfulness.

Highlights: KG + text · agent exploration · tri-factor verification · training-free
Figure 1: Four LLM reasoning paradigms and Hydra's cross-source workflow.

Method

Overview. Hydra constructs an evidence graph while answering a question. It queries both KG and text, expands along promising relations, applies reliability checks, and keeps verified paths for the final answer.

1) Initialisation

Decompose the question; estimate hop depth; select sources (KG / Wikipedia / web); detect topic entities and form a small query subgraph.
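
As a rough illustration of this stage, the sketch below assumes a generic llm callable that returns plain text; the prompts and the QueryPlan container are illustrative, not the repository's actual interfaces.

from dataclasses import dataclass

@dataclass
class QueryPlan:
    sub_questions: list[str]   # decomposed sub-questions
    hop_depth: int             # estimated number of reasoning hops
    sources: list[str]         # e.g. ["kg", "wikipedia", "web"]
    topic_entities: list[str]  # seeds for the small query subgraph

def initialise(question: str, llm) -> QueryPlan:
    """Decompose the question, estimate hops, pick sources, detect topic entities."""
    subs = llm(f"Decompose into sub-questions, one per line:\n{question}").splitlines()
    hops_text = llm(f"How many reasoning hops does this need? Reply with one integer:\n{question}").strip()
    sources = llm(f"Which sources help (kg, wikipedia, web)? Comma-separated:\n{question}").split(",")
    entities = llm(f"List the topic entities, comma-separated:\n{question}").split(",")
    return QueryPlan(
        sub_questions=[s.strip() for s in subs if s.strip()],
        hop_depth=int(hops_text) if hops_text.isdigit() else 2,
        sources=[s.strip().lower() for s in sources if s.strip()],
        topic_entities=[e.strip() for e in entities if e.strip()],
    )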

2) Evidence exploration

Initial: agentic selector-chosen sources for high-precision seeds; Refined: add LLM priors + live web to patch gaps; Predicted: test LLM-proposed candidates along KG/Wiki paths.
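
The following sketch mirrors the three rounds described above, assuming hypothetical retrieve_kg, retrieve_text, and web_search helpers that each return a list of evidence paths; it outlines the control flow only, not the repository's code.

def explore(plan, llm, retrieve_kg, retrieve_text, web_search):
    evidence = []

    # Initial round: query only the selector-chosen sources for high-precision seeds.
    for entity in plan.topic_entities:
        if "kg" in plan.sources:
            evidence += retrieve_kg(entity, depth=plan.hop_depth)
        if "wikipedia" in plan.sources:
            evidence += retrieve_text(entity)

    # Refined round: ask the LLM which sub-questions are still open and patch
    # the gaps with prior knowledge plus live web search.
    gaps = llm("Which of these sub-questions remain unanswered?\n" + "\n".join(plan.sub_questions))
    if gaps.strip() and "web" in plan.sources:
        evidence += web_search(gaps)

    # Predicted round: let the LLM propose candidate answers, then test each
    # candidate by searching for a supporting KG/Wiki path.
    candidates = llm("Propose candidate answers, one per line.").splitlines()
    for cand in candidates:
        if cand.strip():
            evidence += retrieve_kg(cand.strip(), depth=1)

    return evidence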

3) Evidence pruning

Score relevance and agreement, merge near-duplicates, and keep a compact evidence graph with top-ranked paths.
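
A minimal sketch of the pruning step, combining the tri-factor signals named in the abstract (trustworthiness, corroboration, entity-path alignment); the weights, per-source trust values, and dict-based path format (text, source, entities) are illustrative assumptions.

def prune(evidence, topic_entities, top_k=10, trust=None):
    # Assumed per-source trust scores; the real system may estimate these differently.
    trust = trust or {"kg": 1.0, "wikipedia": 0.8, "web": 0.5}

    def score(path):
        trustworthiness = trust.get(path["source"], 0.3)
        # Corroboration: how many other paths assert the same statement.
        corroboration = sum(p["text"] == path["text"] for p in evidence) - 1
        # Alignment: fraction of topic entities the path actually touches.
        alignment = len(set(path["entities"]) & set(topic_entities)) / max(len(topic_entities), 1)
        return trustworthiness + 0.5 * corroboration + alignment

    ranked, seen = [], set()
    for path in sorted(evidence, key=score, reverse=True):
        if path["text"] not in seen:  # merge exact duplicates
            seen.add(path["text"])
            ranked.append(path)
    return ranked[:top_k]  # compact evidence graph: top-ranked, deduplicated paths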

4) Question answering

Use the selected paths as cited evidence for answer generation, prompting the model to reason slowly and deliberately over them before committing to a final answer.
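
As a sketch, the answering step can assemble the kept paths as numbered citations and ask the model to reason over them step by step; the prompt wording below is illustrative, not the prompt used in the paper.

def answer(question, pruned_paths, llm):
    cited = "\n".join(f"[{i + 1}] {p['text']}" for i, p in enumerate(pruned_paths))
    prompt = (
        "Answer the question using only the cited evidence paths.\n"
        "Think step by step, then give the final answer with citation indices.\n\n"
        f"Evidence:\n{cited}\n\nQuestion: {question}\nAnswer:"
    )
    return llm(prompt)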

Figure 2: System pipeline with the four stages used in our implementation.
Figure 3: Overview of the initialisation stage.

Benchmarks and summary

Hydra reports strong results across seven benchmarks, including CWQ, AdvHotpotQA, QALD10-en, SimpleQA, WebQSP, WebQuestions, and Zero-shot RE. With GPT-3.5 it improves over a strong hybrid baseline and allows smaller models such as Llama-3.1-8B to approach GPT-4-Turbo on these tasks.

Install and run

Hydra relies on Freebase and Wikidata services for the KG side. Please follow the setup guides in the repository and put your API keys where indicated.

# 1) Clone and create environment
git clone https://github.com/SteveTANTAN/HydraRAG.git
cd HydraRAG
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# 2) Prepare KG services
# See Freebase/README.md and Wikidata/README.md in the repo

# 3) Run (example)
python hydra_main.py webqsp --depth 3 --allr --model llama70b

# Modules stay active unless explicitly disabled with --no-*
# e.g., ablation without web evidence:
python hydra_main.py webqsp --no-web --depth 3

Resources

Paper: https://arxiv.org/abs/2505.17464 · Code: https://github.com/SteveTANTAN/HydraRAG

BibTeX

@misc{tan2025hydra,
    title={Hydra: Structured Cross-Source Enhanced Large Language Model Reasoning}, 
    author={Xingyu Tan and Xiaoyang Wang and Qing Liu and Xiwei Xu and Xin Yuan and Liming Zhu and Wenjie Zhang},
    year={2025},
    eprint={2505.17464},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2505.17464}, 
}