Abstract
Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating external knowledge. Current hybrid RAG systems retrieve evidence from both knowledge graphs (KGs) and text documents to support LLM reasoning, but they struggle with multi-hop reasoning, multi-entity questions, multi-source verification, and effective graph utilisation. Hydra is a training-free framework that unifies graph topology, document semantics, and source reliability to support deep, faithful reasoning. An agent explores both structured and unstructured sources, increasing the diversity and precision of evidence, and applies tri-factor verification (trustworthiness, corroboration, and entity-path alignment) to reduce noise and improve faithfulness.

Method
Overview. Hydra constructs an evidence graph while answering a question. It queries both KG and text, expands along promising relations, applies reliability checks, and keeps verified paths for the final answer.
1) Initialisation
Decompose the question; estimate hop depth; select sources (KG / Wikipedia / web); detect topic entities and form a small query subgraph.
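As a rough sketch of this step (every function name and heuristic below is an illustrative assumption, not Hydra's actual implementation), initialisation might look like:

```python
# Illustrative sketch of initialisation: decompose, estimate hops,
# select sources, and detect topic entities. All heuristics here are
# toy stand-ins for Hydra's agentic/LLM-based versions.

def estimate_hop_depth(question: str) -> int:
    """Crude proxy: nested clauses and connectives suggest more hops."""
    markers = ["of the", "who", "that", "which", "and"]
    hits = sum(question.lower().count(m) for m in markers)
    return min(1 + hits // 2, 4)  # cap the exploration depth

def initialise(question: str) -> dict:
    tokens = question.rstrip("?").split()
    # 1. Decompose into sub-questions (naive split on connectives).
    subqs = [q.strip() for q in question.rstrip("?").split(" and ")]
    # 2. Estimate how many hops the answer is likely to need.
    depth = estimate_hop_depth(question)
    # 3. Select sources; a real selector would consult the LLM agent.
    sources = ["kg", "wikipedia"] + (["web"] if depth >= 3 else [])
    # 4. Detect topic entities (capitalised spans as a stand-in for NER).
    entities = [w for w in tokens[1:] if w[:1].isupper()]
    return {"subqs": subqs, "depth": depth,
            "sources": sources, "entities": entities}
```

The detected entities would then seed the small query subgraph over the KG.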
2) Evidence exploration
Exploration proceeds in three rounds. Initial: the agentic selector queries its chosen sources for high-precision seed evidence. Refined: LLM priors and live web search are added to patch coverage gaps. Predicted: LLM-proposed candidate answers are tested along KG/Wikipedia paths.
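The three rounds can be sketched as follows; the retrieval call is stubbed with a toy corpus, since real Hydra queries live KG, Wikipedia, and web services:

```python
# Sketch of the three exploration rounds. TOY_CORPUS and the word-overlap
# retriever are stand-ins, not Hydra's actual retrieval stack.

TOY_CORPUS = {
    "kg": ["Tom Hanks -- starred_in -> Forrest Gump"],
    "wikipedia": ["Forrest Gump was directed by Robert Zemeckis."],
    "web": ["Forrest Gump won Best Picture in 1995."],
}

def retrieve(source: str, query: str) -> list[str]:
    """Stub retriever: return corpus lines sharing a word with the query."""
    qwords = set(query.lower().split())
    return [s for s in TOY_CORPUS.get(source, [])
            if qwords & set(s.lower().split())]

def explore(question: str, sources: list[str],
            llm_candidates: list[str]) -> list[str]:
    evidence = []
    # Round 1 (initial): high-precision seeds from selector-chosen sources.
    for src in sources:
        evidence += retrieve(src, question)
    # Round 2 (refined): patch gaps with live web evidence.
    evidence += retrieve("web", question)
    # Round 3 (predicted): test LLM-proposed candidates along KG/Wiki paths.
    for cand in llm_candidates:
        evidence += retrieve("kg", cand) + retrieve("wikipedia", cand)
    return evidence
```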
3) Evidence pruning
Score relevance and agreement, merge near-duplicates, and keep a compact evidence graph with top-ranked paths.
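A minimal sketch of pruning with the tri-factor verification from the abstract; the weights and the per-factor scoring are illustrative assumptions, not Hydra's tuned values:

```python
# Sketch of evidence pruning: deduplicate, score each path on three
# factors (source trustworthiness, corroboration by other evidence,
# entity-path alignment), and keep the top-ranked paths.

def prune(paths: list[dict], top_k: int = 3) -> list[dict]:
    # Merge near-duplicates (here: exact match after normalisation).
    seen, unique = set(), []
    for p in paths:
        key = p["text"].lower().strip()
        if key not in seen:
            seen.add(key)
            unique.append(p)
    # Tri-factor score; the 0.4/0.3/0.3 weights are assumptions.
    trust = {"kg": 1.0, "wikipedia": 0.8, "web": 0.5}
    for p in unique:
        corroboration = sum(
            1 for q in unique if q is not p and q["claim"] == p["claim"]
        ) / max(len(unique) - 1, 1)
        p["score"] = (0.4 * trust[p["source"]]
                      + 0.3 * corroboration
                      + 0.3 * p["entity_path_alignment"])
    # Keep a compact evidence graph: top-ranked paths only.
    return sorted(unique, key=lambda p: p["score"], reverse=True)[:top_k]
```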
4) Question answering
Use the selected, verified paths as citations during answer generation, prompting the model for slow, deliberate, deep reasoning over the cited evidence.
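Putting this final step together, the verified paths might be formatted as numbered citations in the prompt; the template below is an assumption, not Hydra's actual prompt:

```python
# Sketch of the answering step: format verified evidence paths as
# numbered citations and instruct the model to reason step by step.

def build_answer_prompt(question: str, paths: list[str]) -> str:
    citations = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(paths))
    return (
        "Answer the question using ONLY the cited evidence.\n"
        "Think step by step, then cite sources like [1].\n\n"
        f"Evidence:\n{citations}\n\nQuestion: {question}\nAnswer:"
    )
```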


Benchmarks and summary
Hydra reports strong results across seven benchmarks, including CWQ, AdvHotpotQA, QALD10-en, SimpleQA, WebQSP, WebQuestions, and Zero-shot RE. With GPT-3.5 it improves over a strong hybrid baseline and allows smaller models such as Llama-3.1-8B to approach GPT-4-Turbo on these tasks.
Install and run
Hydra relies on Freebase and Wikidata services for the KG side. Please follow the setup guides in the repository and put your API keys where indicated.
# 1) Clone and create environment
git clone https://github.com/SteveTANTAN/HydraRAG.git
cd HydraRAG
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# 2) Prepare KG services
# See Freebase/README.md and Wikidata/README.md in the repo
# 3) Run (example)
python hydra_main.py webqsp --depth 3 --allr --model llama70b
# Modules stay active unless explicitly disabled with --no-*
# e.g., ablation without web evidence:
python hydra_main.py webqsp --no-web --depth 3
Resources
BibTeX
@misc{tan2025hydra,
  title={Hydra: Structured Cross-Source Enhanced Large Language Model Reasoning},
  author={Xingyu Tan and Xiaoyang Wang and Qing Liu and Xiwei Xu and Xin Yuan and Liming Zhu and Wenjie Zhang},
  year={2025},
  eprint={2505.17464},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.17464},
}