
Hydra: Structured Cross-Source Enhanced Large Language Model Reasoning

Xingyu Tan 1,2 · Xiaoyang Wang 1 · Qing Liu 2 · Xiwei Xu 2 · Xin Yuan 2 · Liming Zhu 2 · Wenjie Zhang 1

1University of New South Wales · 2Data61, CSIRO

EMNLP 2025 (Main Conference)

Abstract

Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating external knowledge. Current hybrid RAG systems retrieve evidence from both knowledge graphs (KGs) and text documents to support LLM reasoning, but they still struggle with multi-hop reasoning, multi-entity questions, multi-source verification, and effective use of graph structure. Hydra is a training-free framework that unifies graph topology, document semantics, and source reliability to support deep, faithful reasoning. An agent explores both structured and unstructured sources, increasing the diversity and precision of the gathered evidence, and applies tri-factor verification (trustworthiness, corroboration, entity-path alignment) to reduce noise and improve faithfulness.

Highlights: KG + text · agent exploration · tri-factor verification · training-free
Figure 1: Four LLM reasoning paradigms and Hydra's cross-source workflow.

Method

Overview. Hydra constructs an evidence graph while answering a question. It queries both KG and text, expands along promising relations, applies reliability checks, and keeps verified paths for the final answer.

1) Initialisation

Decompose the question; estimate hop depth; select sources (KG / Wikipedia / web); detect topic entities and form a small query subgraph.
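
As a rough illustration of this stage, the sketch below assumes a generic llm callable that returns plain text; the prompts and the QueryPlan container are illustrative, not the repository's actual interfaces.

from dataclasses import dataclass

@dataclass
class QueryPlan:
    sub_questions: list[str]   # decomposed sub-questions
    hop_depth: int             # estimated number of reasoning hops
    sources: list[str]         # e.g. ["kg", "wikipedia", "web"]
    topic_entities: list[str]  # seeds for the small query subgraph

def initialise(question: str, llm) -> QueryPlan:
    """Decompose the question, estimate hops, pick sources, detect topic entities."""
    subs = llm(f"Decompose into sub-questions, one per line:\n{question}").splitlines()
    hops_text = llm(f"How many reasoning hops does this need? Reply with one integer:\n{question}").strip()
    sources = llm(f"Which sources help (kg, wikipedia, web)? Comma-separated:\n{question}").split(",")
    entities = llm(f"List the topic entities, comma-separated:\n{question}").split(",")
    return QueryPlan(
        sub_questions=[s.strip() for s in subs if s.strip()],
        hop_depth=int(hops_text) if hops_text.isdigit() else 2,
        sources=[s.strip().lower() for s in sources if s.strip()],
        topic_entities=[e.strip() for e in entities if e.strip()],
    )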

2) Evidence exploration

Initial: agentic selector-chosen sources for high-precision seeds; Refined: add LLM priors + live web to patch gaps; Predicted: test LLM-proposed candidates along KG/Wiki paths.
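
The following sketch mirrors the three rounds described above, assuming hypothetical retrieve_kg, retrieve_text, and web_search helpers that each return a list of evidence paths; it outlines the control flow only, not the repository's code.

def explore(plan, llm, retrieve_kg, retrieve_text, web_search):
    evidence = []

    # Initial round: query only the selector-chosen sources for high-precision seeds.
    for entity in plan.topic_entities:
        if "kg" in plan.sources:
            evidence += retrieve_kg(entity, depth=plan.hop_depth)
        if "wikipedia" in plan.sources:
            evidence += retrieve_text(entity)

    # Refined round: ask the LLM which sub-questions are still open and patch
    # the gaps with prior knowledge plus live web search.
    gaps = llm("Which of these sub-questions remain unanswered?\n" + "\n".join(plan.sub_questions))
    if gaps.strip() and "web" in plan.sources:
        evidence += web_search(gaps)

    # Predicted round: let the LLM propose candidate answers, then test each
    # candidate by searching for a supporting KG/Wiki path.
    candidates = llm("Propose candidate answers, one per line.").splitlines()
    for cand in candidates:
        if cand.strip():
            evidence += retrieve_kg(cand.strip(), depth=1)

    return evidence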

3) Evidence pruning

Score relevance and agreement, merge near-duplicates, and keep a compact evidence graph with top-ranked paths.
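
A minimal sketch of the pruning step, combining the tri-factor signals named in the abstract (trustworthiness, corroboration, entity-path alignment); the weights, per-source trust values, and dict-based path format (text, source, entities) are illustrative assumptions.

def prune(evidence, topic_entities, top_k=10, trust=None):
    # Assumed per-source trust scores; the real system may estimate these differently.
    trust = trust or {"kg": 1.0, "wikipedia": 0.8, "web": 0.5}

    def score(path):
        trustworthiness = trust.get(path["source"], 0.3)
        # Corroboration: how many other paths assert the same statement.
        corroboration = sum(p["text"] == path["text"] for p in evidence) - 1
        # Alignment: fraction of topic entities the path actually touches.
        alignment = len(set(path["entities"]) & set(topic_entities)) / max(len(topic_entities), 1)
        return trustworthiness + 0.5 * corroboration + alignment

    ranked, seen = [], set()
    for path in sorted(evidence, key=score, reverse=True):
        if path["text"] not in seen:  # merge exact duplicates
            seen.add(path["text"])
            ranked.append(path)
    return ranked[:top_k]  # compact evidence graph: top-ranked, deduplicated paths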

4) Question answering

Use the selected paths as cited evidence for answer generation, prompting the model to reason slowly and deliberately over them before committing to a final answer.
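
As a sketch, the answering step can assemble the kept paths as numbered citations and ask the model to reason over them step by step; the prompt wording below is illustrative, not the prompt used in the paper.

def answer(question, pruned_paths, llm):
    cited = "\n".join(f"[{i + 1}] {p['text']}" for i, p in enumerate(pruned_paths))
    prompt = (
        "Answer the question using only the cited evidence paths.\n"
        "Think step by step, then give the final answer with citation indices.\n\n"
        f"Evidence:\n{cited}\n\nQuestion: {question}\nAnswer:"
    )
    return llm(prompt)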

Figure 2: System pipeline with the four stages used in our implementation.
Figure 3: Overview of the initialisation stage.

Benchmarks and summary

Hydra reports strong results across seven benchmarks, including CWQ, AdvHotpotQA, QALD10-en, SimpleQA, WebQSP, WebQuestions, and Zero-shot RE. With GPT-3.5 it improves over a strong hybrid baseline and allows smaller models such as Llama-3.1-8B to approach GPT-4-Turbo on these tasks.

Install and run

Hydra relies on Freebase and Wikidata services for the KG side. Please follow the setup guides in the repository and put your API keys where indicated.

# 1) Clone and create environment
git clone https://github.com/SteveTANTAN/HydraRAG.git
cd HydraRAG
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# 2) Prepare KG services
# See Freebase/README.md and Wikidata/README.md in the repo

# 3) Run (example)
python hydra_main.py webqsp --depth 3 --allr --model llama70b

# Modules stay active unless explicitly disabled with --no-*
# e.g., ablation without web evidence:
python hydra_main.py webqsp --no-web --depth 3

Resources

Paper: https://arxiv.org/abs/2505.17464 · Code: https://github.com/SteveTANTAN/HydraRAG

BibTeX

@misc{tan2025hydra,
    title={Hydra: Structured Cross-Source Enhanced Large Language Model Reasoning}, 
    author={Xingyu Tan and Xiaoyang Wang and Qing Liu and Xiwei Xu and Xin Yuan and Liming Zhu and Wenjie Zhang},
    year={2025},
    eprint={2505.17464},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2505.17464}, 
}