Welcome to Retrieval QA Benchmark’s documentation!

Retrieval QA Benchmark (RQABench for short) is an open-source, end-to-end test workbench for Retrieval Augmented Generation (RAG) systems. We intend to build an open benchmark that lets developers and researchers reproduce existing RAG systems and design new ones.

The overall data flow is described below.

A retrieval_qa_benchmark.evaluators.base.BaseEvaluator has three major modules: the dataset, the TransformGraph, and the LLM.

All data passed between modules is a retrieval_qa_benchmark.schema.QARecord, so the benchmark constrains the data schema rather than the modules themselves. The dataset outputs formatted QARecord objects to the TransformGraph, which can be defined in our YAML configuration. This is where you design your retrieval system: nodes in the TransformGraph can modify the context field of QARecord objects. The LLM then accepts each QARecord and formats it using the template defined in the YAML configuration. Finally, the LLM passes a retrieval_qa_benchmark.schema.datatypes.QAPrediction to the Evaluator.
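The flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the QARecord and QAPrediction classes here are simplified stand-ins defined in the snippet, not the real retrieval_qa_benchmark classes, and the field names other than context, the helper functions (add_context, run_graph, toy_llm, evaluate), and the prompt template are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


# Hypothetical stand-ins: NOT the real retrieval_qa_benchmark schema classes.
@dataclass(frozen=True)
class QARecord:
    question: str
    answer: str
    context: Optional[List[str]] = None


@dataclass(frozen=True)
class QAPrediction:
    question: str
    answer: str
    generated: str


# A transform node takes a QARecord and returns a QARecord.
Node = Callable[[QARecord], QARecord]


def add_context(record: QARecord) -> QARecord:
    """Toy retrieval node: append a fixed passage to the context field."""
    passages = ["Paris is the capital of France."]
    return replace(record, context=(record.context or []) + passages)


def run_graph(nodes: List[Node], record: QARecord) -> QARecord:
    """Apply the nodes in order (a linear chain standing in for a graph)."""
    for node in nodes:
        record = node(record)
    return record


def toy_llm(record: QARecord, template: str) -> QAPrediction:
    """Format the record with the template, then 'answer' with a placeholder rule."""
    prompt = template.format(
        context="\n".join(record.context or []),
        question=record.question,
    )
    # Placeholder for a real model call: answer only if the prompt mentions it.
    generated = "Paris" if "Paris" in prompt else "unknown"
    return QAPrediction(record.question, record.answer, generated)


def evaluate(predictions: List[QAPrediction]) -> float:
    """Exact-match accuracy over all predictions."""
    matched = sum(p.generated == p.answer for p in predictions)
    return matched / len(predictions)


dataset = [QARecord(question="What is the capital of France?", answer="Paris")]
template = "Context:\n{context}\n\nQuestion: {question}\nAnswer:"

predictions = [toy_llm(run_graph([add_context], r), template) for r in dataset]
accuracy = evaluate(predictions)
```

Because every node has the same QARecord-in, QARecord-out shape, you can swap add_context for any retrieval strategy without touching the dataset, the LLM step, or the evaluator.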

Here are some major features of this benchmark:

Flexibility: You are free to design your retrieval system however you like, as long as it accepts QARecord as input and produces QARecord as output.
Reproducibility: We gather all settings in the evaluation process into a single YAML configuration, which helps you track and reproduce experiments.
Traceability: We collect more than accuracy and scores. We also track running times and the tokens used across the whole RAG system.
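To make the traceability idea concrete, here is a minimal sketch of how running time and token usage could be accumulated around a model call. It is not how RQABench implements its profiling: the Profile class, the profiled wrapper, and echo_model are hypothetical names, and counting whitespace-separated words is a crude stand-in for a real tokenizer.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Profile:
    """Accumulated statistics for one wrapped stage (hypothetical helper)."""
    seconds: float = 0.0
    tokens: int = 0
    calls: int = 0


def profiled(fn: Callable[[str], str], profile: Profile) -> Callable[[str], str]:
    """Wrap fn so that each call adds its time and token usage to profile."""
    def wrapper(prompt: str) -> str:
        start = time.perf_counter()
        output = fn(prompt)
        profile.seconds += time.perf_counter() - start
        # Assumption: whitespace splitting approximates a tokenizer.
        profile.tokens += len(prompt.split()) + len(output.split())
        profile.calls += 1
        return output
    return wrapper


def echo_model(prompt: str) -> str:
    """Placeholder for a real LLM call: repeats the last word of the prompt."""
    return "answer: " + prompt.split()[-1]


profile = Profile()
model = profiled(echo_model, profile)
model("what is 2 + 2")
model("name a color")
```

After the two calls, profile holds the total elapsed time, the number of calls, and the combined prompt and output token counts, which can be reported alongside accuracy.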
