Write your own pipeline

Add a dataset

  1. Inherit from retrieval_qa_benchmark.schema.BaseDataset

  2. Implement a build function that parses your data into a list of retrieval_qa_benchmark.schema.QARecord objects.

  3. Register your dataset with retrieval_qa_benchmark.utils.registry.REGISTRY.register_dataset like this:

@retrieval_qa_benchmark.utils.registry.REGISTRY.register_dataset("type name you want")
class YourDataset(retrieval_qa_benchmark.schema.BaseDataset):
    pass
  4. Create a PR on GitHub.
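
The dataset steps above can be sketched end to end as follows. This is a minimal, self-contained illustration that does not import retrieval_qa_benchmark: QARecord, BaseDataset, Registry and REGISTRY below are simplified stand-ins written for this example, and their fields and signatures (including the in-memory RAW_ROWS data and the "toy_qa" type name) are assumptions, not the library's actual definitions. It shows only the shape of the pattern: subclass, implement build, register under a type name.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Type


@dataclass
class QARecord:
    """Stand-in for the library's QARecord; these fields are assumed."""
    id: str
    question: str
    answer: str


class BaseDataset:
    """Stand-in for the library's BaseDataset."""

    def build(self) -> List[QARecord]:
        raise NotImplementedError


class Registry:
    """Stand-in registry that maps a type name to a dataset class."""

    def __init__(self) -> None:
        self.datasets: Dict[str, Type[BaseDataset]] = {}

    def register_dataset(
        self, name: str
    ) -> Callable[[Type[BaseDataset]], Type[BaseDataset]]:
        def decorator(cls: Type[BaseDataset]) -> Type[BaseDataset]:
            # Store the class under the chosen type name and return it unchanged
            self.datasets[name] = cls
            return cls

        return decorator


REGISTRY = Registry()

# Hypothetical raw data; a real dataset would read from disk or a data hub
RAW_ROWS = [
    {"q": "What is 2 + 2?", "a": "4"},
    {"q": "Capital of France?", "a": "Paris"},
]


@REGISTRY.register_dataset("toy_qa")
class ToyDataset(BaseDataset):
    def build(self) -> List[QARecord]:
        # Parse each raw row into one QARecord
        return [
            QARecord(id=str(i), question=row["q"], answer=row["a"])
            for i, row in enumerate(RAW_ROWS)
        ]


# Look the class up by its registered type name and build the records
records = REGISTRY.datasets["toy_qa"]().build()
print(len(records), records[1].answer)
```

Registering by name lets a configuration refer to the dataset as a plain string instead of importing the class directly.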

Add a transform

  1. Inherit from retrieval_qa_benchmark.schema.BaseTransform.

  2. Implement any of
  3. Register your transform with retrieval_qa_benchmark.utils.registry.REGISTRY.register_transform like this:

@retrieval_qa_benchmark.utils.registry.REGISTRY.register_transform("type name you want")
class YourTransform(retrieval_qa_benchmark.schema.BaseTransform):
    pass
  4. Create a PR on GitHub.

Add an LLM

  1. Inherit from retrieval_qa_benchmark.schema.BaseLLM

  2. Implement all of below
  3. Register your language model with retrieval_qa_benchmark.utils.registry.REGISTRY.register_model like this:

@retrieval_qa_benchmark.utils.registry.REGISTRY.register_model("type name you want")
class YourLLM(retrieval_qa_benchmark.schema.BaseLLM):
    pass
  4. Create a PR on GitHub.

Add an evaluator

  1. Inherit from retrieval_qa_benchmark.schema.BaseEvaluator

  2. Change the matcher function of retrieval_qa_benchmark.schema.BaseEvaluator like this

  3. Register your evaluator with retrieval_qa_benchmark.utils.registry.REGISTRY.register_evaluator like this:

@retrieval_qa_benchmark.utils.registry.REGISTRY.register_evaluator("type name you want")
class YourEvaluator(retrieval_qa_benchmark.schema.BaseEvaluator):
    pass
  4. Create a PR on GitHub.
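
As an illustration of what changing the matcher might look like, here is a self-contained sketch that does not import retrieval_qa_benchmark. The QARecord and BaseEvaluator classes are stand-ins written for this example, and the matcher signature (a prediction string and a record in, a float score out), the score helper and the LenientEvaluator name are all assumptions rather than the library's actual API. Registration is omitted to keep the focus on the matcher.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QARecord:
    """Stand-in record; the real QARecord may carry more fields."""
    question: str
    answer: str


class BaseEvaluator:
    """Stand-in base evaluator; the matcher signature is assumed."""

    def matcher(self, prediction: str, record: QARecord) -> float:
        # Default behaviour in this sketch: exact string match
        return float(prediction == record.answer)

    def score(self, predictions: List[str], records: List[QARecord]) -> float:
        # Average the per-record matcher scores
        matches = [self.matcher(p, r) for p, r in zip(predictions, records)]
        return sum(matches) / len(matches)


class LenientEvaluator(BaseEvaluator):
    def matcher(self, prediction: str, record: QARecord) -> float:
        # Case-insensitive check that the gold answer appears in the prediction
        return float(record.answer.strip().lower() in prediction.strip().lower())


records = [QARecord("Capital of France?", "Paris"), QARecord("2 + 2?", "4")]
predictions = ["The capital is paris.", "5"]

strict = BaseEvaluator().score(predictions, records)
lenient = LenientEvaluator().score(predictions, records)
print(strict, lenient)
```

Only the matcher is overridden here, so the aggregation logic stays in the base class and different matching rules can be compared on the same predictions.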