Searchers
- pydantic model retrieval_qa_benchmark.transforms.searchers.ElSearchSearcher
Elastic searcher
- Fields:
dataset_name (Sequence[str]), dataset_split (str), el_auth (Tuple[str, str]), el_host (str), template (str), text_preprocess (Callable)
- field dataset_name: Sequence[str] = ['Cohere/wikipedia-22-12-en-embeddings']
dataset name for plugin dataset
- field dataset_split: str = 'train'
split for that dataset
- field el_auth: Tuple[str, str] [Required]
authentication tuple for Elasticsearch
- field el_host: str [Required]
hostname of the Elasticsearch backend
- field text_preprocess: Callable = <function text_preprocess>
- bm25_filter(query_list: List[str], num_selected: int) → Tuple[List[List[float]], List[List[Entry]]]
BM25 search
- Parameters:
query_list (List[str]) – list of queries
num_selected (int) – number of contexts to return
- Returns:
distances and entries
- Return type:
Tuple[List[List[float]], List[List[Entry]]]
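The return type pairs two nested lists: one inner list of distances and one inner list of entries per query. A minimal sketch of unpacking such a result follows. The `Entry` dataclass here is a hypothetical stand-in with only `title` and `paragraph` fields, not the library's own class, and the result data is made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Entry:
    """Hypothetical stand-in for the library's Entry type."""
    title: str
    paragraph: str


def first_hit_per_query(
    result: Tuple[List[List[float]], List[List[Entry]]],
) -> List[Tuple[float, str]]:
    """Return (distance, title) of the first entry for each query."""
    distances, entries = result
    firsts = []
    for query_distances, query_entries in zip(distances, entries):
        if query_entries:  # a query may come back with no hits
            firsts.append((query_distances[0], query_entries[0].title))
    return firsts


# Made-up result for two queries with two hits each.
fake_result = (
    [[0.1, 0.4], [0.2, 0.9]],
    [
        [Entry("Paris", "..."), Entry("France", "...")],
        [Entry("Tokyo", "..."), Entry("Japan", "...")],
    ],
)
print(first_hit_per_query(fake_result))
```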
- para_id_list_to_entry(para_id_list: List[List[int]]) → List[List[Entry]]
parse lists of paragraph IDs into lists of entries
- Parameters:
para_id_list (List[List[int]]) – paragraph IDs
- Returns:
lists of entries
- Return type:
List[List[Entry]]
- para_id_to_entry(para_id: int, start_para_list: List[int] | None) → Tuple[str, str]
parse paragraph ID into Entry
- Parameters:
para_id (int) – paragraph ID (row position)
start_para_list (Optional[List[int]]) – list of start paragraphs
- Returns:
title and paragraph
- Return type:
Tuple[str, str]
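The reference does not spell out how `start_para_list` is used. One plausible reading, given that `dataset_name` accepts several datasets and `para_id` is a row position, is that it holds the first global row of each dataset. Under that assumption only, a lookup could be sketched as below; `locate_paragraph` is a hypothetical helper, not part of the library.

```python
from bisect import bisect_right
from typing import List, Tuple


def locate_paragraph(para_id: int, start_para_list: List[int]) -> Tuple[int, int]:
    """Map a global row position to (dataset index, local row).

    Assumes start_para_list is sorted and holds the first global row
    of each dataset, e.g. [0, 100, 250] for three datasets.
    """
    if not start_para_list or para_id < start_para_list[0]:
        raise IndexError(f"paragraph ID {para_id} is before the first dataset")
    # Index of the last start offset that is <= para_id.
    dataset_idx = bisect_right(start_para_list, para_id) - 1
    return dataset_idx, para_id - start_para_list[dataset_idx]


print(locate_paragraph(100, [0, 100, 250]))
```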
- search(query_list: list, num_selected: int, context: List[List[str]] | None = None) → Tuple[List[List[float]], List[List[Entry]]]
search interface for every BaseSearcher
- pydantic model retrieval_qa_benchmark.transforms.searchers.FaissElSearchBM25HybridSearcher
- Fields:
dataset_name (Sequence[str]), dataset_split (str), el_auth (Tuple[str, str]), el_host (str), embedding_name (str), index_path (str), is_raw_rank (bool), nprobe (int), num_filtered (int), template (str)
- field dataset_name: Sequence[str] = ['Cohere/wikipedia-22-12-en-embeddings']
dataset name for plugin dataset
- field dataset_split: str = 'train'
split for that dataset
- field el_auth: Tuple[str, str] [Required]
- field el_host: str [Required]
- field embedding_name: str [Required]
- field index_path: str [Required]
- field is_raw_rank: bool [Required]
- field nprobe: int = 128
- field num_filtered: int [Required]
- bm25_filter(**kwargs: Any) → Any
- emb_filter(**kwargs: Any) → Any
- faiss_bm25_hybrid_filter(query_list: List[str], num_selected: int, num_filtered: int, is_raw_rank: bool) → Tuple[List[List[float]], List[List[Entry]]]
- index_search(**kwargs: Any) → Any
- para_id_list_to_entry(para_id_list: List[List[int]]) → List[List[Entry]]
parse lists of paragraph IDs into lists of entries
- Parameters:
para_id_list (List[List[int]]) – paragraph IDs
- Returns:
lists of entries
- Return type:
List[List[Entry]]
- para_id_to_entry(para_id: int, start_para_list: List[int] | None) → Tuple[str, str]
parse paragraph ID into Entry
- Parameters:
para_id (int) – paragraph ID (row position)
start_para_list (Optional[List[int]]) – list of start paragraphs
- Returns:
title and paragraph
- Return type:
Tuple[str, str]
- search(query_list: list, num_selected: int, context: List[List[str]] | None = None) → Tuple[List[List[float]], List[List[Entry]]]
search interface for every BaseSearcher
- pydantic model retrieval_qa_benchmark.transforms.searchers.FaissElSearchBM25UnionSearcher
- Fields:
dataset_name (Sequence[str]), dataset_split (str), el_auth (Tuple[str, str]), el_host (str), embedding_name (str), index_path (str), nprobe (int), template (str), text_preprocess (Callable)
- field dataset_name: Sequence[str] = ['Cohere/wikipedia-22-12-en-embeddings']
dataset name for plugin dataset
- field dataset_split: str = 'train'
split for that dataset
- field el_auth: Tuple[str, str] [Required]
- field el_host: str [Required]
- field embedding_name: str [Required]
- field index_path: str [Required]
- field nprobe: int = 128
- field text_preprocess: Callable = <function text_preprocess>
- bm25_filter(**kwargs: Any) → Any
- emb_filter(**kwargs: Any) → Any
- faiss_bm25_union_filter(query_list: List[str], num_selected: int) → Tuple[List[List[float]], List[List[Entry]]]
- index_search(**kwargs: Any) → Any
- para_id_list_to_entry(para_id_list: List[List[int]]) → List[List[Entry]]
parse lists of paragraph IDs into lists of entries
- Parameters:
para_id_list (List[List[int]]) – paragraph IDs
- Returns:
lists of entries
- Return type:
List[List[Entry]]
- para_id_to_entry(para_id: int, start_para_list: List[int] | None) → Tuple[str, str]
parse paragraph ID into Entry
- Parameters:
para_id (int) – paragraph ID (row position)
start_para_list (Optional[List[int]]) – list of start paragraphs
- Returns:
title and paragraph
- Return type:
Tuple[str, str]
- search(query_list: list, num_selected: int, context: List[List[str]] | None = None) → Tuple[List[List[float]], List[List[Entry]]]
search interface for every BaseSearcher
- pydantic model retrieval_qa_benchmark.transforms.searchers.FaissSearcher
FAISS searcher
- Fields:
dataset_name (Sequence[str]), dataset_split (str), embedding_name (str), index_path (str), nprobe (int), template (str)
- field dataset_name: Sequence[str] = ['Cohere/wikipedia-22-12-en-embeddings']
dataset name for plugin dataset
- field dataset_split: str = 'train'
split for that dataset
- field embedding_name: str [Required]
embedding model name
- field index_path: str [Required]
path to the dumped FAISS index
- field nprobe: int = 128
number of clusters to search for IVF indices
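To illustrate what `nprobe` controls, here is a toy pure-Python inverted-file (IVF) search. It is a sketch of the general idea, not FAISS code: vectors are grouped by nearest centroid, and a query scans only the `nprobe` closest groups. The function names and the sample data are invented for this example.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def build_lists(vectors: List[Vector], centroids: List[Vector]) -> Dict[int, List[int]]:
    """Assign every vector id to the inverted list of its nearest centroid."""
    lists: Dict[int, List[int]] = {c: [] for c in range(len(centroids))}
    for vec_id, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: math.dist(vec, centroids[c]))
        lists[nearest].append(vec_id)
    return lists


def ivf_search(
    query: Vector,
    vectors: List[Vector],
    centroids: List[Vector],
    lists: Dict[int, List[int]],
    nprobe: int,
    k: int,
) -> List[Tuple[float, int]]:
    """Return up to k (distance, id) pairs, scanning only nprobe clusters."""
    probed = sorted(range(len(centroids)), key=lambda c: math.dist(query, centroids[c]))[:nprobe]
    candidates = [vec_id for c in probed for vec_id in lists[c]]
    return sorted((math.dist(query, vectors[v]), v) for v in candidates)[:k]


centroids = [(0.0, 0.0), (10.0, 0.0)]
vectors = [(1.0, 0.0), (4.8, 0.0), (7.0, 0.0), (9.0, 0.0)]
lists = build_lists(vectors, centroids)
query = (5.2, 0.0)

# With one probe the true nearest vector (id 1) sits in an unscanned cluster.
print(ivf_search(query, vectors, centroids, lists, nprobe=1, k=1)[0][1])  # id 2
# Probing both clusters recovers it.
print(ivf_search(query, vectors, centroids, lists, nprobe=2, k=1)[0][1])  # id 1
```

A larger `nprobe` scans more clusters, which trades search speed for recall.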
- emb_filter(**kwargs: Any) → Any
- index_search(**kwargs: Any) → Any
- para_id_list_to_entry(para_id_list: List[List[int]]) → List[List[Entry]]
parse lists of paragraph IDs into lists of entries
- Parameters:
para_id_list (List[List[int]]) – paragraph IDs
- Returns:
lists of entries
- Return type:
List[List[Entry]]
- para_id_to_entry(para_id: int, start_para_list: List[int] | None) → Tuple[str, str]
parse paragraph ID into Entry
- Parameters:
para_id (int) – paragraph ID (row position)
start_para_list (Optional[List[int]]) – list of start paragraphs
- Returns:
title and paragraph
- Return type:
Tuple[str, str]
- search(query_list: list, num_selected: int, context: List[List[str]] | None = None) → Tuple[List[List[float]], List[List[Entry]]]
search interface for every BaseSearcher
- pydantic model retrieval_qa_benchmark.transforms.searchers.MyScaleSearcher
MyScale Searcher
- Fields:
embedding_name (str), host (str), kw_topk (int), num_filtered (int), password (str), port (int), table_name (str), template (str), two_staged (bool), username (str)
- field embedding_name: str [Required]
embedding model name
- field host: str [Required]
hostname to MyScale backend
- field kw_topk: int = 10
keyword extraction only extracts kw_topk keywords
- field num_filtered: int = 100
number of samples returned by the first-stage filter. Ignored if two_staged is False
- field password: str = ''
password to connect MyScale
- field port: int [Required]
port to MyScale backend
- field table_name: str = 'Wikipedia'
table name to search on
- field two_staged: bool = False
whether two-staged search (with keywords) is enabled
- field username: str = 'default'
user name to connect MyScale
- retrieve(**kwargs: Any) → Any
- search(**kwargs: Any) → Any
search interface for every BaseSearcher
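The `two_staged`, `num_filtered`, and `kw_topk` fields of MyScaleSearcher suggest a filter-then-refine search, but the reference does not say which stage uses the keywords. The sketch below assumes a vector-distance first stage that keeps `num_filtered` candidates, followed by a keyword-overlap rerank. It runs on in-memory toy data rather than a MyScale table, and every function name, the stopword list, and the documents are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

STOPWORDS = {"the", "a", "an", "of", "is", "in", "what", "who", "which"}  # toy list


def extract_keywords(query: str, kw_topk: int) -> List[str]:
    """Keep at most kw_topk of the most frequent non-stopword tokens."""
    tokens = [t for t in query.lower().split() if t not in STOPWORDS]
    return [word for word, _ in Counter(tokens).most_common(kw_topk)]


def two_staged_search(
    query: str,
    query_emb: Sequence[float],
    docs: List[Tuple[str, Sequence[float]]],
    num_filtered: int,
    num_selected: int,
    kw_topk: int,
) -> List[int]:
    """Return document indices after a distance filter and a keyword rerank."""
    # Stage 1 (assumed): keep the num_filtered documents closest to the query.
    by_distance = sorted(range(len(docs)), key=lambda i: math.dist(query_emb, docs[i][1]))
    candidates = by_distance[:num_filtered]

    # Stage 2 (assumed): prefer candidates that contain more query keywords.
    keywords = extract_keywords(query, kw_topk)

    def keyword_hits(i: int) -> int:
        text = docs[i][0].lower()
        return sum(kw in text for kw in keywords)

    # sorted() is stable, so ties keep their stage-1 distance order.
    return sorted(candidates, key=keyword_hits, reverse=True)[:num_selected]


docs = [
    ("Berlin is the capital of Germany", (0.0, 0.1)),
    ("Paris is the capital of France", (0.2, 0.0)),
    ("France borders Spain", (0.1, 0.1)),
    ("Bananas are yellow", (5.0, 5.0)),
]
print(two_staged_search("what is the capital of france", (0.0, 0.0), docs,
                        num_filtered=3, num_selected=2, kw_topk=10))
```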