Transforms

pydantic model retrieval_qa_benchmark.transforms.AgentRouter

Agent Routing with LangChain MRKL Agent Prompts

Fields:
  • children (List[retrieval_qa_benchmark.schema.transform.BaseTransform | None])

  • format_instructions (str)

  • llm_model (Dict[str, Any])

  • prefix (str)

  • record_template (str)

  • suffix (str)

  • verbose (bool)

field format_instructions: str = 'Use the following format:\n\nQuestion: the input question you must answer\nThought: you should always think about what to do\nAction: the action to take, should be one of [{tool_names}]\nAction Input: the input to the action\nObservation: the result of the action\n... (this Thought/Action/Action Input/Observation can repeat N times)\nThought: I now know the final answer\nFinal Answer: the final answer to the original input question'

Instructions that tell the LLM which output format to follow

field llm_model: Dict[str, Any] [Required]

Model configuration for transforms that use an LLM

field prefix: str = 'Answer the following questions as best you can. You have access to the following tools:'

Template prefix for agent

field record_template: str = '{question}\n{choices}'

Template to format records

field suffix: str = 'Begin!\n\nQuestion: {input}\nThought:{agent_scratchpad}'

Template suffix for agent

field verbose: bool = False

If true, then agent will print output to stdout
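
As a rough illustration of how the template fields above might fit together, the sketch below joins prefix, a tool listing, format_instructions, and suffix in the usual MRKL order, then fills in a record. The helper name compose_agent_template, the "name: description" tool listing, and the way choices are rendered are assumptions made for illustration; build_agent_template() may assemble the prompt differently.

```python
from typing import Dict

# Defaults copied from the field listings above.
PREFIX = (
    "Answer the following questions as best you can. "
    "You have access to the following tools:"
)
FORMAT_INSTRUCTIONS = (
    "Use the following format:\n\n"
    "Question: the input question you must answer\n"
    "Thought: you should always think about what to do\n"
    "Action: the action to take, should be one of [{tool_names}]\n"
    "Action Input: the input to the action\n"
    "Observation: the result of the action\n"
    "... (this Thought/Action/Action Input/Observation can repeat N times)\n"
    "Thought: I now know the final answer\n"
    "Final Answer: the final answer to the original input question"
)
SUFFIX = "Begin!\n\nQuestion: {input}\nThought:{agent_scratchpad}"
RECORD_TEMPLATE = "{question}\n{choices}"


def compose_agent_template(tools: Dict[str, str]) -> str:
    """Hypothetical helper: prefix + tool list + format instructions + suffix."""
    # Assumed listing format: one "name: description" line per tool.
    tool_lines = "\n".join(f"{name}: {desc}" for name, desc in tools.items())
    # Only {tool_names} is filled here; {input} and {agent_scratchpad} stay open.
    instructions = FORMAT_INSTRUCTIONS.format(tool_names=", ".join(tools))
    return "\n\n".join([PREFIX, tool_lines, instructions, SUFFIX])


tools = {
    "sql_db_list_tables": "Input is an empty string, output is a comma separated list of tables in the database.",
    "sql_db_query": "Input to this tool is a detailed and correct SQL query, output is a result from the database.",
}
template = compose_agent_template(tools)

# Assumption: choices arrive as one pre-joined string.
question = RECORD_TEMPLATE.format(question="What is 2 + 2?", choices="A. 3\nB. 4")
prompt = template.format(input=question, agent_scratchpad="")
print(prompt)
```

Filling {tool_names} before the record placeholders keeps the two formatting steps independent, so the same template can be reused for every record.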

build_agent_template() str
chain(**kwargs: Any) Any
execute_action(record: QARecord) Tuple[str, int, int]

execute action for agent components

Parameters:

record (QARecord) – data record to be processed

Returns:

(generated file, number of prompt tokens, number of generated tokens)

Return type:

Tuple[str, int, int]

format_agent_template(q: str, stacked: List[str]) str
get_next_state(generated: str) Tuple[BaseTransform | None, str]
parse_extra(generate: str) ToolHistory | None
set_children(children: List[BaseTransform | None]) None

Set children for transform

Parameters:

children (List[Union[BaseTransform, None]]) – next nodes to execute
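
The output format requested by format_instructions has to be parsed back into a tool name and a tool input before a child transform can be selected. The sketch below shows one way to do that with a regular expression. The function name parse_action and the regex are assumptions for illustration only; they are not taken from get_next_state() or parse_extra().

```python
import re
from typing import Optional, Tuple

# Matches "Action: <tool>" followed by "Action Input: <input>".
ACTION_RE = re.compile(
    r"Action\s*:\s*(.*?)\s*\n\s*Action Input\s*:\s*(.*)", re.DOTALL
)


def parse_action(generated: str) -> Optional[Tuple[str, str]]:
    """Hypothetical parser: return (tool_name, tool_input), or None when done."""
    # A final answer means there is no further tool to call.
    if "Final Answer:" in generated:
        return None
    match = ACTION_RE.search(generated)
    if match is None:
        return None
    tool_name = match.group(1).strip()
    # Strip whitespace and surrounding double quotes from the tool input.
    tool_input = match.group(2).strip().strip('"')
    return tool_name, tool_input


step = 'Thought: I need the table names\nAction: sql_db_list_tables\nAction Input: ""'
print(parse_action(step))
print(parse_action("Thought: I now know the final answer\nFinal Answer: 42"))
```

A router could then look up the returned tool name among its children; this sketch assumes the generation stops before the next "Observation:" line.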

pydantic model retrieval_qa_benchmark.transforms.ContextWithElasticBM25
Fields:
  • children (List[retrieval_qa_benchmark.schema.transform.BaseTransform | None])

  • context_template (str)

  • dataset_name (Sequence[str])

  • el_auth (Tuple[str, str])

  • el_host (str)

  • num_selected (int)

  • sep_chr (str)

field context_template: str = '{title} | {paragraph}'
field dataset_name: Sequence[str] = ['Cohere/wikipedia-22-12-en-embeddings']
field el_auth: Tuple[str, str] [Required]
field el_host: str [Required]
field num_selected: int = 5
field sep_chr: str = '\n'
chain(**kwargs: Any) Any
preproc_question4query(data: Dict[str, Any]) str
transform_context(data: Dict[str, Any], **params: Any) List[str]
pydantic model retrieval_qa_benchmark.transforms.ContextWithFaiss

Fields:
  • children (List[retrieval_qa_benchmark.schema.transform.BaseTransform | None])

  • context_template (str)

  • dataset_name (Sequence[str])

  • embedding_name (str)

  • index_path (str)

  • nprobe (int)

  • num_selected (int)

  • sep_chr (str)

field context_template: str = '{title} | {paragraph}'
field dataset_name: Sequence[str] = ['Cohere/wikipedia-22-12-en-embeddings']
field embedding_name: str = 'paraphrase-multilingual-mpnet-base-v2'
field index_path: str = 'data/indexes/Cohere_mpnet/IVFSQ_L2.index'
field nprobe: int = 128
field num_selected: int = 5
field sep_chr: str = '\n'
chain(**kwargs: Any) Any
preproc_question4query(data: Dict[str, Any]) str
transform_context(data: Dict[str, Any], **params: Any) List[str]
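
To make the formatting fields concrete, the sketch below renders retrieved passages with context_template, truncates them to num_selected, and joins them with sep_chr. The helper name render_entries and the shape of the hits (dicts with title and paragraph keys) are assumptions for illustration; the retrieval step itself is not shown.

```python
from typing import Dict, List

# Defaults copied from the field listing above.
CONTEXT_TEMPLATE = "{title} | {paragraph}"
SEP_CHR = "\n"
NUM_SELECTED = 5


def render_entries(
    hits: List[Dict[str, str]], num_selected: int = NUM_SELECTED
) -> List[str]:
    """Hypothetical helper: format the top hits with the context template."""
    return [CONTEXT_TEMPLATE.format(**hit) for hit in hits[:num_selected]]


hits = [
    {"title": "Paris", "paragraph": "Paris is the capital of France."},
    {"title": "Lyon", "paragraph": "Lyon is a city in France."},
]
entries = render_entries(hits)
context = SEP_CHR.join(entries)
print(context)
```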
pydantic model retrieval_qa_benchmark.transforms.ContextWithFaissESHybrid

Fields:
  • children (List[retrieval_qa_benchmark.schema.transform.BaseTransform | None])

  • context_template (str)

  • dataset_name (Sequence[str])

  • el_auth (Tuple[str, str])

  • el_host (str)

  • embedding_name (str)

  • index_path (str)

  • is_raw_rank (bool)

  • nprobe (int)

  • num_filtered (int)

  • num_selected (int)

  • sep_chr (str)

field context_template: str = '{title} | {paragraph}'
field dataset_name: Sequence[str] = ['Cohere/wikipedia-22-12-en-embeddings']
field el_auth: Tuple[str, str] [Required]
field el_host: str [Required]
field embedding_name: str = 'paraphrase-multilingual-mpnet-base-v2'
field index_path: str = 'data/indexes/Cohere_mpnet/IVFSQ_L2.index'
field is_raw_rank: bool = True
field nprobe: int = 128
field num_filtered: int = 100
field num_selected: int = 5
field sep_chr: str = '\n'
chain(**kwargs: Any) Any
preproc_question4query(data: Dict[str, Any]) str
transform_context(data: Dict[str, Any], **params: Any) List[str]
pydantic model retrieval_qa_benchmark.transforms.ContextWithRRFHybrid
Fields:
  • children (List[retrieval_qa_benchmark.schema.transform.BaseTransform | None])

  • context_template (str)

  • num_selected (int)

  • rank_dict (dict)

  • sep_chr (str)

  • with_title (int)

field context_template: str = '{title} | {paragraph}'
field num_selected: int = 5
field rank_dict: dict = {'bm25': 40, 'mpnet': 30}
field sep_chr: str = '\n'
field with_title: int = True
chain(**kwargs: Any) Any
preproc_question4query(data: Dict[str, Any]) str
transform_context(data: Dict[str, Any], **params: Any) List[str]
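
The class name suggests reciprocal rank fusion (RRF), in which each document scores the sum of 1 / (k + rank) over the retrievers that returned it. The sketch below is a generic RRF implementation, not this class's code. It ASSUMES that the values in rank_dict act as the per-retriever constant k, which the listing above does not confirm. The function name rrf_fuse and the document IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Default copied from the field listing; treating the values as k is an assumption.
RANK_DICT = {"bm25": 40, "mpnet": 30}


def rrf_fuse(
    rankings: Dict[str, List[str]],
    rank_dict: Dict[str, int],
    num_selected: int = 5,
) -> List[str]:
    """Generic reciprocal rank fusion over several ranked lists of doc IDs."""
    scores: Dict[str, float] = defaultdict(float)
    for source, doc_ids in rankings.items():
        k = rank_dict[source]
        # Ranks start at 1; a smaller k gives top ranks of that source more weight.
        for rank, doc_id in enumerate(doc_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties by doc ID for determinism.
    ordered = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ordered[:num_selected]


rankings = {
    "bm25": ["d1", "d2", "d3"],
    "mpnet": ["d2", "d4", "d1"],
}
print(rrf_fuse(rankings, RANK_DICT, num_selected=3))  # ['d2', 'd1', 'd4']
```

Here d2 wins because it ranks near the top of both lists, while d4 outranks d3 because the smaller constant for mpnet gives its ranks more weight.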
pydantic model retrieval_qa_benchmark.transforms.LangChainInfoSQLDB
Fields:
  • children (List[retrieval_qa_benchmark.schema.transform.BaseTransform | None])

  • descrption (str)

  • name (str)

  • url (str)

  • verbose (bool)

field descrption: str = "Input to this tool is a comma-separated list of tables, output is the schema and sample rows for those tables. Be sure that the tables actually exist by calling sql_db_list_tables first! Example Input: 'table1, table2, table3'"

Prompt description for this tool

field name: str = 'sql_db_schema'

name for this tool

field url: str [Required]

URL string to create engines

field verbose: bool = False

If true, then agent will print output to stdout

chain(**kwargs: Any) Any
execute_action(record: QARecord) Tuple[str, int, int]

execute action for agent components

Parameters:

record (QARecord) – data record to be processed

Returns:

(generated file, number of prompt tokens, number of generated tokens)

Return type:

Tuple[str, int, int]

get_next_state(generate: str) Tuple[BaseTransform | None, str]
parse_extra(generate: str) ToolHistory | None
set_children(children: List[BaseTransform | None]) None

Set children for transform

Parameters:

children (List[Union[BaseTransform, None]]) – next nodes to execute

pydantic model retrieval_qa_benchmark.transforms.LangChainListSQLDB
Fields:
  • children (List[retrieval_qa_benchmark.schema.transform.BaseTransform | None])

  • descrption (str)

  • name (str)

  • url (str)

  • verbose (bool)

field descrption: str = 'Input is an empty string, output is a comma separated list of tables in the database.'

Prompt description for this tool

field name: str = 'sql_db_list_tables'

name for this tool

field url: str [Required]

URL string to create engines

field verbose: bool = False

If true, then agent will print output to stdout

chain(**kwargs: Any) Any
execute_action(record: QARecord) Tuple[str, int, int]

execute action for agent components

Parameters:

record (QARecord) – data record to be processed

Returns:

(generated file, number of prompt tokens, number of generated tokens)

Return type:

Tuple[str, int, int]

get_next_state(generate: str) Tuple[BaseTransform | None, str]
parse_extra(generate: str) ToolHistory | None
set_children(children: List[BaseTransform | None]) None

Set children for transform

Parameters:

children (List[Union[BaseTransform, None]]) – next nodes to execute
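
To show the kind of output the tool description promises (a comma-separated list of tables), the sketch below lists the tables of an in-memory SQLite database. It uses the standard-library sqlite3 module as a stand-in. The real transform takes a url and presumably builds a database engine from it, so the function list_tables and the example schema are assumptions for illustration.

```python
import sqlite3


def list_tables(conn: sqlite3.Connection) -> str:
    """Hypothetical stand-in: return table names as a comma-separated string."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return ", ".join(name for (name,) in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER)")
print(list_tables(conn))  # orders, users
```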

pydantic model retrieval_qa_benchmark.transforms.LangChainQuerySQLDB
Fields:
  • children (List[retrieval_qa_benchmark.schema.transform.BaseTransform | None])

  • descrption (str)

  • name (str)

  • url (str)

  • verbose (bool)

field descrption: str = "Input to this tool is a detailed and correct SQL query, output is a result from the database. If the query is not correct, an error message will be returned. If an error is returned, rewrite the query, check the query, and try again. If you encounter an issue with Unknown column 'xxxx' in 'field list', using sql_db_schema to query the correct table fields."

Prompt description for this tool

field name: str = 'sql_db_query'

name for this tool

field url: str [Required]

URL string to create engines

field verbose: bool = False

If true, then agent will print output to stdout

chain(**kwargs: Any) Any
execute_action(record: QARecord) Tuple[str, int, int]

execute action for agent components

Parameters:

record (QARecord) – data record to be processed

Returns:

(generated file, number of prompt tokens, number of generated tokens)

Return type:

Tuple[str, int, int]

get_next_state(generate: str) Tuple[BaseTransform | None, str]
parse_extra(generate: str) ToolHistory | None
set_children(children: List[BaseTransform | None]) None

Set children for transform

Parameters:

children (List[Union[BaseTransform, None]]) – next nodes to execute

pydantic model retrieval_qa_benchmark.transforms.LangChainSQLAgentRouter

Agent Decision with LangChain SQL Agent Prompts

Fields:
  • children (List[retrieval_qa_benchmark.schema.transform.BaseTransform | None])

  • format_instructions (str)

  • llm_model (Dict[str, Any])

  • prefix (str)

  • record_template (str)

  • sql_dialect (str)

  • sql_topk (int)

  • suffix (str)

  • verbose (bool)

field children: List[BaseTransform | None] = [None, None]

List of next states

field format_instructions: str = 'Use the following format:\n\nQuestion: the input question you must answer\nThought: you should always think about what to do\nAction: the action to take, should be one of [{tool_names}]\nAction Input: the input to the action\nObservation: the result of the action\n... (this Thought/Action/Action Input/Observation can repeat N times)\nThought: I now know the final answer\nFinal Answer: the final answer to the original input question'

Instruction to teach LLM what is the output format

field llm_model: Dict[str, Any] [Required]

Model configuration for transforms that use an LLM

field prefix: str = 'You are an agent designed to interact with a SQL database.\nGiven an input question, create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.\nUnless the user specifies a specific number of examples they wish to obtain, always limit your query to at most {top_k} results.\nYou can order the results by a relevant column to return the most interesting examples in the database.\nNever query for all the columns from a specific table, only ask for the relevant columns given the question.\nYou have access to tools for interacting with the database.\nOnly use the below tools. Only use the information returned by the below tools to construct your final answer.\nYou MUST double check your query before executing it. If you get an error while executing a query, rewrite the query and try again.\n\nDO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the database.\n\nIf the question does not seem related to the database, just return "I don\'t know" as the answer.\n'

Template prefix for agent

field record_template: str = '{question}\n{choices}'

Template to format records

field sql_dialect: str = 'SQL'

SQL dialect name that tells the LLM which SQL variant it is working with

field sql_topk: int = 5

Maximum number of results to retrieve from the database

field suffix: str = 'Begin!\n\nQuestion: {input}\nThought: I should look at the tables in the database to see what I can query.  Then I should query the schema of the most relevant tables.\n{agent_scratchpad}'

Template suffix for agent

field verbose: bool = False

If true, then agent will print output to stdout

build_agent_template() str
chain(**kwargs: Any) Any
execute_action(record: QARecord) Tuple[str, int, int]

execute action for agent components

Parameters:

record (QARecord) – data record to be processed

Returns:

(generated file, number of prompt tokens, number of generated tokens)

Return type:

Tuple[str, int, int]

format_agent_template(q: str, stacked: List[str]) str
get_next_state(generated: str) Tuple[BaseTransform | None, str]
parse_extra(generate: str) ToolHistory | None
set_children(children: List[BaseTransform | None]) None

Set children for transform

Parameters:

children (List[Union[BaseTransform, None]]) – next nodes to execute
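
The prefix above contains two placeholders, {dialect} and {top_k}. The sketch below fills them from sql_dialect and sql_topk using an excerpt of the default prefix. The mapping of fields to placeholders is inferred from the names, and the helper fill_prefix is hypothetical.

```python
# Excerpt of the default prefix: only the sentences that carry placeholders.
PREFIX_EXCERPT = (
    "You are an agent designed to interact with a SQL database.\n"
    "Given an input question, create a syntactically correct {dialect} query "
    "to run, then look at the results of the query and return the answer.\n"
    "Unless the user specifies a specific number of examples they wish to "
    "obtain, always limit your query to at most {top_k} results.\n"
)


def fill_prefix(sql_dialect: str = "SQL", sql_topk: int = 5) -> str:
    """Hypothetical helper: substitute the dialect and the result limit."""
    return PREFIX_EXCERPT.format(dialect=sql_dialect, top_k=sql_topk)


print(fill_prefix(sql_dialect="MySQL", sql_topk=10))
```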

pydantic model retrieval_qa_benchmark.transforms.LangChainSQLChecker
Fields:
  • checker_prompt (str)

  • children (List[retrieval_qa_benchmark.schema.transform.BaseTransform | None])

  • descrption (str)

  • llm_model (Dict[str, Any])

  • name (str)

  • sql_dialect (str)

  • url (str)

  • verbose (bool)

field checker_prompt: str = '\n{query}\nDouble check the {dialect} query above for common mistakes, including:\n- Using NOT IN with NULL values\n- Using UNION when UNION ALL should have been used\n- Using BETWEEN for exclusive ranges\n- Data type mismatch in predicates\n- Properly quoting identifiers\n- Using the correct number of arguments for functions\n- Casting to the correct data type\n- Using the proper columns for joins\n\nIf there are any of the above mistakes, rewrite the query. If there are no mistakes, just reproduce the original query.\n\nOutput the final SQL query only.\n\nSQL Query: '
field descrption: str = 'Use this tool to double check if your query is correct before executing it. Always use this tool before executing a query with sql_db_query!'

Prompt description for this tool

field llm_model: Dict[str, Any] [Required]

model configuration for transforms with LLM

field name: str = 'sql_db_query_checker'

name for this tool

field sql_dialect: str = 'SQL'
field url: str [Required]

URL string to create engines

field verbose: bool = False

If true, then agent will print output to stdout

chain(**kwargs: Any) Any
execute_action(record: QARecord) Tuple[str, int, int]

execute action for agent components

Parameters:

record (QARecord) – data record to be processed

Returns:

(generated file, number of prompt tokens, number of generated tokens)

Return type:

Tuple[str, int, int]

get_next_state(generate: str) Tuple[BaseTransform | None, str]
parse_extra(generate: str) ToolHistory | None
set_children(children: List[BaseTransform | None]) None

Set children for transform

Parameters:

children (List[Union[BaseTransform, None]]) – next nodes to execute
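
As a small illustration of checker_prompt, the sketch below fills its {query} and {dialect} placeholders for a candidate query. The helper name build_checker_prompt and the example query are assumptions; passing the resulting prompt to the model configured in llm_model is not shown.

```python
# Default copied from the checker_prompt field above.
CHECKER_PROMPT = (
    "\n{query}\n"
    "Double check the {dialect} query above for common mistakes, including:\n"
    "- Using NOT IN with NULL values\n"
    "- Using UNION when UNION ALL should have been used\n"
    "- Using BETWEEN for exclusive ranges\n"
    "- Data type mismatch in predicates\n"
    "- Properly quoting identifiers\n"
    "- Using the correct number of arguments for functions\n"
    "- Casting to the correct data type\n"
    "- Using the proper columns for joins\n\n"
    "If there are any of the above mistakes, rewrite the query. "
    "If there are no mistakes, just reproduce the original query.\n\n"
    "Output the final SQL query only.\n\n"
    "SQL Query: "
)


def build_checker_prompt(query: str, sql_dialect: str = "SQL") -> str:
    """Hypothetical helper: render the checker prompt for one candidate query."""
    return CHECKER_PROMPT.format(query=query, dialect=sql_dialect)


candidate = "SELECT name FROM users WHERE id NOT IN (SELECT user_id FROM orders)"
checker_input = build_checker_prompt(candidate, sql_dialect="SQLite")
print(checker_input)
```

Because the query is passed as a format argument rather than spliced into the template, braces inside the query text do not interfere with the placeholders.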