[Bug] Retriever Tool Ignores filters Argument in Hybrid Search When Called by an Agent
Description
When an agent is configured to use a retriever tool (e.g., a Pinecone or other vector store retriever), and the agent correctly generates a tool call with both a query (for semantic search) and a filters object (for metadata filtering), the retriever node appears to execute the semantic search but completely ignores the filters argument.
This prevents the implementation of a proper hybrid search strategy, as the results returned are not filtered by the specified metadata, leading to irrelevant context being passed back to the agent.
Steps to Reproduce
Set up a Vector Store: Populate a vector store (e.g., Pinecone) with documents containing specific, filterable metadata. For example, documents with an integer field assignment_week.
Configure Retriever Tool: In Flowise, create a "Retriever Tool" chain that connects to the vector store.
Configure Agent: Create an agent using a tool-calling model (e.g., OpenAI, Anthropic) and provide it with the retriever tool created in the previous step.
Invoke the Agent: Provide a prompt that causes the agent to generate a tool call with both a query string and a filters object. The agent correctly formulates the call as intended.
Example Tool Call (as seen in the logs):
```json
{
  "name": "ba579_assignment_instructions",
  "args": {
    "input": "query: \"official assignment guidelines and instructions\"; filters: {\"assignment_week\": 1, \"assignment_in_week\": 2, \"assignment_type\": \"individual\"}"
  },
  "id": "call_4b9prdyvdIAy8rp4MjfLXePe",
  "type": "tool_call"
}
```
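For contrast, a well-formed version of the same call would pass the arguments as a real object rather than one stringified input field. A hypothetical reconstruction:
```typescript
// Hypothetical well-formed tool call: `args` carries a parsed object, so
// the retriever receives `query` and `filters` as separate, typed values.
const wellFormedCall = {
  name: "ba579_assignment_instructions",
  args: {
    query: "official assignment guidelines and instructions",
    filters: { assignment_week: 1, assignment_in_week: 2, assignment_type: "individual" },
  },
  id: "call_4b9prdyvdIAy8rp4MjfLXePe",
  type: "tool_call",
};
```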
Expected Behavior
The retriever tool should honor both arguments. It should perform a semantic search on the query string AND apply the metadata filters to the search. The final result set should only contain documents that match both the semantic meaning and the exact metadata criteria.
In the example above, the retriever should have returned only documents where the metadata field assignment_week is equal to 1.
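For reference, honoring both arguments would look roughly like the following with a LangChain JS Pinecone store. This is a minimal sketch assuming @langchain/pinecone and OpenAI embeddings; the index name and wiring are illustrative, not Flowise's actual internals:
```typescript
import { Pinecone } from "@pinecone-database/pinecone";
import { PineconeStore } from "@langchain/pinecone";
import { OpenAIEmbeddings } from "@langchain/openai";

const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const index = pinecone.Index("ba579-assignments"); // hypothetical index name

const store = await PineconeStore.fromExistingIndex(new OpenAIEmbeddings(), {
  pineconeIndex: index,
});

// The third argument is a Pinecone metadata filter: it is applied
// server-side, so only matching vectors are candidates for the search.
const docs = await store.similaritySearch(
  "official assignment guidelines and instructions",
  4,
  { assignment_week: { $eq: 1 } }
);
```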
Actual Behavior
The retriever tool ignores the filters object entirely. It returns a list of documents based only on the semantic similarity of the query string.
As shown in the attached screenshot, the tool call explicitly filtered for assignment_week: 1, but the results included documents from "Week 14" and "Week 2", clearly violating the filter.
Supporting Evidence
The screenshot below demonstrates the issue. The top section shows the structured tool call generated by the agent, which correctly includes the filters object. The bottom section shows the tool's output, which contains results that do not match the specified filters.
Impact
This bug is critical because it prevents the creation of robust, accurate RAG agents that must handle queries with both conceptual and specific, structured components (e.g., "instructions for week 2"). Without metadata filtering at the retriever level, the agent receives irrelevant context and its accuracy is significantly compromised, forcing users into less efficient workarounds such as post-retrieval filtering in application logic.
Environment
Flowise Version: 3.0.5, Agentflow v2
Primary Nodes Used: Agent (e.g., ChatOpenAI with Tools), Retriever Tool, Pinecone Retriever
Vector Store: Pinecone
A further failure has since appeared: the agent errors with "Received tool input did not match expected schema" because the JSON object it generates for the tool call does not conform to the structure defined in its instructions.
Core Problem: Schema Mismatch
The fundamental issue is a structural mismatch between the agent's output and the tool's expected input format.
Expected Structure (from System Prompt): The agent's instructions explicitly demand a tool call with a top-level key named arguments. This object should directly contain query and filters keys.
Actual Structure (from Error Screenshot): The agent is generating a different structure. It creates a top-level key named toolInput, which contains another nested object called input, which finally holds the query and filters.
The tool is correctly rejecting this input because it is expecting an arguments object, not a toolInput object with a nested input key.
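To make the mismatch concrete, here is a paraphrase of the two shapes as object literals (reconstructed from the description above, not from Flowise source):
```typescript
// What the system prompt asks the agent to produce: a top-level
// `arguments` object holding `query` and `filters` directly.
const expectedShape = {
  arguments: {
    query: "official assignment guidelines and instructions",
    filters: { assignment_week: 1, assignment_in_week: 2 },
  },
};

// What the agent actually emits: a `toolInput` wrapper with a nested
// `input` object, which the tool's schema validation rejects.
const actualShape = {
  toolInput: {
    input: {
      query: "official assignment guidelines and instructions",
      filters: { assignment_week: 1, assignment_in_week: 2 },
    },
  },
};
```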
This error is the latest version of a recurring problem in which the agent fails to correctly structure its tool call arguments. Previous bug reports indicate:
Stringified JSON Bug: The agent was previously passing the entire arguments object as a single, stringified JSON blob instead of a proper JSON object.
This caused the retriever's parser to fail, successfully reading only the first filter ("assignment_type":"individual") and ignoring the integer-based filters that followed. This resulted in Pinecone returning all documents that matched the one successful filter, ignoring the others. The suspected cause was the tool's schema in the application code defining the arguments parameter as a string instead of an object.
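If that suspicion is correct, the fix on the tool side is to type the parameters as structured fields rather than a single string. A minimal sketch using zod, which LangChain JS tool schemas commonly use (field names are illustrative):
```typescript
import { z } from "zod";

// Suspected buggy shape: the whole payload typed as one string, which
// invites the model to serialize query + filters into a JSON blob.
const looseSchema = z.object({
  input: z.string().describe("query and filters packed into one string"),
});

// Structured alternative: filters typed as a real object, so the model
// must emit proper JSON and the tool receives it already parsed.
const strictSchema = z.object({
  query: z.string().describe("semantic search query"),
  filters: z
    .record(z.union([z.string(), z.number()]))
    .describe("exact-match metadata constraints"),
});
```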
Ignored Filters Bug: In another instance, the agent again failed to apply filters. In that case, it formatted the arguments into a single string value under an "input" key, like "input": "query: \"...\"; filters: {\"assignment_week\": 1, ...}". This caused the retriever to execute the semantic search but ignore the filters entirely.
Despite explicit instructions in the current system prompt to ensure the arguments parameter is a valid JSON object and not a string, the agent is still failing to produce the correct structure.
Description:
The retriever tool incorrectly returns multiple documents despite receiving precise filters. While the correct document is included, it is bundled with other documents that do not match the filter criteria, causing data noise and inefficient processing downstream.
Steps to Reproduce:
Call the ba579_assignment_instructions_retriever tool.
Provide a JSON payload with specific filters, e.g.:
{"filters": {"assignment_week": "one", "assignment_in_week": "two"}}
Execute the call and observe the output.
Expected Result:
The tool's output should contain ONLY the text for "Week 1 Assignment 2" as specified by the filters.
Actual Result:
The output contains the correct text for "Week 1 Assignment 2" concatenated with the text for "Week 11 Team Assignment 2" and "Week 13 Team Assignment 1".
Analysis:
The hybrid search is likely overweighting semantic relevance and not treating the provided filters as strict, mandatory constraints. The filtering logic should be hardened to act as a prerequisite for the search (a "hard filter") rather than just a ranking signal (a "soft filter"). The current behavior negates the purpose of providing precise filters.
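The distinction can be sketched in a few lines of TypeScript (purely illustrative; this is not Flowise's actual retrieval code):
```typescript
type Doc = { text: string; metadata: Record<string, string | number>; score: number };

const matches = (doc: Doc, filters: Record<string, string | number>) =>
  Object.entries(filters).every(([key, value]) => doc.metadata[key] === value);

// "Soft" filter (the suspected current behavior): matching the filters
// merely boosts a document's rank, so non-matching documents survive.
const softFilter = (docs: Doc[], filters: Record<string, string | number>) =>
  docs
    .map((d) => ({ ...d, score: d.score + (matches(d, filters) ? 1 : 0) }))
    .sort((a, b) => b.score - a.score);

// "Hard" filter (the expected behavior): non-matching documents are
// excluded outright before any ranking takes place.
const hardFilter = (docs: Doc[], filters: Record<string, string | number>) =>
  docs.filter((d) => matches(d, filters)).sort((a, b) => b.score - a.score);
```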
Title: Retrieval Node Fails to Apply Strict Metadata Filters in Hybrid Search, Returning Irrelevant Documents
1. Description:
The Hybrid Search Retrieval Node is not correctly implementing its core filtering logic. Despite receiving a perfectly structured JSON input with specific metadata filters, the node's output includes multiple documents that do not match the provided filter criteria.
The node's system prompt explicitly instructs it to use the filters object as a "mandatory, exact-match" constraint (i.e., a hard filter) before performing the semantic search. The observed behavior indicates the node's underlying code is ignoring this instruction, likely using the filters as a "soft" relevance signal instead. This defeats the purpose of the preceding agent's precise JSON generation and makes the retrieval process unreliable.
2. Steps to Reproduce:
An upstream agent processes a user query ("Tell me about week 1 personal assignment 2.") and correctly generates the following JSON object.
Provide this exact JSON object as input to the Hybrid Search Retrieval Node:
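The JSON object itself does not appear in the thread; a hypothetical reconstruction, taking the filters verbatim from the expected result below and assuming the query string:
```typescript
// Hypothetical reconstruction of the node input (query string assumed;
// filters quoted verbatim from the expected result in this report).
const nodeInput = {
  query: "official assignment guidelines and instructions",
  filters: {
    assignment_week: "one",
    assignment_in_week: "two",
    assignment_type: "personal",
  },
};
```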
3. Expected Result:
The retrieval node should return only one document chunk: the one whose metadata exactly matches {"assignment_week": "one", "assignment_in_week": "two", "assignment_type": "personal"}. All other documents must be excluded.
4. Actual Result:
The output includes documents that violate the filters, for example:
Incorrect: document_title: "Week 7 Team Assignment 1" (does not match assignment_week or assignment_type)
Incorrect: document_title: "Week 8 Team Assignment 2" (does not match assignment_week or assignment_type)
5. Root Cause Analysis:
The issue is not with the prompts or the input data, but with the retrieval node's hard-coded search methodology. The behavior is consistent with a "soft filter" or "boosting" implementation.
Instead of first querying the database for WHERE assignment_week = 'one', the tool is performing a broad semantic search for "official assignment guidelines" and then giving a relevance boost to results that happen to match the filter metadata. This allows semantically similar but factually incorrect documents to contaminate the results.
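For Pinecone, the hard-constraint equivalent of that WHERE clause is a metadata filter passed alongside the query, using Pinecone's documented filter syntax (shown here for illustration):
```typescript
// Applied server-side when supplied with the query, this excludes
// non-matching vectors before similarity scoring happens.
const hardConstraint = { assignment_week: { $eq: "one" } };
```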
6. Suggested Fix:
The retrieval logic in the node's source code must be modified to enforce strict pre-filtering. The query process should be re-architected to:
First, execute a query against the document store that exclusively selects documents based on the mandatory, exact-match filters provided in the JSON input.
Second, perform the semantic vector search using the query string only on the small, pre-filtered set of documents returned from the first step.
This change will ensure the filters act as a non-negotiable prerequisite, guaranteeing the relevance and accuracy of the retrieval process.
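A minimal sketch of what the fix could look like, assuming a LangChain JS PineconeStore (not Flowise's actual code). With Pinecone, the two steps collapse into a single filtered query, because the metadata filter is applied server-side before the vector search ever scores a document:
```typescript
import { PineconeStore } from "@langchain/pinecone";

async function retrieveWithHardFilters(
  store: PineconeStore,
  query: string,
  filters: Record<string, string | number>
) {
  // Translate the exact-match filters into Pinecone's $eq operator syntax.
  const pineconeFilter = Object.fromEntries(
    Object.entries(filters).map(([key, value]) => [key, { $eq: value }])
  );
  // Only documents passing the metadata filter are candidates for the
  // semantic search, making the filters a non-negotiable prerequisite.
  return store.similaritySearch(query, 4, pineconeFilter);
}
```
For vector stores without server-side metadata filtering, the two-step approach above (select by metadata first, then run the semantic search over only that subset) would need to be implemented literally.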