Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ES|QL Reranker command #123074

Open
wants to merge 77 commits into
base: main
Choose a base branch
from
Open

Conversation

afoucret
Copy link
Contributor

@afoucret afoucret commented Feb 20, 2025

Implementation of the RERANK command

Command syntax

| RERANK "query text" ON title,description WITH `reranker-inference-id`

where

  • query text is a constant string (can use a param instead ?queryText)
  • ON clause contains one or several field separated by comma
    • it is also possible to rename a field here (eg ON title, description=overview)
    • it is also possible to create computed fields here (eg ON title, short_description=SUBSTRING(description,0 100))
  • WITH the name of inference endpoint to be used for reranking. The inference endpoint task type has to be rerank

Done:

  • Grammar and parsing of the command
  • Logical plan
  • Pre-analysis and Analysis
    • Detect missing inference endpoint
    • Ensure all expression are resolved
  • Logical plan optimization
    • Default limits and sort order for the rerank command
  • Physical plan and planing
  • Reranker async operator

Follow-up:

@afoucret afoucret force-pushed the esql-reranker-boostrap branch from 9d2471e to fae22df Compare February 21, 2025 13:42
@afoucret afoucret mentioned this pull request Mar 7, 2025
11 tasks
Copy link
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a first look, left some questions.

// Ensure the score attribute is present in the output.
if (rerank.scoreAttribute() instanceof UnresolvedAttribute ua) {
Attribute resolved = resolveAttribute(ua, childrenOutput);
if (resolved.resolved() == false || resolved.dataType() != DOUBLE) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed?
If it really is, why not simply expect it enabled in FROM? There's no command in ESQL that "produces" attributes implicitely. Furthermore, if the user defines a _score already, it'll get overwriten silently, which can be unexpected.

If it's not, we might not need this method at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RERANK command overwrite the _score column and sort the result by _score DESC, so having the _score attribute is mandatory.

The first version, was throwing an error unless the user was enabling the attribute in the FROM clause but it was asked that the add the _score if it is missing so the user doo not have to worry about it.

See @ioanatia comment here: #123074 (comment)

return plan.withInferenceResolutionError(inferenceId, error);
} else {
String error = context.inferenceResolution().getError(inferenceId);
return plan.withInferenceResolutionError(inferenceId, error);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is globally the role of the Analyzer to resolve entities

But which entities are resolved here? It's just verifications, the plan is returned unchanged in the "good" case, or marked with errors if not. Which could be implemented in the Rerank if it implemented the if'aces called in the verifier?

@elasticsearchmachine
Copy link
Collaborator

Hi @afoucret, I've updated the changelog YAML for you.

Copy link
Member

@fang-xing-esql fang-xing-esql left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @afoucret! I added a couple of comments that I can think of so far.

return new Literal(source(ctx), visitIdentifier(ctx.identifier()), KEYWORD);
}

if (expression(ctx.parameter()) instanceof Literal literalParam) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have some tests to provide the inference id as named parameter in an ES|QL query request? Here we expect inferenceId to be a Literal, however the input is an IdentifierOrParameterContext, just to validate whether the inference id can be provided through named parameters properly. There are some named parameter related tests can be used as references, 10_basic.yml, RestEsqlTestCase.testDoubleParamsForIdentifiers.

Thanks for adding named parameters related tests in StatementParserTests, it will be great to have some yml or Rest tests as well, as they cover the parsing from the params specified in a request, and this is how the users usually use it.

// This makes the output more predictable which is helpful here.


reranker using a single field
Copy link
Member

@fang-xing-esql fang-xing-esql Mar 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RERANK implies sort descending on score, it will be nice to cover some queries that have explicit sort on different keys to show how it behaves. For example

FROM books METADATA _score
| WHERE title:"war and peace" AND author:"Tolstoy"
| RERANK "war and peace" ON title WITH test_reranker
| Sort author, title

And do we expect RERANK to work with the other search commands like FORK, RRF, or multiple RERANKs in the same query? From query syntax/grammar's perspective, these are possible, however I don't know if they make senses(from search's perspective), it will be nice to cover some combinations of these commands, whether they are valid queries or not.

Copy link
Contributor

@ioanatia ioanatia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some comments from @fang-xing-esql regarding testing that I think we should address before merging. looks good otherwise.

@@ -230,7 +244,7 @@ private PhysicalPlan mapMerge(Merge merge) {
}

public static boolean isPipelineBreaker(LogicalPlan p) {
return p instanceof Aggregate || p instanceof TopN || p instanceof Limit || p instanceof OrderBy;
return p instanceof Aggregate || p instanceof TopN || p instanceof Limit || p instanceof OrderBy || p instanceof Rerank;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for me to understand:

  • why did we make Rerank a pipeline breaker? can we not execute the RerankOperator on the data nodes? we are applying a sort immediately after so it might need to be a pipeline breaker?
  • if Rerank is a pipeline breaker, does that mean CCS should be supported? but just have not tested it yet?

}

public void testRerankWithNamedParameters() {
assumeTrue("FORK requires corresponding capability", EsqlCapabilities.Cap.RERANK.isEnabled());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo here - it's not FORK.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>feature :Search Relevance/Ranking Scoring, rescoring, rank evaluation. Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch test-release Trigger CI checks against release build v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants