-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ES|QL Reranker command #123074
base: main
Are you sure you want to change the base?
ES|QL Reranker command #123074
Conversation
9d2471e
to
fae22df
Compare
…esql-reranker-boostrap
…esql-reranker-boostrap
…esql-reranker-boostrap
…esql-reranker-boostrap
…esql-reranker-boostrap
…esql-reranker-boostrap
…esql-reranker-boostrap
…renceServicePlugin
…esql-reranker-boostrap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had a first look, left some questions.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/parser/ExpressionBuilder.java
Outdated
Show resolved
Hide resolved
// Ensure the score attribute is present in the output. | ||
if (rerank.scoreAttribute() instanceof UnresolvedAttribute ua) { | ||
Attribute resolved = resolveAttribute(ua, childrenOutput); | ||
if (resolved.resolved() == false || resolved.dataType() != DOUBLE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed?
If it really is, why not simply expect it enabled in FROM? There's no command in ESQL that "produces" attributes implicitely. Furthermore, if the user defines a _score already, it'll get overwriten silently, which can be unexpected.
If it's not, we might not need this method at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The RERANK
command overwrite the _score
column and sort the result by _score DESC
, so having the _score
attribute is mandatory.
The first version, was throwing an error unless the user was enabling the attribute in the FROM
clause but it was asked that the add the _score
if it is missing so the user doo not have to worry about it.
See @ioanatia comment here: #123074 (comment)
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/inference/ResolvedInference.java
Outdated
Show resolved
Hide resolved
return plan.withInferenceResolutionError(inferenceId, error); | ||
} else { | ||
String error = context.inferenceResolution().getError(inferenceId); | ||
return plan.withInferenceResolutionError(inferenceId, error); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is globally the role of the Analyzer to resolve entities
But which entities are resolved here? It's just verifications, the plan is returned unchanged in the "good" case, or marked with errors if not. Which could be implemented in the Rerank
if it implemented the if'aces called in the verifier?
Hi @afoucret, I've updated the changelog YAML for you. |
…esql-reranker-boostrap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @afoucret! I added a couple of comments that I can think of so far.
return new Literal(source(ctx), visitIdentifier(ctx.identifier()), KEYWORD); | ||
} | ||
|
||
if (expression(ctx.parameter()) instanceof Literal literalParam) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have some tests to provide the inference id
as named parameter in an ES|QL query request? Here we expect inferenceId to be a Literal
, however the input is an IdentifierOrParameterContext
, just to validate whether the inference id
can be provided through named parameters properly. There are some named parameter related tests can be used as references, 10_basic.yml, RestEsqlTestCase.testDoubleParamsForIdentifiers.
Thanks for adding named parameters related tests in StatementParserTests
, it will be great to have some yml or Rest tests as well, as they cover the parsing from the params
specified in a request, and this is how the users usually use it.
// This makes the output more predictable which is helpful here. | ||
|
||
|
||
reranker using a single field |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RERANK
implies sort descending on score
, it will be nice to cover some queries that have explicit sort on different keys to show how it behaves. For example
FROM books METADATA _score
| WHERE title:"war and peace" AND author:"Tolstoy"
| RERANK "war and peace" ON title WITH test_reranker
| Sort author, title
And do we expect RERANK
to work with the other search commands like FORK
, RRF
, or multiple RERANK
s in the same query? From query syntax/grammar's perspective, these are possible, however I don't know if they make senses(from search's perspective), it will be nice to cover some combinations of these commands, whether they are valid queries or not.
…esql-reranker-boostrap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some comments from @fang-xing-esql regarding testing that I think we should address before merging. looks good otherwise.
@@ -230,7 +244,7 @@ private PhysicalPlan mapMerge(Merge merge) { | |||
} | |||
|
|||
public static boolean isPipelineBreaker(LogicalPlan p) { | |||
return p instanceof Aggregate || p instanceof TopN || p instanceof Limit || p instanceof OrderBy; | |||
return p instanceof Aggregate || p instanceof TopN || p instanceof Limit || p instanceof OrderBy || p instanceof Rerank; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for me to understand:
- why did we make
Rerank
a pipeline breaker? can we not execute theRerankOperator
on the data nodes? we are applying a sort immediately after so it might need to be a pipeline breaker? - if
Rerank
is a pipeline breaker, does that mean CCS should be supported? but just have not tested it yet?
} | ||
|
||
public void testRerankWithNamedParameters() { | ||
assumeTrue("FORK requires corresponding capability", EsqlCapabilities.Cap.RERANK.isEnabled()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo here - it's not FORK.
Implementation of the
RERANK
commandCommand syntax
where
query text
is a constant string (can use a param instead?queryText
)ON
clause contains one or several field separated by commaON title, description=overview
)ON title, short_description=SUBSTRING(description,0 100)
)WITH
the name of inference endpoint to be used for reranking. The inference endpoint task type has to bererank
Done:
Follow-up: