ESQL: InlineJoin (used for INLINESTATS) breaks a planning invariant #124752

Open
alex-spies opened this issue Mar 13, 2025 · 1 comment
Labels
:Analytics/ES|QL AKA ESQL Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) >tech debt

Cf. the original comment here: #123589 (comment)

For LOOKUP JOIN and other join types, we could safely assume that the set of attributes present in one join child is completely disjoint from the set of attributes ever present in the other child. We could rely on this fact, e.g., for the correctness of the PruneColumns optimizer rule, as the following comment in that rule explains:

// Note: It is NOT required to do anything special for binary plans like JOINs. It is perfectly fine that transformDown descends
// first into the left side, adding all kinds of attributes to the `used` set, and then descends into the right side - even
// though the `used` set will contain stuff only used in the left hand side. That's because any attribute that is used in the
// left hand side must have been created in the left side as well. Even field attributes belonging to the same index fields will
// have different name ids in the left and right hand sides - as in the extreme example
// `FROM lookup_idx | LOOKUP JOIN lookup_idx ON key_field`.

InlineJoin breaks this assumption, because it specifically references attributes from the left child in the right child.
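
To make the invariant concrete, here is a minimal sketch of a disjointness check between two join children. It is not the actual ES|QL code: `JoinInvariantSketch`, `Attr`, `Child`, the numeric name ids, and the `salary`/`avg_salary` fields are all hypothetical stand-ins, and modeling a child as a list of created and referenced attributes is an assumption made purely for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-ins for plan nodes; NOT the real ES|QL classes.
final class JoinInvariantSketch {
    /** An attribute is identified by its name id, not by its name. */
    record Attr(String name, int nameId) {}

    /** One child of a binary plan: the attributes it creates and the ones it references. */
    record Child(List<Attr> created, List<Attr> referenced) {}

    /** Name ids that are created in the left child but referenced from the right child. */
    static Set<Integer> leftIdsReferencedInRight(Child left, Child right) {
        Set<Integer> leftIds = new HashSet<>();
        for (Attr a : left.created()) {
            leftIds.add(a.nameId());
        }
        Set<Integer> shared = new HashSet<>();
        for (Attr a : right.referenced()) {
            if (leftIds.contains(a.nameId())) {
                shared.add(a.nameId());
            }
        }
        return shared;
    }

    /** The invariant holds when the right child never references a left-side name id. */
    static boolean satisfiesDisjointness(Child left, Child right) {
        return leftIdsReferencedInRight(left, right).isEmpty();
    }

    public static void main(String[] args) {
        // Modeled on `FROM lookup_idx | LOOKUP JOIN lookup_idx ON key_field`:
        // same field name on both sides, but different name ids.
        Child from = new Child(List.of(new Attr("key_field", 1)), List.of());
        Child lookup = new Child(
            List.of(new Attr("key_field", 2)),
            List.of(new Attr("key_field", 2))
        );
        System.out.println(satisfiesDisjointness(from, lookup));

        // Modeled on an InlineJoin: the right child aggregates over an
        // attribute (name id 3) that was created in the left child.
        Child left = new Child(List.of(new Attr("salary", 3)), List.of());
        Child right = new Child(
            List.of(new Attr("avg_salary", 4)),
            List.of(new Attr("salary", 3))
        );
        System.out.println(satisfiesDisjointness(left, right));
    }
}
```

Under these assumptions, the LOOKUP JOIN-like case passes the check and the InlineJoin-like case fails it, which is the situation in which a rule that shares a single `used` set across both children can no longer treat the two sides independently.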

I think we should decide whether to enforce this assumption. Enforcing it would greatly simplify reasoning about optimizer rules, about what happens with multiple Joins that share the same right hand side, etc. (LOOKUP JOIN specifically generates different attributes even if the very same LOOKUP JOIN command is used multiple times, to avoid bugs and problems.) On the flip side, enforcing this assumption would require slightly re-modeling InlineJoin. Maybe there's also a middle ground where attributes can only be generated in one child of a join but can be referenced from both (that's currently the case, but it's harder to enforce).

Depending on the decision, we'll need to:

  • If we don't want to enforce this assumption, or not to the full extent, double-check all optimizer rules for compatibility with binary plans.
  • Otherwise, re-model InlineJoin and StubRelation to comply with the assumption.
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Mar 13, 2025
@elasticsearchmachine

Pinging @elastic/es-analytical-engine (Team:Analytics)
