Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL: Reuse child outputSet inside the plan where possible #124611

Merged
merged 6 commits into from
Mar 20, 2025

Conversation

costin
Copy link
Member

@costin costin commented Mar 12, 2025

Avoid creating outputSet between nodes that passthrough their input

Relates #124395

Avoid creating outputSet between nodes that passthrough their input

Relates elastic#124395
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Mar 12, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @costin, I've created a changelog YAML for you.

@costin
Copy link
Member Author

costin commented Mar 12, 2025

The initial PR (making AttributeMap/Set immutable) become larger and ended up in chasing subtle bugs with planning - I've extracted this small change as it has a direct impact on queries with many fields.

Copy link
Contributor

@luigidellaquila luigidellaquila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @costin, I like the idea and I think it makes a lot of sense.
It only needs a little change to UnaryPlan, see comment below.

Comment on lines 46 to 50
@Override
public AttributeSet outputSet() {
return child().outputSet();
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid this is incorrect: now all the subplans that override output() but not outputSet() (eg. Eval) return a wrong output set, based on the child rather than on the actual output

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++, this is dangerous, we normally don't have to override outputSet but rely it to be constructed from output().

A IMHO better way that remains safe:

    public AttributeSet outputSet() {
        if (lazyOutputSet == null) {
            List<Attribute> output = output();
            lazyOutputSet = (output == child.output?) child.outputSet() : new AttributeSet(output);
        }
        return lazyOutputSet;

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change itself makes sense. I'd like it a little better if the improvements were tracked by benches - I think this would strongly benefit from a micro benchmark (or a nightly!) that demonstrates the effect of this improvement on execution times of our optimizer(s).

Comment on lines 46 to 50
@Override
public AttributeSet outputSet() {
return child().outputSet();
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++, this is dangerous, we normally don't have to override outputSet but rely it to be constructed from output().

A IMHO better way that remains safe:

    public AttributeSet outputSet() {
        if (lazyOutputSet == null) {
            List<Attribute> output = output();
            lazyOutputSet = (output == child.output?) child.outputSet() : new AttributeSet(output);
        }
        return lazyOutputSet;


@Override
public AttributeSet outputSet() {
return child().outputSet();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With a better default implementation of this in UnaryPlan, this might be obsolete. Same goes for the corresponding overrides in the other classes.

Copy link
Contributor

@luigidellaquila luigidellaquila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it!
LGTM, thanks @costin

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Comment on lines 68 to 69
return child().output();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of scope, but I noticed this is redundant.

@costin costin added the auto-backport Automatically create backport pull requests when merged label Mar 20, 2025
@costin costin enabled auto-merge (squash) March 20, 2025 00:48
@costin costin disabled auto-merge March 20, 2025 02:43
@costin costin merged commit e186b15 into elastic:main Mar 20, 2025
15 of 17 checks passed
@costin costin deleted the esql/reuse-outputset branch March 20, 2025 02:44
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.18
8.x
9.0

costin added a commit to costin/elasticsearch that referenced this pull request Mar 20, 2025
…24611)

Avoid creating outputSet between nodes that passthrough their input

Relates elastic#124395
costin added a commit to costin/elasticsearch that referenced this pull request Mar 20, 2025
…24611)

Avoid creating outputSet between nodes that passthrough their input

Relates elastic#124395
elasticsearchmachine pushed a commit that referenced this pull request Mar 20, 2025
…125275)

Avoid creating outputSet between nodes that passthrough their input

Relates #124395
elasticsearchmachine pushed a commit that referenced this pull request Mar 20, 2025
…125273)

Avoid creating outputSet between nodes that passthrough their input

Relates #124395
costin added a commit that referenced this pull request Mar 20, 2025
…125274)

Avoid creating outputSet between nodes that passthrough their input

Relates #124395
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.18.1 v8.19.0 v9.0.1 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants