Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change downsample's MetricFieldProducers #124701

Conversation

martijnvg
Copy link
Member

@martijnvg martijnvg commented Mar 13, 2025

Refactor MetricFieldProducer to use SortedNumericDoubleValues instead of FormattedDocValues, which saves unneeded conversations / casts.

Next step after this PR is to remove the Metric abstraction layer in MetricFieldProducer (a8a5735). According to local experiments, this is what gives a 10% performance boost.

…f FormattedDocValues, which saves unneeded number conversations and casts.
@martijnvg martijnvg added >non-issue :StorageEngine/Downsampling Downsampling (replacement for rollups) - Turn fine-grained time-based data into coarser-grained data labels Mar 13, 2025
@martijnvg martijnvg added v8.18.1 v9.0.1 auto-backport Automatically create backport pull requests when merged labels Mar 13, 2025
@martijnvg martijnvg marked this pull request as ready for review March 13, 2025 09:45
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

nonMetricProducers.toArray(new AbstractDownsampleFieldProducer[0]),
formattedDocValues.toArray(new FormattedDocValues[0]),
metricProducers.toArray(new MetricFieldProducer[0]),
numericDocValues.toArray(new SortedNumericDoubleValues[0])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: just pass the lists, to avoid the conversion to arrays?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LeafDownsampleCollector#collect(...) method is hot code. I prefer to keep using arrays there.

for (int i = 0; i < fieldProducers.length; i++) {
AbstractDownsampleFieldProducer fieldProducer = fieldProducers[i];
for (int i = 0; i < nonMetricProducers.length; i++) {
AbstractDownsampleFieldProducer fieldProducer = nonMetricProducers[i];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assert that the lengths are the same?

@@ -281,9 +287,7 @@ public void write(XContentBuilder builder) throws IOException {
if (isEmpty() == false) {
builder.startObject(name());
for (MetricFieldProducer.Metric metric : metrics()) {
if (metric.get() != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if it returns Double.MIN_VALUE?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I removed this, after I realize that a metric (min, max, sum, count) will always have a value.
What the default value is depends on the metric implementation, so we would need to delegate to Metric for this. Given that this isn't needed, let's keep it this way? Also Metric will be removed in a followup PR.

@@ -94,51 +100,51 @@ public String name() {
* Metric implementation that computes the maximum of all values of a field
*/
static final class Max extends Metric {
private Double max;
private double max = -Double.MAX_VALUE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this a static constant, Max.INVALID_VALUE. Same below.

@@ -39,9 +39,7 @@ public void write(XContentBuilder builder) throws IOException {
if (fieldProducer.isEmpty() == false) {
if (fieldProducer instanceof MetricFieldProducer metricFieldProducer) {
for (MetricFieldProducer.Metric metric : metricFieldProducer.metrics()) {
if (metric.get() != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if the returned value is invalid?

@martijnvg martijnvg changed the title Improve downsample's MetricFieldProducers Change downsample's MetricFieldProducers Mar 13, 2025
@martijnvg martijnvg enabled auto-merge (squash) March 13, 2025 10:55
@martijnvg martijnvg merged commit 81f33e4 into elastic:main Mar 13, 2025
17 checks passed
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Mar 13, 2025
Refactor MetricFieldProducer to use SortedNumericDoubleValues instead of FormattedDocValues, which saves unneeded conversations / casts.
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Mar 13, 2025
Refactor MetricFieldProducer to use SortedNumericDoubleValues instead of FormattedDocValues, which saves unneeded conversations / casts.
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.18
9.0

martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Mar 13, 2025
Refactor MetricFieldProducer to use SortedNumericDoubleValues instead of FormattedDocValues, which saves unneeded conversations / casts.
elasticsearchmachine pushed a commit that referenced this pull request Mar 13, 2025
Refactor MetricFieldProducer to use SortedNumericDoubleValues instead of FormattedDocValues, which saves unneeded conversations / casts.
elasticsearchmachine pushed a commit that referenced this pull request Mar 13, 2025
Refactor MetricFieldProducer to use SortedNumericDoubleValues instead of FormattedDocValues, which saves unneeded conversations / casts.
martijnvg added a commit that referenced this pull request Mar 13, 2025
Backporting #124701 to 9.0 branch.

Refactor MetricFieldProducer to use SortedNumericDoubleValues instead of FormattedDocValues, which saves unneeded conversations / casts.
jfreden pushed a commit to jfreden/elasticsearch that referenced this pull request Mar 13, 2025
Refactor MetricFieldProducer to use SortedNumericDoubleValues instead of FormattedDocValues, which saves unneeded conversations / casts.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged >non-issue :StorageEngine/Downsampling Downsampling (replacement for rollups) - Turn fine-grained time-based data into coarser-grained data Team:StorageEngine v8.18.1 v8.19.0 v9.0.1 v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants