[CI] HeapAttackIT testEnrichExplosionManyMatches failing #122913

Closed
elasticsearchmachine opened this issue Feb 19, 2025 · 5 comments
Labels:
  • :Analytics/ES|QL (AKA ESQL)
  • low-risk: An open issue or test failure that is a low risk to future releases
  • Team:Analytics: Meta label for analytical engine team (ESQL/Aggs/Geo)
  • >test-failure: Triaged test failures from CI

Comments

@elasticsearchmachine
Collaborator

elasticsearchmachine commented Feb 19, 2025

Build Scans:

Reproduction Line:

./gradlew ":test:external-modules:test-esql-heap-attack:javaRestTest" --tests "org.elasticsearch.xpack.esql.heap_attack.HeapAttackIT.testEnrichExplosionManyMatches" -Dtests.seed=25F269D7FF8C8AC1 -Dtests.configure_test_clusters_with_one_processor=true -Dtests.locale=cv-RU -Dtests.timezone=America/Cayenne -Druntime.java=23

Applicable branches:
8.x

Reproduces locally?:
N/A

Failure History:
See dashboard

Failure Message:

java.net.ConnectException: Connection refused

Issue Reasons:

  • [8.x] 11 failures in test testEnrichExplosionManyMatches (1.6% fail rate in 683 executions)
  • [8.x] 3 failures in step single-processor-node-tests (15.8% fail rate in 19 executions)
  • [8.x] 2 failures in step concurrent-search-tests (11.1% fail rate in 18 executions)
  • [8.x] 4 failures in pipeline elasticsearch-periodic (20.0% fail rate in 20 executions)
  • [8.x] 6 failures in pipeline elasticsearch-periodic-platform-support (31.6% fail rate in 19 executions)
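
Each fail rate above is the number of failures divided by the number of executions (for example, 11 / 683 ≈ 1.6%). The sketch below reproduces the reported percentages under that assumption; the class and method names are hypothetical and not part of the triage tooling.

```java
import java.util.Locale;

/** Hypothetical helper: recomputes the "fail rate" figures as failures / executions * 100. */
public class FailRate {
    static double failRatePercent(int failures, int executions) {
        if (executions <= 0) {
            throw new IllegalArgumentException("executions must be positive");
        }
        return 100.0 * failures / executions;
    }

    static String format(int failures, int executions) {
        // One decimal place, like "1.6%"; Locale.ROOT keeps the output independent of the
        // JVM default locale (the reproduction line sets -Dtests.locale=cv-RU).
        return String.format(Locale.ROOT, "%.1f%%", failRatePercent(failures, executions));
    }

    public static void main(String[] args) {
        // {failures, executions} pairs taken from the issue reasons and mute reasons
        int[][] rows = { {11, 683}, {3, 19}, {2, 18}, {4, 20}, {6, 19}, {3, 794} };
        for (int[] r : rows) {
            System.out.println(r[0] + "/" + r[1] + " -> " + format(r[0], r[1]));
        }
    }
}
```

Note how a modest per-test rate (1.6%) turns into much higher per-step and per-pipeline rates, since each step or pipeline run contains many tests and executes far less often.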

Note:
This issue was created using new test triage automation. Please report issues or feedback to es-delivery.

elasticsearchmachine added the :Analytics/ES|QL (AKA ESQL) and >test-failure (Triaged test failures from CI) labels on Feb 19, 2025
@elasticsearchmachine
Collaborator Author

This has been muted on branch main

Mute Reasons:

  • [main] 3 failures in test testEnrichExplosionManyMatches (0.4% fail rate in 794 executions)

Build Scans:

elasticsearchmachine added the Team:Analytics (meta label for the analytical engine team: ESQL/Aggs/Geo) and needs:risk (requires assignment of a risk label: low, medium, blocker) labels on Feb 19, 2025
@elasticsearchmachine
Collaborator Author

Pinging @elastic/es-analytical-engine (Team:Analytics)

@ivancea
Contributor

ivancea commented Feb 19, 2025

Some runs fail with "Connection refused", and others with "giving up circuit breaking after 6 attempts".
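
The second failure mode implies the test retries a request that trips a circuit breaker and gives up after a fixed number of attempts. The sketch below illustrates that kind of bounded retry; it is a hypothetical example, not HeapAttackIT's actual code. `CircuitBreakingException` here is a local stand-in class, and the sketch uses no backoff between attempts.

```java
import java.util.function.Supplier;

/** Hypothetical sketch of a bounded retry around an operation that may trip a circuit breaker. */
public class BoundedRetry {
    /** Local stand-in for a "circuit breaker tripped" failure (not Elasticsearch's class). */
    static class CircuitBreakingException extends RuntimeException {
        CircuitBreakingException(String msg) {
            super(msg);
        }
    }

    static <T> T retryOnCircuitBreak(Supplier<T> op, int maxAttempts) {
        CircuitBreakingException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (CircuitBreakingException e) {
                last = e; // remember the failure and try again immediately
            }
        }
        // Every attempt tripped the breaker: give up and surface the last failure as the cause.
        throw new IllegalStateException(
            "giving up circuit breaking after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated operation that trips the breaker twice, then succeeds.
        String result = retryOnCircuitBreak(() -> {
            if (++calls[0] < 3) {
                throw new CircuitBreakingException("breaking");
            }
            return "ok";
        }, 6);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

If memory pressure on the node persists, every attempt trips the breaker and the retry loop only delays the failure, which is consistent with the test eventually giving up.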

alex-spies added the low-risk label (an open issue or test failure that is a low risk to future releases) and removed the needs:risk label on Mar 5, 2025
@alex-spies
Contributor

I think this is the same as we've seen with the other HeapAttackIT failures:

[2025-03-03T08:52:18,582][WARN ][o.e.i.b.request          ] [test-cluster-0] [request] New used memory 322132599 [307.2mb] for data of [<reused_arrays>] would be larger than configured breaker: 322122547 [307.1mb], breaking
[2025-03-03T08:52:18,804][INFO ][o.e.m.j.JvmGcMonitorService] [test-cluster-0] [gc][1421] overhead, spent [292ms] collecting in the last [1s]
[2025-03-03T08:54:07,445][INFO ][o.e.m.j.JvmGcMonitorService] [test-cluster-1] [gc][1530] overhead, spent [265ms] collecting in the last [1s]
[2025-03-03T08:54:10,450][INFO ][o.e.m.j.JvmGcMonitorService] [test-cluster-1] [gc][1533] overhead, spent [262ms] collecting in the last [1s]
[2025-03-03T08:54:12,460][INFO ][o.e.m.j.JvmGcMonitorService] [test-cluster-1] [gc][1535] overhead, spent [274ms] collecting in the last [1s]
[2025-03-03T08:54:13,469][INFO ][o.e.m.j.JvmGcMonitorService] [test-cluster-1] [gc][1536] overhead, spent [272ms] collecting in the last [1s]
[2025-03-03T08:54:14,472][INFO ][o.e.m.j.JvmGcMonitorService] [test-cluster-1] [gc][1537] overhead, spent [265ms] collecting in the last [1s]
[2025-03-03T08:54:15,479][INFO ][o.e.m.j.JvmGcMonitorService] [test-cluster-1] [gc][1538] overhead, spent [281ms] collecting in the last [1s]
[2025-03-03T08:54:16,486][INFO ][o.e.m.j.JvmGcMonitorService] [test-cluster-1] [gc][1539] overhead, spent [299ms] collecting in the last [1s]
[2025-03-03T08:54:17,505][INFO ][o.e.m.j.JvmGcMonitorService] [test-cluster-1] [gc][1540] overhead, spent [293ms] collecting in the last [1s]
[2025-03-03T08:54:18,522][INFO ][o.e.m.j.JvmGcMonitorService] [test-cluster-1] [gc][1541] overhead, spent [345ms] collecting in the last [1s]
[2025-03-03T08:54:19,529][INFO ][o.e.m.j.JvmGcMonitorService] [test-cluster-1] [gc][1542] overhead, spent [325ms] collecting in the last [1s]
[2025-03-03T08:54:20,532][INFO ][o.e.m.j.JvmGcMonitorService] [test-cluster-1] [gc][1543] overhead, spent [327ms] collecting in the last [1s]
[2025-03-03T08:54:21,539][INFO ][o.e.m.j.JvmGcMonitorService] [test-cluster-1] [gc][1544] overhead, spent [325ms] collecting in the last [1s]
[2025-03-03T08:54:22,544][INFO ][o.e.m.j.JvmGcMonitorService] [test-cluster-1] [gc][1545] overhead, spent [348ms] collecting in the last [1s]
[2025-03-03T08:54:23,548][INFO ][o.e.m.j.JvmGcMonitorService] [test-cluster-1] [gc][1546] overhead, spent [342ms] collecting in the last [1s]
[2025-03-03T08:54:24,553][INFO ][o.e.m.j.JvmGcMonitorService] [test-cluster-1] [gc][1547] overhead, spent [344ms] collecting in the last [1s]
[2025-03-03T08:54:25,555][INFO ][o.e.m.j.JvmGcMonitorService] [test-cluster-1] [gc][1548] overhead, spent [323ms] collecting in the last [1s]
[2025-03-03T08:54:26,558][INFO ][o.e.m.j.JvmGcMonitorService] [test-cluster-1] [gc][1549] overhead, spent [328ms] collecting in the last [1s]
[2025-03-03T08:54:27,560][INFO ][o.e.m.j.JvmGcMonitorService] [test-cluster-1] [gc][1550] overhead, spent [360ms] collecting in the last [1s]
[2025-03-03T08:54:28,564][INFO ][o.e.m.j.JvmGcMonitorService] [test-cluster-1] [gc][1551] overhead, spent [351ms] collecting in the last [1s]
[2025-03-03T08:54:29,567][INFO ][o.e.m.j.JvmGcMonitorService] [test-cluster-1] [gc][1552] overhead, spent [293ms] collecting in the last [1s]
[2025-03-03T08:54:30,601][INFO ][o.e.m.j.JvmGcMonitorService] [test-cluster-1] [gc][1553] overhead, spent [345ms] collecting in the last [1s]
[2025-03-03T08:54:31,604][INFO ][o.e.m.j.JvmGcMonitorService] [test-cluster-1] [gc][1554] overhead, spent [341ms] collecting in the last [1s]
[2025-03-03T08:54:32,610][INFO ][o.e.m.j.JvmGcMonitorService] [test-cluster-1] [gc][1555] overhead, spent [335ms] collecting in the last [1s]
[2025-03-03T08:54:33,615][INFO ][o.e.m.j.JvmGcMonitorService] [test-cluster-1] [gc][1556] overhead, spent [350ms] collecting in the last [1s]
[2025-03-03T08:54:34,487][INFO ][o.e.x.e.h.HeapAttackIT   ] [testEnrichExplosionManyMatches] --> test testEnrichExplosionManyMatches triggering OOM after 5m
[2025-03-03T08:54:34,495][ERROR][o.e.t.e.h.RestTriggerOutOfMemoryAction] [test-cluster-1] triggering out of memory

@idegtiarenko
Contributor

#124300 and its corresponding 8.x backport should help stabilize this test.
