[CI] HeapAttackIT testLookupExplosionNoFetch failing #123432

Closed
elasticsearchmachine opened this issue Feb 25, 2025 · 4 comments
Labels:

  • :Analytics/ES|QL (AKA ESQL)
  • medium-risk (An open issue or test failure that is a medium risk to future releases)
  • Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo))
  • >test-failure (Triaged test failures from CI)

Comments

@elasticsearchmachine
Collaborator

elasticsearchmachine commented Feb 25, 2025

Build Scans:

Reproduction Line:

./gradlew ":test:external-modules:test-esql-heap-attack:javaRestTest" --tests "org.elasticsearch.xpack.esql.heap_attack.HeapAttackIT.testLookupExplosionNoFetch" -Dtests.seed=B6380225CF313F5E -Dtests.locale=en-KI -Dtests.timezone=America/Los_Angeles -Druntime.java=23

Applicable branches:
main

Reproduces locally?:
N/A

Failure History:
See dashboard

Failure Message:

org.elasticsearch.client.ResponseException: method [POST], host [http://[::1]:34875], URI [/_query?error_trace=], status line [HTTP/1.1 429 Too Many Requests]
Warnings: [No limit defined, adding default limit of [1000]]
{"error":{"root_cause":[{"type":"circuit_breaking_exception","reason":"[request] Data too large, data for [serialize lookup join response] would be [348652480/332.5mb], which is larger than the limit of [322122547/307.1mb]; for more information, see https://www.elastic.co/guide/en/elasticsearch/reference/master/circuit-breaker-errors.html","bytes_wanted":348652480,"bytes_limit":322122547,"durability":"TRANSIENT","stack_trace":"org.elasticsearch.common.breaker.CircuitBreakingException: [request] Data too large, data for [serialize lookup join response] would be [348652480/332.5mb], which is larger than the limit of [322122547/307.1mb]; for more information, see https://www.elastic.co/guide/en/elasticsearch/reference/master/circuit-breaker-errors.html\n\tat org.elasticsea
[truncated]
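The numbers in the failure message can be checked with a little arithmetic. The sketch below assumes the request circuit breaker is at its default of 60% of the JVM heap (`indices.breaker.request.limit`) and that the node heap equals the `ml.max_jvm_size=536870912` attribute that appears in the node logs later in this thread. The class and method names are hypothetical and are not part of Elasticsearch.

```java
import java.util.Locale;

public class BreakerMath {
    // Assumption: default request breaker limit of 60% of the JVM heap.
    static final double REQUEST_BREAKER_FRACTION = 0.60;

    /** Request breaker limit in bytes for a given max heap, truncated to a whole byte. */
    static long requestLimitBytes(long heapBytes) {
        return (long) (REQUEST_BREAKER_FRACTION * heapBytes);
    }

    /** Number of bytes by which a reservation exceeds the limit (0 if it fits). */
    static long overshootBytes(long bytesWanted, long bytesLimit) {
        return Math.max(0L, bytesWanted - bytesLimit);
    }

    /** Converts a byte count to mebibytes. */
    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long heap = 536_870_912L;   // ml.max_jvm_size from the node logs (512 MiB), assumed to be the heap
        long wanted = 348_652_480L; // bytes_wanted from the failure message

        long limit = requestLimitBytes(heap);
        long over = overshootBytes(wanted, limit);

        System.out.printf(Locale.ROOT, "computed limit: %d bytes%n", limit);
        System.out.printf(Locale.ROOT, "overshoot: %d bytes (%.1f MiB)%n", over, toMiB(over));
    }
}
```

Under those assumptions the computed limit matches the reported `bytes_limit` of 322122547, and the lookup join response would have exceeded it by roughly 25 MiB, which is a small margin on a 512 MiB heap.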

Issue Reasons:

  • [main] 2 failures in test testLookupExplosionNoFetch (0.3% fail rate in 611 executions)

Note:
This issue was created using new test triage automation. Please report issues or feedback to es-delivery.

elasticsearchmachine added the :Analytics/ES|QL, >test-failure, Team:Analytics, and needs:risk labels Feb 25, 2025
@elasticsearchmachine
Collaborator Author

Pinging @elastic/es-analytical-engine (Team:Analytics)

@astefan
Contributor

astefan commented Feb 26, 2025

This seems to be similar to #120433. Logs:

[2025-02-26T03:03:53,384][INFO ][o.e.m.j.JvmGcMonitorService] [test-cluster-1] [gc][1530] overhead, spent [472ms] collecting in the last [1s]
[2025-02-26T03:03:54,129][INFO ][o.e.x.e.h.HeapAttackIT   ] [testEnrichExplosionManyMatches] --> test testEnrichExplosionManyMatches triggering OOM after 5m
[2025-02-26T03:03:54,133][ERROR][o.e.t.e.h.RestTriggerOutOfMemoryAction] [test-cluster-1] triggering out of memory
[2025-02-26T03:03:54,813][WARN ][o.e.m.j.JvmGcMonitorService] [test-cluster-1] [gc][1531] overhead, spent [988ms] collecting in the last [1.4s]
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /opt/local-ssd/buildkite/builds/bk-agent-prod-gcp-1740520568618365794/elastic/elasticsearch-periodic-platform-support/test/external-modules/esql-heap-attack/build/testrun/javaRestTest/temp/test-cluster14331309953705456918/test-cluster-1/logs/java_pid366057.hprof ...
Heap dump file created [557855777 bytes in 1.507 secs]
Terminating due to java.lang.OutOfMemoryError: Java heap space
[2025-02-26T03:03:57,857][INFO ][o.e.t.ClusterConnectionManager] [test-cluster-0] transport connection to [{test-cluster-1}{0jp3EEfaSpejlK2jEnHoog}{iXsT6Fo2Sx2YAPtG-LF-kw}{test-cluster-1}{127.0.0.1}{127.0.0.1:38531}{cdfhilmrstw}{8.19.0}{7000099-8527000}] closed by remote; if unexpected, see [https://www.elastic.co/guide/en/elasticsearch/reference/master/troubleshooting-unstable-cluster.html#troubleshooting-unstable-cluster-network] for troubleshooting guidance
[2025-02-26T03:03:57,894][INFO ][o.e.c.c.Coordinator      ] [test-cluster-0] master node [{test-cluster-1}{0jp3EEfaSpejlK2jEnHoog}{iXsT6Fo2Sx2YAPtG-LF-kw}{test-cluster-1}{127.0.0.1}{127.0.0.1:38531}{cdfhilmrstw}{8.19.0}{7000099-8527000}] disconnected, restarting discovery [[test-cluster-1][127.0.0.1:38531][internal:coordination/fault_detection/leader_check] disconnected]
[2025-02-26T03:03:57,897][INFO ][o.e.c.s.ClusterApplierService] [test-cluster-0] master node changed {previous [{test-cluster-1}{0jp3EEfaSpejlK2jEnHoog}{iXsT6Fo2Sx2YAPtG-LF-kw}{test-cluster-1}{127.0.0.1}{127.0.0.1:38531}{cdfhilmrstw}{8.19.0}{7000099-8527000}], current []}, term: 1, version: 452, reason: becoming candidate: onLeaderFailure
[2025-02-26T03:03:57,935][ERROR][o.e.t.e.h.RestTriggerOutOfMemoryAction] [test-cluster-0] triggering out of memory
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /opt/local-ssd/buildkite/builds/bk-agent-prod-gcp-1740520568618365794/elastic/elasticsearch-periodic-platform-support/test/external-modules/esql-heap-attack/build/testrun/javaRestTest/temp/test-cluster14331309953705456918/test-cluster-0/logs/java_pid366103.hprof ...
[2025-02-26T03:03:59,332][WARN ][o.e.m.j.JvmGcMonitorService] [test-cluster-0] [gc][1535] overhead, spent [892ms] collecting in the last [1.5s]
[2025-02-26T03:03:57,932][WARN ][o.e.c.NodeConnectionsService] [test-cluster-0] failed to connect to {test-cluster-1}{0jp3EEfaSpejlK2jEnHoog}{iXsT6Fo2Sx2YAPtG-LF-kw}{test-cluster-1}{127.0.0.1}{127.0.0.1:38531}{cdfhilmrstw}{8.19.0}{7000099-8527000}{testattr=test, xpack.installed=true, transform.config_version=10.0.0, ml.allocated_processors_double=32.0, ml.machine_memory=100697944064, ml.config_version=12.0.0, ml.max_jvm_size=536870912, ml.allocated_processors=32} (tried [1] times) org.elasticsearch.transport.ConnectTransportException: [test-cluster-1][127.0.0.1:38531] connect_exception

astefan added the medium-risk label and removed the needs:risk label Feb 26, 2025
@elasticsearchmachine
Collaborator Author

This has been muted on branch main

Mute Reasons:

  • [main] 2 failures in test testLookupExplosionNoFetch (0.3% fail rate in 611 executions)

Build Scans:

georgewallace pushed a commit to georgewallace/elasticsearch that referenced this issue Mar 11, 2025
@idegtiarenko
Contributor

#124300 and the corresponding 8.x backport should help stabilize this test.
