[CI] MlWithSecurityIT test {yaml=ml/3rd_party_deployment/Test start deployment fails while model download in progress} failing #120814

Open
elasticsearchmachine opened this issue Jan 24, 2025 · 4 comments
Labels
medium-risk (An open issue or test failure that is a medium risk to future releases), :ml (Machine learning), Team:ML (Meta label for the ML team), >test-failure (Triaged test failures from CI)

Comments

@elasticsearchmachine
Collaborator

elasticsearchmachine commented Jan 24, 2025

Build Scans:

Reproduction Line:

gradlew ":x-pack:plugin:ml:qa:ml-with-security:yamlRestTest" --tests "org.elasticsearch.smoketest.MlWithSecurityIT.test {yaml=ml/3rd_party_deployment/Test start deployment fails while model download in progress}" -Dtests.seed=DCCBFB74A3C4D289 -Dtests.locale=tk-TM -Dtests.timezone=America/Guadeloupe -Druntime.java=23

Applicable branches:
main

Reproduces locally?:
N/A

Failure History:
See dashboard

Failure Message:

java.lang.AssertionError: Failure at [ml/3rd_party_deployment:221]: expected [2xx] status code but api [ml.put_trained_model] returned [400 Bad Request] [{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Failed to load package metadata","stack_trace":"org.elasticsearch.ElasticsearchException$1: Failed to load package metadata\r\n\tat org.elasticsearch.server@9.1.0-SNAPSHOT/org.elasticsearch.ElasticsearchException.guessRootCauses(ElasticsearchException.java:725)\r\n\tat org.elasticsearch.server@9.1.0-SNAPSHOT/org.elasticsearch.ElasticsearchException.generateFailureXContent(ElasticsearchException.java:639)\r\n\tat org.elasticsearch.server@9.1.0-SNAPSHOT/org.elasticsearch.rest.RestResponse.build(RestResponse.java:202)\r\n\tat org.elasticsearch.server@9.1.0-SNAPSHOT/org.elasticsearch.rest.RestResponse.<init>(RestResponse.java:161)\r\n\tat org.elasticsearch.server@9.1.0-SNAPSHOT/org.elasticsearch.rest.RestResponse.<init>(RestResponse.java:122)\r\n\tat org.elasticsearch.
[truncated]

Issue Reasons:

  • [main] 3 failures in test test {yaml=ml/3rd_party_deployment/Test start deployment fails while model download in progress} (0.3% fail rate in 999 executions)
  • [main] 2 failures in pipeline elasticsearch-pull-request (0.5% fail rate in 410 executions)

Note:
This issue was created using new test triage automation. Please report issues or feedback to es-delivery.

elasticsearchmachine added the :ml, >test-failure, Team:ML, and needs:risk labels on Jan 24, 2025
@elasticsearchmachine
Collaborator Author

Pinging @elastic/ml-core (Team:ML)

davidkyle added the medium-risk label and removed the needs:risk label on Feb 12, 2025
@davidkyle
Member

The test failed because the ML feature reset failed: the `cluster:monitor/tasks/lists[n]` request timed out after 30s.

[2025-01-24T16:08:59,350][ERROR][o.e.x.m.MachineLearning  ] [yamlRestTest-0] failed to reset machine learning
org.elasticsearch.action.FailedNodeException: Failed node [yA0vy4BNTgalVxF6VeOxlg]
	....

Caused by: org.elasticsearch.transport.RemoteTransportException: [yamlRestTest-0][127.0.0.1:36665][cluster:monitor/tasks/lists[n]]
Caused by: org.elasticsearch.ElasticsearchTimeoutException: timed out after [30s/30000ms]
	at org.elasticsearch.action.support.SubscribableListener.lambda$scheduleTimeout$5(SubscribableListener.java:556) ~[elasticsearch-8.18.0-SNAPSHOT.jar:?]
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:977) ~[elasticsearch-8.18.0-SNAPSHOT.jar:?]
	at org.elasticsearch.threadpool.ThreadPool$1.run(ThreadPool.java:514) ~[elasticsearch-8.18.0-SNAPSHOT.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572) ~[?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:317) ~[?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]

@elasticsearchmachine
Collaborator Author

This has been muted on branch 8.18

Mute Reasons:

  • [8.18] 2 consecutive failures in step openjdk17_checkpart4_java-fips-matrix
  • [8.18] 3 failures in test test {yaml=ml/3rd_party_deployment/Test start deployment fails while model download in progress} (2.3% fail rate in 130 executions)
  • [8.18] 2 failures in step openjdk17_checkpart4_java-fips-matrix (66.7% fail rate in 3 executions)
  • [8.18] 2 failures in pipeline elasticsearch-periodic (50.0% fail rate in 4 executions)
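
The percentages in these mute reasons appear to be failures divided by executions, rounded to one decimal place. A minimal sketch that re-derives them from the counts quoted above; the `fail_rate` helper is hypothetical and is not part of the triage automation:

```python
def fail_rate(failures: int, executions: int) -> float:
    """Return the failure rate as a percentage, rounded to one decimal place.

    Hypothetical helper for illustration; the rounding rule is an assumption
    inferred from the figures quoted in this issue.
    """
    if executions <= 0:
        raise ValueError("executions must be positive")
    return round(100.0 * failures / executions, 1)


# (failures, executions) pairs quoted in the 8.18 mute reasons
counts = {
    "test": (3, 130),
    "step openjdk17_checkpart4_java-fips-matrix": (2, 3),
    "pipeline elasticsearch-periodic": (2, 4),
}

for name, (failures, executions) in counts.items():
    print(f"{name}: {fail_rate(failures, executions)}%")
```

The same calculation matches the figures reported for main (3 of 999 and 2 of 410). The step and pipeline rates on 8.18 come from only 3 and 4 executions, so they say much less about flakiness than the per-test rate over 130 executions.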

Build Scans:

elasticsearchmachine added a commit that referenced this issue Feb 26, 2025
…arty_deployment/Test start deployment fails while model download in progress} #120814
@elasticsearchmachine
Collaborator Author

This has been muted on branch main

Mute Reasons:

  • [main] 3 failures in test test {yaml=ml/3rd_party_deployment/Test start deployment fails while model download in progress} (0.3% fail rate in 999 executions)
  • [main] 2 failures in pipeline elasticsearch-pull-request (0.5% fail rate in 410 executions)

Build Scans:

elasticsearchmachine added a commit that referenced this issue Mar 4, 2025
…arty_deployment/Test start deployment fails while model download in progress} #120814
dnhatn pushed a commit to dnhatn/elasticsearch that referenced this issue Mar 5, 2025
…arty_deployment/Test start deployment fails while model download in progress} elastic#120814
georgewallace pushed a commit to georgewallace/elasticsearch that referenced this issue Mar 11, 2025
…arty_deployment/Test start deployment fails while model download in progress} elastic#120814