TransportGetAllocationStatsAction may cause significant load on elected master #110716
Comments
Pinging @elastic/es-distributed (Team:Distributed)
Do we need client-side caching too? Server-side caching should help when there are few nodes with a large number of shards, but if there are many nodes that frequently poll stats, the networking overhead could still cripple the master. Maybe distribute the load: the client picks a random node from the cluster, and if that node does not have the stats in its cache, it forwards the request to the master and populates its cache.
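A rough model of this proposal, assuming each node keeps its own TTL-bounded copy of the stats and contacts the master only on a miss. `CachingNode`, `StatsClient`, and the `Supplier` standing in for the transport request to the master are hypothetical names for illustration, not Elasticsearch APIs.

```java
import java.util.List;
import java.util.Random;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical per-node cache: serves stats locally while fresh, forwards to the master on a miss.
final class CachingNode<T> {
    private final Supplier<T> forwardToMaster; // stands in for a transport request to the elected master
    private final long ttlMillis;
    private final LongSupplier clockMillis;    // injectable clock so the TTL can be tested
    private T cached;
    private long fetchedAtMillis;
    private boolean hasValue;

    CachingNode(Supplier<T> forwardToMaster, long ttlMillis, LongSupplier clockMillis) {
        this.forwardToMaster = forwardToMaster;
        this.ttlMillis = ttlMillis;
        this.clockMillis = clockMillis;
    }

    // Returns the local copy if it is younger than the TTL; otherwise asks the master and caches the answer.
    synchronized T stats() {
        long now = clockMillis.getAsLong();
        if (!hasValue || now - fetchedAtMillis >= ttlMillis) {
            cached = forwardToMaster.get();
            fetchedAtMillis = now;
            hasValue = true;
        }
        return cached;
    }
}

// Hypothetical client: spreads requests by picking a random node each time.
final class StatsClient {
    static <T> T query(List<CachingNode<T>> nodes, Random random) {
        return nodes.get(random.nextInt(nodes.size())).stats();
    }
}
```

In this model the master computes the stats roughly once per node per TTL, instead of once per TTL with a master-side cache alone; the per-node caches trade that extra master work for fewer requests crossing the network to the master.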
Please let me know if this issue is still open. If it is, can you please assign it to me?
Still available, @shreedaddy, thanks for the offer. We can't assign issues to folks outside the @elastic org, but if you want to contribute a PR then please feel free.
Will get started.
Adds a new setting TransportGetAllocationStatsAction.CACHE_MAX_AGE_SETTING to configure the max age for cached AllocationStats on the master. The default value is currently 1 minute per the suggestion in issue 110716. Closes elastic#110716
Adds a new cache and setting TransportGetAllocationStatsAction.CACHE_TTL_SETTING "cluster.routing.allocation.stats.cache.ttl" to configure the max age for cached NodeAllocationStats on the master. The default value is currently 1 minute per the suggestion in issue 110716. Closes elastic#110716
Elasticsearch Version
8.14
Installed Plugins
No response
Java Version
bundled
OS Version
any
Problem Description
TransportGetAllocationStatsAction runs on the elected master, so it is theoretically possible to overload the master by sending node stats requests to various nodes in the cluster, especially in clusters with many shards, since the cost of the computation is proportional to the shard count.
The aggregated result of the computation is small (5 numbers per node), so we should consider caching it for a short period of time (1 minute?) and reusing it across different calls during that period.
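A minimal sketch of the proposed caching, assuming a plain time-to-live (TTL) policy. The class name `TtlCache`, the injectable nanosecond clock, and the `Supplier` standing in for the shard-walking stats computation are hypothetical, not Elasticsearch's actual implementation.

```java
import java.time.Duration;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical TTL cache for a small but expensive result, such as per-node allocation stats.
final class TtlCache<T> {
    private final Supplier<T> compute;    // expensive computation, e.g. iterating over every shard
    private final long ttlNanos;
    private final LongSupplier nanoClock; // injectable clock, e.g. System::nanoTime in production
    private T cached;
    private long computedAtNanos;
    private boolean hasValue;

    TtlCache(Supplier<T> compute, Duration ttl, LongSupplier nanoClock) {
        this.compute = compute;
        this.ttlNanos = ttl.toNanos();
        this.nanoClock = nanoClock;
    }

    // Reuses the cached value while it is younger than the TTL; otherwise recomputes and stores it.
    synchronized T get() {
        long now = nanoClock.getAsLong();
        if (!hasValue || now - computedAtNanos >= ttlNanos) {
            cached = compute.get();
            computedAtNanos = now;
            hasValue = true;
        }
        return cached;
    }
}
```

For example, `new TtlCache<>(statsSupplier, Duration.ofMinutes(1), System::nanoTime)` would recompute at most about once a minute regardless of how many requests arrive. Because `get()` is `synchronized`, callers arriving after expiry wait for a single recomputation rather than each recomputing; a real transport action would probably prefer a non-blocking approach so it does not tie up threads on the master.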
See elasticsearch/server/src/main/java/org/elasticsearch/action/admin/cluster/node/stats/TransportNodesStatsAction.java, lines 88 to 96 in commit 5e52059.
Steps to Reproduce
n/a
Logs (if relevant)
No response