-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add to allocation architecture guide #125328
base: main
Are you sure you want to change the base?
Add to allocation architecture guide #125328
Conversation
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination) |
CC @JeremyDahlgren just fyi. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Info is all good but I think would be better placed in Javadocs, either duplicated here or else with pointers from here to the Javadocs.
### Indexes and Shards | ||
|
||
Each index consists of a fixed number of primary shards. The number of primary shards cannot be changed for the lifetime of the index. Each | ||
primary shard can have zero-to-many replicas used for data redundancy. The number of replicas per shard can be changed in runtime. Each |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
primary shard can have zero-to-many replicas used for data redundancy. The number of replicas per shard can be changed in runtime. Each | |
primary shard can have zero-to-many replicas used for data redundancy. The number of replicas per shard can be changed dynamically. Each |
(or maybe "at runtime")?
|
||
Each index consists of a fixed number of primary shards. The number of primary shards cannot be changed for the lifetime of the index. Each | ||
primary shard can have zero-to-many replicas used for data redundancy. The number of replicas per shard can be changed in runtime. Each | ||
shard copy (primary or replica) can be in one of four states: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe worth mentioning here that these states are org.elasticsearch.cluster.routing.ShardRoutingState
the ones in the routing table (i.e. within the ClusterState
) and the transitions between these states are part of the dance between data node and master node to reflect the lifecycle of a shard:
UNASSIGNED
-> INITIALIZING
happens when the master wants the data node to start creating this shard copy.
INITIALIZING
-> STARTED
happens when recovery is fully complete and the data node tells the master it's ready to serve requests.
STARTED
-> RELOCATING
happens when the master wants to initialize the node elsewhere.
A failure can take a shard in any state back to UNASSIGNED
. Or the shard entry can be removed entirely from the cluster state. In either case, that tells the data node to stop whatever it is doing and shut down the shard.
Also IMO it'd be more discoverable to expand the Javadocs for ShardRoutingState
with all this detail rather than hiding it away here.
updated `RoutingTable`. The `RoutingTable` is part of the cluster state, so the master node updates the cluster state with the new | ||
(incremental) desired shard allocation information. The updated cluster state is then published to the data nodes. Each data node will | ||
observe any change in shard allocation related to that node and take action to achieve the new shard allocation by initiating creation of a | ||
new empty shard, starting recovery (copying) of an existing shard from another data node, or remove a shard. When the data node finishes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
grammar nit:
new empty shard, starting recovery (copying) of an existing shard from another data node, or remove a shard. When the data node finishes | |
new empty shard, starting recovery (copying) of an existing shard from another data node, or removing a shard. When the data node finishes |
but also I feel we should expand a bit on what "removing a shard" means in this context? It means actively shutting down the corresponding running (or recovering) IndexShard
instance, releasing all its resources.
Again, this feels like detail that should be in a Javadoc somewhere, maybe org.elasticsearch.cluster.routing.GlobalRoutingTable
, and linked to from RoutingTable
, IndexRoutingTable
, IndexShardRoutingTable
and ShardRouting
.
Adds discussion of index shards and their states, as well as
the communication flow between the master node and data
nodes for shard allocation changes.
Relates ES-7874
The first section is an attempt to move some of the allocation brain dump google document into the architecture guide.