-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add core coordination algorithm for cluster state publishing #32171
Conversation
Pinging @elastic/es-distributed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be useful to have javadoc comments on the classes introduced here to give a bit of intuition for what they mean. Particularly Join
which is subtly different from simply attempting to join a cluster, and CoordinationStateException
which is a lot less scary (and more common) than it looks.
I did some manual fuzzing of the logic and found a handful of cases that aren't covered by tests - see inline comments.
|
||
import java.io.IOException; | ||
|
||
public class ApplyCommit extends TermVersionRequest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Classes derived from TransportRequest
seem to be named with a ...Request
suffix. ApplyCommitRequest
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, fixed in 34567f6
|
||
public class CoordinationStateTests extends ESTestCase { | ||
|
||
DiscoveryNode node1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These fields can all be private
. Additionally, initialStateNode{2,3}
and cs3
are only used in setupNodes()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in 6611e58 (cs3 is now used)
import static org.hamcrest.Matchers.containsString; | ||
import static org.hamcrest.Matchers.equalTo; | ||
|
||
public class CoordinationStateTests extends ESTestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have a test that runs the state machines in parallel, passing messages back and forth, and asserting that the committed states remain consistent. This might want to wait until the PR on which I'm working which introduces a deterministic task queue, so maybe just a //TODO
for now.
* @throws CoordinationStateRejectedException if the arguments were incompatible with the current state of this object. | ||
*/ | ||
public Join handleStartJoin(StartJoinRequest startJoinRequest) { | ||
if (startJoinRequest.getTerm() <= getCurrentTerm()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests all pass if this is <
instead of <=
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you run the tests multiple times? Both testStartJoinBeforeBootstrap and testStartJoinAfterBootstrap fail approx. 30% of the times if this is changed from <=
to <
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only a couple of times, not enough clearly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed 3bec606 🗡
logger.debug("handleJoin: added join {} from [{}] for election, electionWon={} lastAcceptedTerm={} lastAcceptedVersion={}", join, | ||
join.getSourceNode(), electionWon, lastAcceptedTerm, getLastAcceptedVersion()); | ||
|
||
if (electionWon && prevElectionWon == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests all pass with if (electionWon) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in 5119256 (which also found another small copy-paste error)
"incoming term " + join.getTerm() + " does not match current term " + getCurrentTerm()); | ||
} | ||
|
||
if (startedJoinSinceLastReboot == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests pass with if (false) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in 8b7af72
} | ||
|
||
if (clusterState.getLastAcceptedConfiguration().equals(getLastAcceptedConfiguration()) == false | ||
&& getLastCommittedConfiguration().equals(getLastAcceptedConfiguration()) == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests pass with if (getLastCommittedConfiguration().equals(getLastAcceptedConfiguration()) == false) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in 708ae8f
throw new CoordinationStateRejectedException("incoming term " + clusterState.term() + " does not match current term " + | ||
getCurrentTerm()); | ||
} | ||
if (clusterState.term() == getLastAcceptedTerm() && clusterState.version() <= getLastAcceptedVersion()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests pass with:
if (clusterState.version() <= getLastAcceptedVersion()) {
if (clusterState.version() < getLastAcceptedVersion()) {
if (clusterState.term() == getLastAcceptedTerm()) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in 0d84428
} | ||
|
||
public boolean isElectionQuorum(VoteCollection votes) { | ||
return votes.isQuorum(getLastCommittedConfiguration()) && votes.isQuorum(getLastAcceptedConfiguration()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests pass with
return votes.isQuorum(getLastCommittedConfiguration());
return votes.isQuorum(getLastAcceptedConfiguration());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in ab02587
} | ||
|
||
public boolean isPublishQuorum(VoteCollection votes) { | ||
return votes.isQuorum(getLastCommittedConfiguration()) && votes.isQuorum(lastPublishedConfiguration); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests pass with
return votes.isQuorum(getLastCommittedConfiguration());
return votes.isQuorum(lastPublishedConfiguration);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in 8ee68b9
…eptedVersion() > getLastAcceptedVersion()
…dConfiguration()) == false && getLastCommittedConfiguration().equals(getLastAcceptedConfiguration()) == false
…sion() <= getLastAcceptedVersion())x
@DaveCTurner Thanks for the great review. I've run a code coverage tool on this which found one more missing test case, fixed in 9b2ad94. I have addressed all but one comment (the randomized testing), but I think this is ready for another review. I will check in the meanwhile how tricky it is to add these tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, LGTM. I found a nit in a javadoc comment. Let's not add the extra tests in this PR.
/** | ||
* Triggered by a {@link StartJoinRequest}, instances of this class represent join votes, | ||
* and have a source and target node. The source node is the node that provides the vote, | ||
* and the target node is the node for which this vote is casted. A node will only cast |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: casted
should read cast
.
Moving code around and renaming things. Goes hand-in-hand with a PR in the Elasticsearch repository that introduces the corresponding Java classes: elastic/elasticsearch#32171
Simulates a random run of a cluster with multiple CoordinationState instances (each representing one node), passing messages back and forth, and asserting that the overall system satisfies a given set of safety properties. Follow-up to #32171
Adds the core coordination algorithm that guarantees safety for cluster state publishing. This is a straight-forward port of the transition rules from the [formal model](url
https://github.com/elastic/elasticsearch-formal-models/blob/master/ZenWithTerms/tla/ZenWithTerms.tla) and goes hand-in-hand with the PR elastic/elasticsearch-formal-models#37.
Relates to #32006