-
Notifications
You must be signed in to change notification settings - Fork 522
CLOUDP-66661: Enable graceful shutdowns #105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| # exit(0) for Kubernetes to restart the container. | ||
| /hooks/pre-stop-hook ; | ||
| # start mongod with this configuration | ||
| exec mongod -f /data/automation-mongod.conf ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This exec causes mongod to become PID 1 and in turn enables graceful shutdowns.
| # run post-start hook to handle version changes. | ||
| # during a version change, mongod will be stopped and the | ||
| # container will be restarted, at which points this hook runs. | ||
| /hooks/version-upgrade-hook |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could have used the container lifecycle post-start hook instead of including the hook in our script. This would cause the hook to run concurrently with mongod, resulting in the server starting up one extra time before being killed again.
We opted to trigger the post-start hook from inside the bash script to avoid this extra restart.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can probably just name it without hook since it's in the hooks directory
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true, renamed!
cmd/versionhook/main.go
Outdated
| const ( | ||
| agentStatusFilePathEnv = "AGENT_STATUS_FILEPATH" | ||
| logFilePathEnv = "PRE_STOP_HOOK_LOG_PATH" | ||
| logFilePathEnv = "VERSION_UPGRADE_HOOK_LOG_PATH" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: instead of writing the log to a file we could simply write it to stdout and let Kubernetes pick up on the logs. This would make debugging easier as the log currently is unavailable when mongod is stopped and the log file will also be removed during a restart.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes this was valid for enterprise but i thought all logs were in stdout
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. Changed now to write directly to stdout
cmd/versionhook/main.go
Outdated
| const ( | ||
| agentStatusFilePathEnv = "AGENT_STATUS_FILEPATH" | ||
| logFilePathEnv = "PRE_STOP_HOOK_LOG_PATH" | ||
| logFilePathEnv = "VERSION_UPGRADE_HOOK_LOG_PATH" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes this was valid for enterprise but i thought all logs were in stdout
| # run post-start hook to handle version changes. | ||
| # during a version change, mongod will be stopped and the | ||
| # container will be restarted, at which points this hook runs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should stick to 1 line comment in the startup script and we have the details in the docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Updated the comment and architecture.md.
| # run post-start hook to handle version changes. | ||
| # during a version change, mongod will be stopped and the | ||
| # container will be restarted, at which points this hook runs. | ||
| /hooks/version-upgrade-hook |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed
| # run post-start hook to handle version changes. | ||
| # during a version change, mongod will be stopped and the | ||
| # container will be restarted, at which points this hook runs. | ||
| /hooks/version-upgrade-hook |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can probably just name it without hook since it's in the hooks directory
LouisPlisso
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
chatton
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Thanks for cleaning this up. One small nit about a comment that is no incorrect, but otherwise this LGTM 👍
All Submissions:
When Kubernetes shuts down a pod it will send a SIGTERM signal to each container and give them 30s to gracefully shutdown. Currently, our
mongodwon't get the SIGTERM signal, causing the server to be forcefully shut down and possibly causing a primary loss. The larger underlying problem is thatmongodis not the main process (PID 1) of its container but started from inside a bash script. The reason we've designed it this way is because of our pre-stop hook, used to handle version changes.This PR switches from a pre-stop hook to a post-start hook, run before
mongodis started in the container. The majority of the changes in this PR are to switch the name of the pre-stop hook and the ones that enable graceful shutdowns are quite simple. We now useexecin the bash script, causingmongodto replace the bash process and get PID 1. This in turn enables graceful shutdowns as any SIGTERM signals will reach the server.We've had some flaky tests which could be related to this issue. This PR probably won't fix those issues as many tests run on MongoDB 4.0.6, whilst support for SIGTERM didn't arrive until 4.0.8. We should create a new PR to upgrade our tests to a version where the graceful shutdowns can be handled properly.