Skip to content

Conversation

@fabianlindfors
Copy link
Contributor

All Submissions:

  • Have you signed our CLA?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

When Kubernetes shuts down a pod it will send a SIGTERM signal to each container and give them 30s to gracefully shutdown. Currently, our mongod won't get the SIGTERM signal, causing the server to be forcefully shut down and possibly causing a primary loss. The larger underlying problem is that mongod is not the main process (PID 1) of its container but started from inside a bash script. The reason we've designed it this way is because of our pre-stop hook, used to handle version changes.

This PR switches from a pre-stop hook to a post-start hook, run before mongod is started in the container. The majority of the changes in this PR are to switch the name of the pre-stop hook and the ones that enable graceful shutdowns are quite simple. We now use exec in the bash script, causing mongod to replace the bash process and get PID 1. This in turn enables graceful shutdowns as any SIGTERM signals will reach the server.

We've had some flaky tests which could be related to this issue. This PR probably won't fix those issues as many tests run on MongoDB 4.0.6, whilst support for SIGTERM didn't arrive until 4.0.8. We should create a new PR to upgrade our tests to a version where the graceful shutdowns can be handled properly.

# exit(0) for Kubernetes to restart the container.
/hooks/pre-stop-hook ;
# start mongod with this configuration
exec mongod -f /data/automation-mongod.conf ;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This exec causes mongod to become PID 1 and in turn enables graceful shutdowns.

# run post-start hook to handle version changes.
# during a version change, mongod will be stopped and the
# container will be restarted, at which points this hook runs.
/hooks/version-upgrade-hook
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could have used the container lifecycle post-start hook instead of including the hook in our script. This would cause the hook to run concurrently with mongod, resulting in the server starting up one extra time before being killed again.

We opted to trigger the post-start hook from inside the bash script to avoid this extra restart.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can probably just name it without hook since it's in the hooks directory

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, renamed!

const (
agentStatusFilePathEnv = "AGENT_STATUS_FILEPATH"
logFilePathEnv = "PRE_STOP_HOOK_LOG_PATH"
logFilePathEnv = "VERSION_UPGRADE_HOOK_LOG_PATH"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: instead of writing the log to a file we could simply write it to stdout and let Kubernetes pick up on the logs. This would make debugging easier as the log currently is unavailable when mongod is stopped and the log file will also be removed during a restart.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes this was valid for enterprise but i thought all logs were in stdout

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Changed now to write directly to stdout

const (
agentStatusFilePathEnv = "AGENT_STATUS_FILEPATH"
logFilePathEnv = "PRE_STOP_HOOK_LOG_PATH"
logFilePathEnv = "VERSION_UPGRADE_HOOK_LOG_PATH"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes this was valid for enterprise but i thought all logs were in stdout

Comment on lines 544 to 546
# run post-start hook to handle version changes.
# during a version change, mongod will be stopped and the
# container will be restarted, at which points this hook runs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should stick to 1 line comment in the startup script and we have the details in the docs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Updated the comment and architecture.md.

# run post-start hook to handle version changes.
# during a version change, mongod will be stopped and the
# container will be restarted, at which points this hook runs.
/hooks/version-upgrade-hook
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed

# run post-start hook to handle version changes.
# during a version change, mongod will be stopped and the
# container will be restarted, at which points this hook runs.
/hooks/version-upgrade-hook
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can probably just name it without hook since it's in the hooks directory

Copy link
Contributor

@LouisPlisso LouisPlisso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@chatton chatton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks for cleaning this up. One small nit about a comment that is no incorrect, but otherwise this LGTM 👍

@fabianlindfors fabianlindfors merged commit ccc8c53 into master Jul 16, 2020
@fabianlindfors fabianlindfors deleted the change_pre-stop_hook_name branch July 16, 2020 14:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants