BlobStoreRepository#readOnly doesn't update with repo metadata changes #93575
Pinging @elastic/es-distributed (Team:Distributed)
It should work to set the readonly flag while operations on the repository are ongoing, as long as this flag blocks updates to the root blob. In particular that means we can only make an active repository readonly if its …

This might lead to some surprising outcomes for users expecting …

In any case we need to make it possible for users to know when the repository has been safely made readonly (i.e. only ack the request after …).
Because of elastic#93575 it's not sufficient to mark repositories with `readonly: true` while taking a backup. The only safe way to avoid writes is to completely unregister them.
Instead of actively aborting all operations, is it viable to simply fail the read-only request if there is any ongoing repo action, similar to how we fail repo deletion if the repo is in use? It may not be the best user experience, but it is probably much simpler to implement.
We could reject a make-read-only request if …

I don't think we should reject make-read-only requests if the repository is busy, but …
Personally I feel that failing ongoing activities when flipping the read-only flag is probably going to lead to support cases and/or enhancement requests.
This depends on the above discussion about whether we want to fail ongoing activities. Assuming we don't, I think it is not unreasonable to ask users to stop ILM and SLM first. I'd think there is also benefit for users to stop them proactively so that they control the exact state on stop.
Those support cases will be of the form "I marked my repository read-only and now some things are complaining that the repository is read-only", which are going to be pretty easily diagnosed and resolved: ILM and SLM will just keep retrying until the repository becomes writeable again. The cases I want to avoid are of the form "I stopped ILM and now this one critical index didn't roll over when it should have, so it's now ten times the expected size and is rejecting indexing because it hit the max doc count", which are way worse. Neither answer is particularly good; we really need more ERs so that there is some indication we should address #54944 to get a satisfactory solution to the question of repository backups. But until that happens, the lesser of the two evils seems to be to fail ongoing activities rather than to require their absence.
Also, in most cases there will be a gap in activities reasonably frequently, so even if we require no ongoing activities when marking the repository read-only we're still not protecting against the cases of the form "I marked my repository read-only and now some things are complaining that the repository is read-only". To do that more completely we'd have to insist on ILM and SLM being stopped while the repository is read-only, but this would have far too broad an impact to be acceptable.
Your comment in this parallel thread prompted me to check the exact behaviour in this area today. I thought we already accepted a change to … TBC...
... yeah ok this looks super-sketchy but not obviously broken (yet). The way that … we don't look up the repository out of …

I think we should check the readonly flag in …
To clarify, I mean that this is so implicit in the code today that it looks like it'd be easy to accidentally introduce a bug that allows writes into a readonly repository with some minor threading tweak or other simple change. |
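To illustrate the fix direction being discussed (checking the readonly flag at write time rather than relying on a value captured earlier), here is a minimal, hypothetical sketch. The class and field names are illustrative stand-ins, not the real Elasticsearch code: the point is only that the repository consults the *current* metadata on every write.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FreshReadOnlyDemo {

    // Stand-in for the cluster-state repository metadata, which users can update in place.
    static class RepositoryMetadata {
        volatile boolean readOnly;
        RepositoryMetadata(boolean readOnly) { this.readOnly = readOnly; }
    }

    static class FreshRepository {
        private final RepositoryMetadata metadata; // a reference, not a snapshot of the flag
        private final Map<String, String> blobs = new ConcurrentHashMap<>();

        FreshRepository(RepositoryMetadata metadata) { this.metadata = metadata; }

        void writeBlob(String name, String contents) {
            if (metadata.readOnly) { // re-checked on every write, so later flips take effect
                throw new IllegalStateException("repository is read-only");
            }
            blobs.put(name, contents);
        }
    }

    public static void main(String[] args) {
        RepositoryMetadata metadata = new RepositoryMetadata(false);
        FreshRepository repo = new FreshRepository(metadata);
        repo.writeBlob("a", "1"); // allowed while writeable

        metadata.readOnly = true; // user flips readonly via a metadata update
        try {
            repo.writeBlob("b", "2");
            System.out.println("write accepted");
        } catch (IllegalStateException e) {
            System.out.println("write rejected");
        }
    }
}
```

With this shape, a metadata update is observed by the next write attempt; the remaining (harder) problem discussed above is what to do with writes already in flight when the flag flips.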
Users may update the repository metadata without re-creating the repository, and in particular they may adjust the `readonly` setting this way. However `BlobStoreRepository#readOnly` is `final`, read from the repository metadata at creation time, so the repository may not actually reject writes after a user has set the readonly mode.

This is kinda tricky to fix because there may be ongoing writes at the time of the metadata update and today we have no mechanism to fail them - indeed I see some assertions that the `readonly` flag remains `false` throughout the snapshotting process.

Workaround

To be sure a repository is truly read-only, completely unregister it from Elasticsearch and then re-register it with the `readonly: true` flag. I expect it should also work to set the `readonly: true` flag and then perform a rolling restart of the cluster.