
BlobStoreRepository#readOnly doesn't update with repo metadata changes #93575

Open
DaveCTurner opened this issue Feb 7, 2023 · 10 comments
Labels
>bug :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.

Comments

@DaveCTurner
Contributor

Users may update the repository metadata without re-creating the repository, and in particular they may adjust the readonly setting this way. However, BlobStoreRepository#readOnly is final and is read from the repository metadata at creation time, so the repository may not actually reject writes after a user has set the readonly mode.

This is kinda tricky to fix because there may be ongoing writes at the time of the metadata update and today we have no mechanism to fail them - indeed I see some assertions that the readonly flag remains false throughout the snapshotting process.
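As a rough sketch of the shape of the problem (hypothetical class and names, not the actual BlobStoreRepository source), the flag is captured into a final field when the repository instance is constructed, so later metadata updates never reach it:

```java
import java.util.Map;

// Hypothetical sketch, not the real BlobStoreRepository code: the readonly setting is
// read from the repository settings exactly once, when the instance is created, and
// stored in a final field, so a later repository-metadata update that sets
// readonly: true does not change the behaviour of this instance.
final class SketchRepository {
    private final boolean readOnly;

    SketchRepository(Map<String, String> repositorySettings) {
        this.readOnly = Boolean.parseBoolean(repositorySettings.getOrDefault("readonly", "false"));
    }

    boolean isReadOnly() {
        return readOnly; // still whatever it was at creation time
    }
}
```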


Workaround

To be sure a repository is truly read-only, completely unregister it from Elasticsearch and then re-register it with the readonly: true flag.

I expect it should also work to set the readonly: true flag and then perform a rolling restart of the cluster.
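For concreteness, here is a minimal sketch of the unregister-and-re-register workaround using the low-level Java REST client. The repository name, type and location are made up for illustration, and the same `DELETE _snapshot/<repo>` and `PUT _snapshot/<repo>` requests can of course be issued directly over HTTP instead:

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RestClient;

// Sketch of the workaround: repository name, type and location are hypothetical examples.
public class ReadonlyRepoWorkaround {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            // 1. Unregister the repository entirely; this only removes the repository
            //    entry from the cluster metadata, the blobs themselves are untouched.
            client.performRequest(new Request("DELETE", "/_snapshot/my_backup_repo"));

            // 2. Re-register it with readonly: true so the new repository instance
            //    rejects writes from the moment it is created.
            Request register = new Request("PUT", "/_snapshot/my_backup_repo");
            register.setJsonEntity("""
                {
                  "type": "fs",
                  "settings": {
                    "location": "/mnt/backups/my_backup_repo",
                    "readonly": true
                  }
                }
                """);
            client.performRequest(register);
        }
    }
}
```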

@DaveCTurner DaveCTurner added >bug :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs labels Feb 7, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Feb 7, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner
Contributor Author

DaveCTurner commented Sep 30, 2023

It should work to set the readonly flag while operations on the repository are ongoing as long as this flag blocks updates to the root blob. In particular that means we can only make an active repository readonly if its generation and pendingGeneration are equal, and conversely we must check the readonly flag when incrementing pendingGeneration. It would certainly be nice for other write operations to abort promptly if they see the repository become readonly, but that's not required for safety. In particular, after making a repository readonly like this, it will be safe to take a non-atomic backup of its contents even if some stray writes and deletes are still in progress, because those operations will not affect any blobs referenced from the current root blob.
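A minimal sketch of that invariant (hypothetical names, not the actual repository implementation):

```java
// Hypothetical sketch of the invariant described above: flipping to readonly is only
// allowed while no root-blob update is pending (generation == pendingGeneration), and
// claiming a new pending generation must re-check the flag so no root-blob write can
// start after the repository has become readonly.
final class RootBlobGenerations {
    private long generation;         // generation referenced by the current root blob
    private long pendingGeneration;  // highest generation any writer has claimed
    private boolean readOnly;

    synchronized boolean tryMarkReadOnly() {
        if (generation != pendingGeneration) {
            return false; // a root-blob update is in flight; not safe to become readonly yet
        }
        readOnly = true;
        return true;
    }

    synchronized long claimNextPendingGeneration() {
        if (readOnly) {
            throw new IllegalStateException("repository is readonly");
        }
        return ++pendingGeneration;
    }

    synchronized void finishRootBlobUpdate(long newGeneration) {
        generation = newGeneration; // root blob now references the new generation
    }
}
```

Under this scheme a stray write that already claimed a pending generation can still create new blobs, but it can never update the root blob, which is what makes the non-atomic backup safe.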

This might lead to some surprising outcomes for users expecting readonly to mean we have stopped writing to the repository entirely. I think we cannot completely solve that (the active node may be disconnected from the cluster) but we could make at least some effort to wait for all known operations to finish before acknowledging the user's request to make the repository readonly.

In any case we need to make it possible for users to know when the repository has been safely made readonly (i.e. only ack the request after generation and pendingGeneration have become equal and the readonly flag has been set) since they may want to use this feature to make a different cluster the (unique) writer to a repository.

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this issue Oct 18, 2024
Because of elastic#93575 it's not sufficient to mark repositories with
`readonly: true` while taking a backup. The only safe way to avoid
writes is to completely unregister them.
Further commits carrying the same message and referencing this issue were added by DaveCTurner, elasticsearchmachine, lkts, georgewallace and jfreden between Oct 18 and Nov 4, 2024.
@ywangd
Member

ywangd commented Mar 21, 2025

Instead of actively aborting all operations, is it viable to simply fail the read-only request if there is any ongoing repo action, similar to how we fail repo deletion if the repo is in use? It may not be the best user experience, but it is probably much simpler to implement.

@DaveCTurner
Contributor Author

We could reject a make-read-only request if generation != pendingGeneration but it seems fairly hostile and likely to lead to support cases. This is not a state that is particularly visible to users and typically only lasts a very short time, so all they can do is retry until it succeeds.

I don't think we should reject make-read-only requests if the repository is busy but generation == pendingGeneration. The repository may never stop being busy thanks to ILM and SLM, and we don't want folks to have to futz around with stopping ILM and SLM as part of their repository backup process. It'd be nice to wait (after marking it as readonly) for these things to finish, but not required.

@ywangd
Member

ywangd commented Mar 23, 2025

It'd be nice to wait (after marking it as readonly) for these things to finish, but not required.

Personally I feel that failing ongoing activities when flipping the read-only flag is probably going to lead to support cases and/or enhancement requests.

The repository may never stop being busy thanks to ILM and SLM, and we don't want folks to have to futz around with stopping ILM and SLM as part of their repository backup process

This depends on the above discussion about whether we want to fail ongoing activities. Assuming we don't, I think it is not unreasonable to ask users to stop ILM and SLM first. I'd think there is also a benefit in users stopping them proactively, so that they control the exact state at the point they stop.

@DaveCTurner
Contributor Author

failing ongoing activities when flipping the read-only flag is probably going to lead to support cases and/or enhancement requests.

Those support cases will be of the form "I marked my repository read-only and now some things are complaining that the repository is read-only" which are going to be pretty easily diagnosed and resolved: ILM and SLM will just keep retrying until the repository becomes writeable again. The cases I want to avoid are of the form "I stopped ILM and now this one critical index didn't roll over when it should have so it's now ten times the expected size and is rejecting indexing because it hit the max doc count" which are way worse.

Neither answer is particularly good; we really need more ERs so that there is some indication we should address #54944, which would give a satisfactory solution to the question of repository backups. But until that happens, the lesser of the two evils seems to be to fail ongoing activities rather than to require their absence.

@DaveCTurner
Contributor Author

Also, in most cases there will be reasonably frequent gaps in activity, so even if we require no ongoing activities when marking the repository read-only, we're still not protecting against the cases of the form "I marked my repository read-only and now some things are complaining that the repository is read-only". To do that more completely we'd have to insist on ILM and SLM being stopped while the repository is read-only, but this would have far too broad an impact to be acceptable.

@DaveCTurner
Contributor Author

Your comment in this parallel thread prompted me to check the exact behaviour in this area today. I thought we already accepted a change to readonly: true on an in-use repository but did not properly coordinate that change to ensure that we never update the root blob on a readonly repository. In fact it's more complex than that, but basically we reject this change to the repo metadata if the repository is busy. This protection is not watertight: the repository can become busy after it is marked readonly, breaking the obvious invariant ("a readonly repository never has any pending activities"). There appear to be other checks that try to prevent operations started after the repository becomes readonly from doing any meaningful work, but they do not look watertight either (e.g. on the clone path).

TBC...

@DaveCTurner
Contributor Author

... yeah ok this looks super-sketchy but not obviously broken (yet). The fact that we don't look up the repository from RepositoriesService too many times, plus the way that RepositoriesService updates the repositories on the applier thread (and closes the old instances), plus the check on the lifecycle state of those old instances in getRepositoryData, together seem to cut off all the avenues I can see that might bypass the readonly flag.

I think we should check the readonly flag in writeIndexGen (the one in the actual cluster state being updated, not the one held on the repo) and also tighten up the invariants to properly prevent creation of pending activities in a readonly repository, but today's behaviour is basically to reject updates that set readonly: true while the repository is active and therefore we should probably just stick with that.
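For illustration, such a check in writeIndexGen might look roughly like this; the names are approximate and the exact metadata accessors are an assumption, the point is only that the readonly setting is taken from the cluster state being updated rather than from the possibly stale field on the repository instance:

```java
import org.elasticsearch.cluster.metadata.RepositoryMetadata;
import org.elasticsearch.repositories.RepositoryException;

// Illustrative sketch only, not the actual Elasticsearch internals: the repository
// metadata passed in here is the one from the cluster state being updated, so the
// check reflects the latest readonly setting rather than the cached readOnly field.
final class ReadOnlyCheckSketch {
    static void ensureNotReadOnly(RepositoryMetadata repoMetadataFromUpdatedState) {
        if (repoMetadataFromUpdatedState.settings().getAsBoolean("readonly", false)) {
            throw new RepositoryException(repoMetadataFromUpdatedState.name(),
                "repository is readonly; refusing to update the root blob");
        }
    }
}
```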

@DaveCTurner
Contributor Author

this looks super-sketchy but not obviously broken (yet)

To clarify, I mean that this is so implicit in the code today that it looks like it'd be easy to accidentally introduce a bug that allows writes into a readonly repository with some minor threading tweak or other simple change.
