Error restoring backup from version 6.8 to 8.6 #93389
Pinging @elastic/es-search (Team:Search)
I think this can legitimately happen if the snapshot was taken while indexing was ongoing. We don't restore regular snapshots into a read-only engine, though; only searchable snapshots use one.
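To make the GCP (global checkpoint) vs MSN (maximum sequence number) relationship concrete, here is a minimal, self-contained Java sketch. It is not Elasticsearch source and the names are hypothetical; it only illustrates how a checkpoint that advances over a gap-free prefix of completed sequence numbers can lag the maximum assigned sequence number while an operation is still in flight, which is the state a commit taken mid-indexing would capture.

```java
import java.util.BitSet;

// Hypothetical illustration: why maxSeqNo can be ahead of the checkpoint
// when a commit happens while operations are still in flight.
public class SeqNoGapDemo {
    private long nextSeqNo = 0;   // next sequence number to assign
    private long checkpoint = -1; // highest seqno with no gaps below it
    private final BitSet completed = new BitSet();

    long generateSeqNo() { return nextSeqNo++; }     // operation starts
    void markCompleted(long seqNo) {                 // operation finishes
        completed.set((int) seqNo);
        while (completed.get((int) (checkpoint + 1))) checkpoint++;
    }

    public static void main(String[] args) {
        SeqNoGapDemo tracker = new SeqNoGapDemo();
        long op0 = tracker.generateSeqNo(); // seqno 0
        long op1 = tracker.generateSeqNo(); // seqno 1, stays in flight
        long op2 = tracker.generateSeqNo(); // seqno 2
        tracker.markCompleted(op0);
        tracker.markCompleted(op2);
        // A commit taken now records maxSeqNo=2, but the checkpoint is stuck
        // at 0 because seqno 1 has not completed yet.
        System.out.println("maxSeqNo=" + op2 + " checkpoint=" + tracker.checkpoint);
    }
}
```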
Pinging @elastic/es-search-foundations (Team:Search Foundations)
Hey @lnowicki10, sorry for the lag. I wonder how you restored the backup taken from 6.8. Did you mean to restore it as an archive index, in read-only mode? The error you got is confusing, but indices created before 7.0 are not supported in 8.x, see https://www.elastic.co/guide/en/elasticsearch/reference/7.17/setup-upgrade.html. They can, though, be imported as archive indices with limited functionality, see https://www.elastic.co/guide/en/elasticsearch/reference/current/archive-indices.html. Otherwise, they'll need to be reindexed.
Never mind, you can ignore my question above. I looked more closely at the stack trace you provided, and it does show that you mounted the index as an archive, which should work.
After further consideration, I am not quite sure how this error would be specific to archive indices. Archive indices rely on a read-only engine, which searchable snapshots also rely on, as mentioned above. It is peculiar that the error appears only when restoring against a read-only engine in 8.x, whereas restoring to 7.x succeeds; the read-only engine is the only way to restore such an index to 8.x, so we can't test whether it could be restored in 8.x against an ordinary engine. I am not sure how we can reproduce this issue. I suspect it has to do with how the snapshot was created, and I am not familiar with the read-only engine assertion that trips. Rerouting back to distrib for more info, as this does not appear to be a search problem after all.
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)
That's expected to me: when restoring into 7.x it just goes into a regular index, which doesn't mind that the GCP and MSN are not equal. It's only 8.x that uses the read-only engine here. We would expect to see GCP != MSN if a snapshot is taken while indexing is ongoing, so we need the read-only engine to tolerate that. I'm going to ask the engine/recovery folks to take a look.
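Per the stack trace in the logs below, the check that trips is ReadOnlyEngine.ensureMaxSeqNoEqualsToGlobalCheckpoint. Here is a rough sketch of the invariant it enforces, reconstructed from the error message alone; the wrapper class is a stand-in, not the actual Elasticsearch source.

```java
// Sketch of the invariant the read-only engine enforces when it opens,
// reconstructed from the error message; not the actual Elasticsearch code.
public class ReadOnlyInvariantSketch {
    static void ensureMaxSeqNoEqualsToGlobalCheckpoint(long maxSeqNo, long globalCheckpoint) {
        if (maxSeqNo != globalCheckpoint) {
            throw new IllegalStateException("Maximum sequence number [" + maxSeqNo
                + "] from last commit does not match global checkpoint [" + globalCheckpoint + "]");
        }
    }

    public static void main(String[] args) {
        // Values from the report: this throws, reproducing the error text.
        ensureMaxSeqNoEqualsToGlobalCheckpoint(117122753L, 117122751L);
    }
}
```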
Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)
Ok, thanks for clarifying @DaveCTurner. If the outcome is that we need to adjust some assertion in the archive indices codepath, we can take ownership of it again. I am just not sure at the moment how to proceed with fixing the issue, or with adding tests for it.
I agree we can relax the GCP==MSN assertion/check, and the easiest way is to not require the complete history for archive indices when opening the read-only engine (i.e. requireCompleteHistory=false). I think the check is really important when we re-initialize shards in place and we know we won't be able to write to them, for example when closing indices or when opening read-only compatible (N-2) indices. I suppose that creating a snapshot while a thread is continuously indexing should be enough to get GCP != MSN on CI. Otherwise, blocking …
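A sketch of the relaxation discussed here, assuming a constructor flag along the lines of requireCompleteHistory gates the check; the flag wiring and class are assumptions about the shape of the fix, not the actual patch.

```java
// Sketch only: gate the GCP==MSN check behind a requireCompleteHistory flag so
// that archive indices, which are opened read-only and never written again,
// can skip it. Names are assumptions, not the actual Elasticsearch change.
public class RelaxedReadOnlyEngineSketch {
    private final boolean requireCompleteHistory;

    public RelaxedReadOnlyEngineSketch(boolean requireCompleteHistory) {
        this.requireCompleteHistory = requireCompleteHistory;
    }

    void ensureMaxSeqNoEqualsToGlobalCheckpoint(long maxSeqNo, long globalCheckpoint) {
        if (requireCompleteHistory == false) {
            return; // archive path: a commit taken mid-indexing is acceptable
        }
        if (maxSeqNo != globalCheckpoint) {
            throw new IllegalStateException("Maximum sequence number [" + maxSeqNo
                + "] from last commit does not match global checkpoint [" + globalCheckpoint + "]");
        }
    }

    public static void main(String[] args) {
        // Flag off (archive indices): the mismatched values from the report no longer throw.
        new RelaxedReadOnlyEngineSketch(false).ensureMaxSeqNoEqualsToGlobalCheckpoint(105904628L, 105904627L);
        // Flag on (closed indices, N-2 read-only indices): the check still trips and throws.
        new RelaxedReadOnlyEngineSketch(true).ensureMaxSeqNoEqualsToGlobalCheckpoint(105904628L, 105904627L);
    }
}
```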
Elasticsearch Version
8.6.1
Installed Plugins
No response
Java Version
bundled
OS Version
CentOS 7
Problem Description
We are unable to restore a large backup taken on version 6.8.13 onto 8.6.1.
The restore fails on some shards with these messages:
Caused by: [users/owINpknoTCmTe2kMmzuRhg][[users][25]] org.elasticsearch.index.shard.IndexShardRecoveryException: failed recovery
    ... 20 more
Caused by: [users/owINpknoTCmTe2kMmzuRhg][[users][25]] org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: restore failed
    ... 18 more
Caused by: java.lang.IllegalStateException: Maximum sequence number [117122753] from last commit does not match global checkpoint [117122751]
The same snapshot restores without a problem on a 7.X cluster. The problem occurs on random shards of large indices with lots of data (100 shards, 2 TB of data).
Steps to Reproduce
Try to restore some large dataset from 6.X to 8.X.
Logs (if relevant)
[2023-01-30T15:17:10,547][WARN ][o.e.c.r.a.AllocationService] [es-master1-archive] failing shard [FailedShard[routingEntry=[users][13], node[do0HqvNyRluI84MCgDzBHA], [P], recovery_source[snapshot recovery [LVdaXt-nQ8i0D2lEy0QWAw] from backupS3_1674680402:snapshot_1674939601/6h2NrRU3RcmQgmnZizHIoA], s[INITIALIZING], a[id=2JHqNugaSfSRYkGsKHrYBA], unassigned_info[[reason=ALLOCATION_FAILED], at[2023-01-30T14:17:10.140Z], failed_attempts[4], failed_nodes[[do0HqvNyRluI84MCgDzBHA]], delayed=false, last_node[do0HqvNyRluI84MCgDzBHA], details[failed shard on node [do0HqvNyRluI84MCgDzBHA]: failed recovery, failure
org.elasticsearch.indices.recovery.RecoveryFailedException: [users][13]: Recovery failed on {es16-archive-2}{do0HqvNyRluI84MCgDzBHA}{6RlBvSk_QjOUpaDcpJEkhg}{es16-archive-2}{10.94.121.5}{10.94.121.5:9301}{d}{xpack.installed=true}
    at org.elasticsearch.index.shard.IndexShard.lambda$executeRecovery$24(IndexShard.java:3123)
    at org.elasticsearch.action.ActionListener$2.onFailure(ActionListener.java:170)
    at org.elasticsearch.index.shard.StoreRecovery.lambda$recoveryListener$6(StoreRecovery.java:385)
    at org.elasticsearch.action.ActionListener$2.onFailure(ActionListener.java:170)
    at org.elasticsearch.index.shard.StoreRecovery.lambda$restore$8(StoreRecovery.java:518)
    at org.elasticsearch.action.ActionListener$2.onFailure(ActionListener.java:170)
    at org.elasticsearch.action.ActionListener$2.onResponse(ActionListener.java:164)
    at org.elasticsearch.action.ActionListener$DelegatingActionListener.onResponse(ActionListener.java:212)
    at org.elasticsearch.action.ActionListener$RunBeforeActionListener.onResponse(ActionListener.java:397)
    at org.elasticsearch.repositories.blobstore.FileRestoreContext.lambda$restore$1(FileRestoreContext.java:166)
    at org.elasticsearch.action.ActionListener$2.onResponse(ActionListener.java:162)
    at org.elasticsearch.action.ActionListener$MappedActionListener.onResponse(ActionListener.java:127)
    at org.elasticsearch.action.support.GroupedActionListener.onResponse(GroupedActionListener.java:55)
    at org.elasticsearch.action.ActionListener$DelegatingActionListener.onResponse(ActionListener.java:212)
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository$11.executeOneFileRestore(BlobStoreRepository.java:3066)
    at org.elasticsearch.repositories.blobstore.BlobStoreRepository$11.lambda$executeOneFileRestore$1(BlobStoreRepository.java:3075)
    at org.elasticsearch.action.ActionRunnable$3.doRun(ActionRunnable.java:72)
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:917)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.lang.Thread.run(Thread.java:1589)
Caused by: [users/owINpknoTCmTe2kMmzuRhg][[users][13]] org.elasticsearch.index.shard.IndexShardRecoveryException: failed recovery
    ... 20 more
Caused by: [users/owINpknoTCmTe2kMmzuRhg][[users][13]] org.elasticsearch.index.snapshots.IndexShardRestoreFailedException: restore failed
    ... 18 more
Caused by: java.lang.IllegalStateException: Maximum sequence number [105904628] from last commit does not match global checkpoint [105904627]
    at org.elasticsearch.index.engine.ReadOnlyEngine.ensureMaxSeqNoEqualsToGlobalCheckpoint(ReadOnlyEngine.java:184)
    at org.elasticsearch.index.engine.ReadOnlyEngine.<init>(ReadOnlyEngine.java:121)
    at org.elasticsearch.xpack.lucene.bwc.OldLuceneVersions.lambda$getEngineFactory$4(OldLuceneVersions.java:248)
    at org.elasticsearch.index.shard.IndexShard.innerOpenEngineAndTranslog(IndexShard.java:1949)
    at org.elasticsearch.index.shard.IndexShard.openEngineAndRecoverFromTranslog(IndexShard.java:1913)
    at org.elasticsearch.index.shard.StoreRecovery.lambda$restore$7(StoreRecovery.java:513)
    at org.elasticsearch.action.ActionListener$2.onResponse(ActionListener.java:162)
    ... 15 more