* add Checkpoint table and read/write funcs
* handle no checkpoints returned
* store min and max range values in checkpoint
* resume from checkpoint
* add checkpoint file
* fix unique key args
* update applier coordinates from _ghc heartbeat
* fix test
* fix linter
* make checkpoint interval configurable
* write checkpoint iteration number
* store rows copied & dml applied
* truncate column name if necessary
* drop checkpoint table for final cleanup
* add docs
* add resume doc
doc/command-line-flags.md: 14 additions & 0 deletions
@@ -64,6 +64,15 @@ It is not reliable to parse the `ALTER` statement to determine if it is instant
 ### binlogsyncer-max-reconnect-attempts
 `--binlogsyncer-max-reconnect-attempts=0`, the maximum number of attempts to re-establish a broken inspector connection for sync binlog. `0` or `negative number` means infinite retry, default `0`
 
+### checkpoint
+
+`--checkpoint` enables periodic checkpoints of `gh-ost`'s state so that an interrupted migration can be resumed from the last checkpoint with `--resume`. Checkpoints are written to a separate table named `_${original_table_name}_ghk`. Using `--gtid` together with `--checkpoint` is recommended.
+See also: [`resuming-migrations`](resume.md)
+
+### checkpoint-seconds
+
+`--checkpoint-seconds` specifies the number of seconds between checkpoints. Default is 300.
+
 ### conf
 
 `--conf=/path/to/my.cnf`: file where credentials are specified. Should be in (or contain) the following format:
@@ -226,6 +235,11 @@ Optionally involve the process ID, for example: `--replica-server-id=$((10000000
 It's on you to choose a number that does not collide with another `gh-ost` or another running replica.
 See also: [`concurrent-migrations`](cheatsheet.md#concurrent-migrations) on the cheatsheet.
 
+### resume
+
+`--resume` attempts to resume a previously interrupted migration from its last checkpoint. The first `gh-ost` invocation must have run with `--checkpoint` and successfully written at least one checkpoint for `--resume` to work.
+See also: [`resuming-migrations`](resume.md)
+
 ### serve-socket-file
 
 Defaults to an auto-determined and advertised upon startup file. Defines Unix socket file to serve on.
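The `_${original_table_name}_ghk` naming described under `--checkpoint` above can be sketched as follows. This is a minimal illustration, not the code from this PR: `checkpointTableName` is a hypothetical helper, and truncating the original name so that the result fits MySQL's 64-character identifier limit is an assumption about how long table names might be handled.

```go
package main

import "fmt"

// maxIdentifierLength is MySQL's limit for table names, in characters.
const maxIdentifierLength = 64

// checkpointTableName is a hypothetical helper (not taken from gh-ost) that
// derives the checkpoint table name `_<original>_ghk` from the original name.
func checkpointTableName(originalTable string) string {
	const prefix, suffix = "_", "_ghk"
	maxBase := maxIdentifierLength - len(prefix) - len(suffix)
	base := []rune(originalTable)
	if len(base) > maxBase {
		// Assumption: shorten the original name so the result stays valid.
		base = base[:maxBase]
	}
	return prefix + string(base) + suffix
}

func main() {
	fmt.Println(checkpointTableName("mytable")) // prints: _mytable_ghk
}
```

For the table `mytable` used in the resume example, this yields `_mytable_ghk`.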
+`gh-ost` can attempt to resume an interrupted migration from a checkpoint if the following conditions are met:
+- The first `gh-ost` process was invoked with `--checkpoint`
+- The first `gh-ost` process had at least one successful checkpoint
+- The binlogs from the last checkpoint's binlog coordinates still exist on the replica `gh-ost` is inspecting (specified by `--host`)
+
+To resume, invoke `gh-ost` again with the same arguments plus the `--resume` flag.
+
+> [!WARNING]
+> It is recommended to use `--checkpoint` with `--gtid` enabled so that checkpoint binlog coordinates store GTID sets rather than file positions. In that case, `gh-ost` can resume using a different replica than it originally attached to.
+
+## Example
+The migration starts with a `gh-ost` invocation such as:
+```shell
+gh-ost \
+--chunk-size=100 \
+--host=replica1.company.com \
+--database="mydb" \
+--table="mytable" \
+--alter="add column mycol varchar(20)" \
+--gtid \
+--checkpoint \
+--checkpoint-seconds=60 \
+--execute
+```
+
+In this example `gh-ost` writes a checkpoint to the table `_mytable_ghk` every 60 seconds. After `gh-ost` is interrupted or killed, the migration can be resumed with:
+```shell
+# resume migration
+gh-ost \
+--chunk-size=100 \
+--host=replica1.company.com \
+--database="mydb" \
+--table="mytable" \
+--alter="add column mycol varchar(20)" \
+--gtid \
+--resume \
+--execute
+```
+
+`gh-ost` then reconnects at the binlog coordinates of the last checkpoint and resumes copying rows at the chunk specified by the checkpoint. The data integrity of the ghost table is preserved because `gh-ost` applies row DMLs and copies rows in an idempotent way.
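To illustrate why idempotent application makes resuming from an older checkpoint safe, here is a small in-memory sketch. It is not gh-ost's implementation: the `table`, `event`, `applyAll`, and `copyChunk` names are hypothetical, and it assumes that replayed DML behaves like an upsert or delete keyed by row id while copied rows are inserted only when absent.

```go
package main

import "fmt"

type op int

const (
	upsert op = iota // insert or overwrite the row
	del              // delete the row if present
)

// event is a simplified stand-in for a row-level binlog event.
type event struct {
	kind  op
	id    int
	value string
}

// table is a toy ghost table keyed by row id.
type table map[int]string

// applyAll replays events in order; each event is an upsert or a delete,
// so applying the same event again leaves the table unchanged.
func applyAll(t table, events []event) {
	for _, e := range events {
		switch e.kind {
		case upsert:
			t[e.id] = e.value
		case del:
			delete(t, e.id)
		}
	}
}

// copyChunk copies ids in [from, to] from source, skipping rows that already
// exist in ghost (assumed insert-if-absent semantics).
func copyChunk(ghost, source table, from, to int) {
	for id := from; id <= to; id++ {
		if v, ok := source[id]; ok {
			if _, exists := ghost[id]; !exists {
				ghost[id] = v
			}
		}
	}
}

// sameRows reports whether two tables hold exactly the same rows.
func sameRows(a, b table) bool {
	if len(a) != len(b) {
		return false
	}
	for id, v := range a {
		if w, ok := b[id]; !ok || w != v {
			return false
		}
	}
	return true
}

func main() {
	events := []event{
		{upsert, 1, "a"}, {upsert, 2, "b"}, {upsert, 1, "a2"}, {del, 2, ""}, {upsert, 3, "c"},
	}

	clean := table{} // uninterrupted run: every event applied once
	applyAll(clean, events)

	resumed := table{}            // interrupted run
	applyAll(resumed, events[:4]) // four events applied before the crash
	applyAll(resumed, events[2:]) // checkpoint covered only two, so two are replayed

	fmt.Println(sameRows(clean, resumed)) // prints: true
}
```

Replaying events that were already applied after the last checkpoint converges to the same rows as an uninterrupted run, and re-copying a chunk with `copyChunk` does not overwrite rows that are already present.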
go/cmd/gh-ost/main.go: 7 additions & 0 deletions
@@ -145,6 +145,9 @@ func main() {
 	flag.StringVar(&migrationContext.TriggerSuffix, "trigger-suffix", "", "Add a suffix to the trigger name (i.e '_v2'). Requires '--include-triggers'")
 	flag.BoolVar(&migrationContext.RemoveTriggerSuffix, "remove-trigger-suffix-if-exists", false, "Remove given suffix from name of trigger. Requires '--include-triggers' and '--trigger-suffix'")
 	flag.BoolVar(&migrationContext.SkipPortValidation, "skip-port-validation", false, "Skip port validation for MySQL connections")
+	flag.Int64Var(&migrationContext.CheckpointIntervalSeconds, "checkpoint-seconds", 300, "The number of seconds between checkpoints")
+	flag.BoolVar(&migrationContext.Resume, "resume", false, "Attempt to resume migration from checkpoint")
 
 	maxLoad := flag.String("max-load", "", "Comma delimited status-name=threshold. e.g: 'Threads_running=100,Threads_connected=500'. When status exceeds threshold, app throttles writes")
 	criticalLoad := flag.String("critical-load", "", "Comma delimited status-name=threshold, same format as --max-load. When status exceeds threshold, app panics and quits")
@@ -284,6 +287,9 @@ func main() {
 	if *storageEngine == "rocksdb" {
 		migrationContext.Log.Warning("RocksDB storage engine support is experimental")
 	}
+	if migrationContext.CheckpointIntervalSeconds < 10 {
+		migrationContext.Log.Fatalf("--checkpoint-seconds should be >=10")
row := this.db.QueryRow(fmt.Sprintf(`select /* gh-ost */ * from %s.%s order by gh_ost_chk_id desc limit 1`, this.migrationContext.DatabaseName, this.migrationContext.GetCheckpointTableName()))
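The query above selects the most recent checkpoint by ordering on `gh_ost_chk_id` in descending order and keeping one row. A minimal sketch of building that statement is shown below. The `latestCheckpointQuery` and `quoteIdent` helpers are hypothetical and not part of this PR, and the backtick quoting of identifiers is an added assumption, since the line above interpolates the names directly.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent is a hypothetical helper: it wraps a MySQL identifier in
// backticks and doubles any backtick embedded in the name.
func quoteIdent(name string) string {
	return "`" + strings.ReplaceAll(name, "`", "``") + "`"
}

// latestCheckpointQuery builds a statement that returns only the newest
// checkpoint row, i.e. the one with the highest gh_ost_chk_id.
func latestCheckpointQuery(database, checkpointTable string) string {
	return fmt.Sprintf(
		"select /* gh-ost */ * from %s.%s order by gh_ost_chk_id desc limit 1",
		quoteIdent(database), quoteIdent(checkpointTable),
	)
}

func main() {
	fmt.Println(latestCheckpointQuery("mydb", "_mytable_ghk"))
}
```

A caller would presumably treat an empty result as "no checkpoint written yet", in line with the "handle no checkpoints returned" commit, and refuse to resume in that case.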