Skip to content

Commit 71e013c

Browse files
author
Joao Gramacho
committed
BUG#22245619 SERVER ABORT AFTER SYNC STAGE OF THE COMMIT FAILS
Problem: The binary log group commit sync is failing when committing a group of transactions into a non-transactional storage engine while other thread is rotating the binary log. Analysis: The binary log group commit procedure (ordered_commit) acquires LOCK_log during the #1 stage (flush). As it holds the LOCK_log, a binary log rotation will have to wait until this flush stage to finish before actually rotating the binary log. For the #2 stage (sync), the binary log group commit only holds the LOCK_log if sync_binlog=1. In this case, the rotation has to wait also for the sync stage to finish. When sync_binlog>1, the sync stage releases the LOCK_log (to let other groups to enter the flush stage), holding only the LOCK_sync. In this case, the rotation can acquire the LOCK_log in parallel with the sync stage. For commits into transactional storage engine, the binary log rotation checks a counter of "flushed but not yet committed" transactions, waiting until this counter to be zeroed before closing the current binary log file. As the commit of the transactions happen in the #3 stage of the binary log group commit, the sync of the binary log in stage #2 always succeed. For commits into non-transactional storage engine, the binary log rotation is checking the "flushed but not yet committed" transactions counter, but it is zero because it only counts transactions that contains XIDs. So, the rotation is allowed to take place in parallel with the #2 stage of the binary log group commit. When the sync is called at the same time that the rotation has closed the old binary log file but didn't open the new file yet, the sync is failing with the following error: 'Can't sync file 'UNOPENED' to disk (Errcode: 9 - Bad file descriptor)'. Fix: For non-transactional only workload, binary log group commit will keep the LOCK_log when entering #2 stage (sync) if the current group is supposed to be synced to the binary log file.
1 parent 3b633ba commit 71e013c

File tree

3 files changed

+165
-1
lines changed

3 files changed

+165
-1
lines changed
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,42 @@
1+
SET @saved_binlog_error_action= @@GLOBAL.binlog_error_action;
2+
SET GLOBAL binlog_error_action= ABORT_SERVER;
3+
SET @saved_sync_binlog= @@GLOBAL.sync_binlog;
4+
SET GLOBAL sync_binlog= 2;
5+
[connection conn1]
6+
CREATE TABLE t1 (c1 INT) ENGINE=MyISAM;
7+
SET DEBUG_SYNC= 'before_sync_binlog_file SIGNAL holding_before_bgc_sync_binlog_file WAIT_FOR continue_bgc_sync_binlog_file';
8+
INSERT INTO t1 VALUES (1);
9+
[connection conn2]
10+
SET DEBUG_SYNC= 'NOW WAIT_FOR holding_before_bgc_sync_binlog_file';
11+
SET DEBUG_SYNC= 'before_rotate_binlog_file SIGNAL holding_before_rotate_binlog_file WAIT_FOR continue_rotate_binlog_file';
12+
FLUSH LOGS;
13+
[connection default]
14+
SET DEBUG_SYNC= 'now WAIT_FOR holding_before_rotate_binlog_file TIMEOUT 3';
15+
SET DEBUG_SYNC= 'now SIGNAL continue_bgc_sync_binlog_file';
16+
SET DEBUG_SYNC= 'before_rotate_binlog_file CLEAR';
17+
SET DEBUG_SYNC = 'now SIGNAL continue_rotate_binlog_file';
18+
[connection conn1]
19+
[connection conn2]
20+
[connection default]
21+
DROP TABLE t1;
22+
SET DEBUG_SYNC= 'RESET';
23+
[connection conn1]
24+
CREATE TABLE t1 (c1 INT) ENGINE=InnoDB;
25+
SET DEBUG_SYNC= 'before_sync_binlog_file SIGNAL holding_before_bgc_sync_binlog_file WAIT_FOR continue_bgc_sync_binlog_file';
26+
INSERT INTO t1 VALUES (1);
27+
[connection conn2]
28+
SET DEBUG_SYNC= 'NOW WAIT_FOR holding_before_bgc_sync_binlog_file';
29+
SET DEBUG_SYNC= 'before_rotate_binlog_file SIGNAL holding_before_rotate_binlog_file WAIT_FOR continue_rotate_binlog_file';
30+
FLUSH LOGS;
31+
[connection default]
32+
SET DEBUG_SYNC= 'now WAIT_FOR holding_before_rotate_binlog_file TIMEOUT 3';
33+
SET DEBUG_SYNC= 'now SIGNAL continue_bgc_sync_binlog_file';
34+
SET DEBUG_SYNC= 'before_rotate_binlog_file CLEAR';
35+
SET DEBUG_SYNC = 'now SIGNAL continue_rotate_binlog_file';
36+
[connection conn1]
37+
[connection conn2]
38+
[connection default]
39+
DROP TABLE t1;
40+
SET DEBUG_SYNC= 'RESET';
41+
SET GLOBAL binlog_error_action= @saved_binlog_error_action;
42+
SET GLOBAL sync_binlog= @saved_sync_binlog;
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,110 @@
1+
# ==== Purpose ====
2+
#
3+
# This test will try to rotate the binary log of the server while binary log
4+
# group commit sync stage tries to sync the recently flushed binary log group.
5+
#
6+
# As binary log group commit releases the binary log's LOCK_log right after
7+
# finishing the flush stage and entering the sync stage, the rotate procedure
8+
# (executed by MYSQL_BIN_LOG::new_file_impl) will be able take LOCK_log, but
9+
# it will delay the binary log rotation until the amount of prepared
10+
# transactions not yet committed be zero or the binary log group commit sync
11+
# stage has finished.
12+
#
13+
# ==== Related Bugs and Worklogs ====
14+
#
15+
# BUG#22245619 SERVER ABORT AFTER SYNC STAGE OF THE COMMIT FAILS
16+
#
17+
18+
# This test case is binary log format agnostic
19+
--source include/have_binlog_format_row.inc
20+
--source include/have_debug_sync.inc
21+
22+
# Set the options to hit the issue
23+
SET @saved_binlog_error_action= @@GLOBAL.binlog_error_action;
24+
SET GLOBAL binlog_error_action= ABORT_SERVER;
25+
SET @saved_sync_binlog= @@GLOBAL.sync_binlog;
26+
SET GLOBAL sync_binlog= 2;
27+
28+
# Create two additional connections
29+
# conn1 will do the binary log group commit
30+
# conn2 will rotate the binary log
31+
# the default connection will coordinate the test case activity
32+
--connect(conn1,localhost,root,,test)
33+
--connect(conn2,localhost,root,,test)
34+
35+
--let $engine= MyISAM
36+
37+
while ($engine)
38+
{
39+
--let $rpl_connection_name= conn1
40+
--source include/rpl_connection.inc
41+
# Create a new table
42+
--eval CREATE TABLE t1 (c1 INT) ENGINE=$engine
43+
44+
# Make the server to hold before syncing the binary log group
45+
SET DEBUG_SYNC= 'before_sync_binlog_file SIGNAL holding_before_bgc_sync_binlog_file WAIT_FOR continue_bgc_sync_binlog_file';
46+
--send INSERT INTO t1 VALUES (1)
47+
48+
--let $rpl_connection_name= conn2
49+
--source include/rpl_connection.inc
50+
# Wait until it reached the sync binary log group
51+
SET DEBUG_SYNC= 'NOW WAIT_FOR holding_before_bgc_sync_binlog_file';
52+
53+
# Make the server to hold while rotating the binary log
54+
# It can hold in two places:
55+
# a) waiting before all flushed transactions with Xid to be committed;
56+
# b) after closing the old and before opening the new binary log file;
57+
#
58+
# The debug sync will happen at (a) if there are transactions for a
59+
# transactional storage engine or at (b) if there are no transactions
60+
# for a transactional storage engine in the group to be committed.
61+
SET DEBUG_SYNC= 'before_rotate_binlog_file SIGNAL holding_before_rotate_binlog_file WAIT_FOR continue_rotate_binlog_file';
62+
# Rotate the binary log
63+
--send FLUSH LOGS
64+
65+
# Wait until the server reaches the debug sync point while rotating the
66+
# binary log
67+
--let $rpl_connection_name= default
68+
--source include/rpl_connection.inc
69+
--disable_warnings
70+
SET DEBUG_SYNC= 'now WAIT_FOR holding_before_rotate_binlog_file TIMEOUT 3';
71+
--enable_warnings
72+
73+
# Let the binary log group commit to sync and continue
74+
SET DEBUG_SYNC= 'now SIGNAL continue_bgc_sync_binlog_file';
75+
# Clear the binary log rotate debug sync point to avoid it to stop twice
76+
SET DEBUG_SYNC= 'before_rotate_binlog_file CLEAR';
77+
# Let the binary log rotate to continue
78+
SET DEBUG_SYNC = 'now SIGNAL continue_rotate_binlog_file';
79+
80+
--let $rpl_connection_name= conn1
81+
--source include/rpl_connection.inc
82+
--reap
83+
84+
--let $rpl_connection_name= conn2
85+
--source include/rpl_connection.inc
86+
--reap
87+
88+
--let $rpl_connection_name= default
89+
--source include/rpl_connection.inc
90+
91+
# Cleanup
92+
DROP TABLE t1;
93+
SET DEBUG_SYNC= 'RESET';
94+
95+
if ($engine == InnoDB)
96+
{
97+
--let $engine=
98+
}
99+
if ($engine == MyISAM)
100+
{
101+
--let $engine= InnoDB
102+
}
103+
}
104+
105+
# Disconnect the additional connections
106+
--disconnect conn1
107+
--disconnect conn2
108+
109+
SET GLOBAL binlog_error_action= @saved_binlog_error_action;
110+
SET GLOBAL sync_binlog= @saved_sync_binlog;

sql/binlog.cc

+13-1
Original file line numberDiff line numberDiff line change
@@ -4904,7 +4904,10 @@ int MYSQL_BIN_LOG::new_file_impl(bool need_lock_log, Format_description_log_even
49044904
written to the binary log.
49054905
*/
49064906
while (get_prep_xids() > 0)
4907+
{
4908+
DEBUG_SYNC(current_thd, "before_rotate_binlog_file");
49074909
mysql_cond_wait(&m_prep_xids_cond, &LOCK_xids);
4910+
}
49084911
mysql_mutex_unlock(&LOCK_xids);
49094912

49104913
mysql_mutex_lock(&LOCK_index);
@@ -4979,6 +4982,7 @@ int MYSQL_BIN_LOG::new_file_impl(bool need_lock_log, Format_description_log_even
49794982
Note that at this point, log_state != LOG_CLOSED (important for is_open()).
49804983
*/
49814984

4985+
DEBUG_SYNC(current_thd, "before_rotate_binlog_file");
49824986
/*
49834987
new_file() is only used for rotation (in FLUSH LOGS or because size >
49844988
max_binlog_size or max_relay_log_size).
@@ -7093,6 +7097,7 @@ int MYSQL_BIN_LOG::ordered_commit(THD *thd, bool all, bool skip_commit)
70937097
mysql_mutex_t *leave_mutex_before_commit_stage= NULL;
70947098
my_off_t flush_end_pos= 0;
70957099
bool need_LOCK_log;
7100+
unsigned int sync_period;
70967101
if (unlikely(!is_open()))
70977102
{
70987103
final_queue= stage_manager.fetch_queue_for(Stage_manager::FLUSH_STAGE);
@@ -7144,11 +7149,18 @@ int MYSQL_BIN_LOG::ordered_commit(THD *thd, bool all, bool skip_commit)
71447149
/*
71457150
Stage #2: Syncing binary log file to disk
71467151
*/
7147-
need_LOCK_log= (get_sync_period() == 1);
7152+
sync_period= get_sync_period();
7153+
need_LOCK_log= (sync_period == 1) ||
7154+
(sync_counter + 1 >= sync_period && get_prep_xids() == 0);
71487155

71497156
/*
71507157
LOCK_log is not released when sync_binlog is 1. It guarantees that the
71517158
events are not be replicated by dump threads before they are synced to disk.
7159+
7160+
LOCK_log is not released also when we are about to sync the binary log and
7161+
there is no transactional storage engine prepared transactions. This will
7162+
guarantee that the binary log rotation will not take place before syncing
7163+
the binary log file.
71527164
*/
71537165
if (change_stage(thd, Stage_manager::SYNC_STAGE, wait_queue,
71547166
need_LOCK_log ? NULL : &LOCK_log, &LOCK_sync))

0 commit comments

Comments
 (0)