Bug #33482064 rpl.rpl_semi_sync_turn_on_off_optimize_for_static_plugin_config fails on asan
Problem
-------
The Ack Receiver thread reads through an invalid pointer, which ASAN reports as
heap-use-after-free.
Analysis / Root-cause analysis
------------------------------
The problem occurs when semi-synchronous replication is turned on and the
replica is disconnecting. Under these conditions, the Ack Receiver may encounter
an error when reading communication packets (ER_NET_READ_ERROR).
The resources of the Binary log dump thread may be released before the Ack
Receiver removes the connection information, so the Ack Receiver wrongfully
iterates over it. Consider the following flow:
1. Thread S requests the removal of thread X. S changes `is_leaving` to
true and waits for the confirmation.
2. Function `init_replica_sockets` is called from the Ack Receiver thread A,
therefore X is allowed to leave. A changes `is_leaving` to false.
3. A listens on the sockets. Either `listen_on_sockets` returns a non-critical
error, or A simply processes the data from the sockets quickly.
4. Other threads are requested to leave, so `m_slaves_changed` is set to true.
5. A is woken up again (S is still waiting for its turn) and calls
`init_replica_sockets` again. The Listener checks the `is_leaving` status of X
(still not removed from the Ack Receiver container). It is false, therefore X
is copied into the Listener internal container.
6. S is woken up. It sees that the `is_leaving` status of X is false and
concludes that X was allowed to leave. S assumes that X was removed from the
Listener container and therefore removes X from the Ack Receiver container.
7. A is woken up to process data from the sockets. In the meantime, the replica
shuts down and S resources are released. A gets ER_NET_READ_ERROR and checks the
while condition (net.vio->has_data(net.vio)).
The Vio pointer is NULL at this point, and a SIGSEGV signal is raised.
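The ambiguity in steps 1 to 6 can be reproduced in a minimal single-threaded
sketch. The `Slave` layout and the helper names `request_leave` and
`rebuild_listener` are hypothetical, not the plugin's actual code, and all
locking and waiting is left out. The sketch only shows that `is_leaving == false`
means both "never asked to leave" and "leave already acknowledged", so a second
rebuild copies X back into the listener:

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for the per-replica bookkeeping.
struct Slave {
  int thread_id;
  bool is_leaving;
};

// Step 1: the removing thread (S) marks X as leaving.
void request_leave(Slave &slave) {
  assert(!slave.is_leaving);
  slave.is_leaving = true;
}

// Steps 2 and 5: the Ack Receiver (A) rebuilds its listener container.
// A leaving entry is acknowledged by resetting the flag and is skipped
// this time, but nothing remembers that it was acknowledged.
std::vector<int> rebuild_listener(std::vector<Slave> &slaves) {
  std::vector<int> listener;
  for (Slave &slave : slaves) {
    if (slave.is_leaving) {
      slave.is_leaving = false;  // acknowledge: X is allowed to leave
      continue;
    }
    listener.push_back(slave.thread_id);
  }
  return listener;
}
```

If `rebuild_listener` runs twice before S erases X from the container, the
second call returns X's id again even though S will treat X as removed.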
Implemented solution
--------------------
The type of the 'Slave' internal variable 'is_leaving' is changed to the newly
introduced enumeration class 'enum_status', which has three states: up,
leaving and down. A leaving thread changes its status to 'leaving'. The Ack
Receiver notices that the 'status' value of the leaving thread changed to
'leaving' and changes it to 'down'. If the state of a 'Slave' object is
'down' or 'leaving', the Ack Receiver thread omits that object while
creating the 'Listener' vector.
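A minimal sketch of the three-state handshake described above, assuming a
simplified `Slave` struct with a `status` member. The helper names
`request_leave`, `build_listener` and `may_remove` are hypothetical, and the
mutex and condition variable that the real threads would need are omitted:

```cpp
#include <cassert>
#include <vector>

// The three states named in the description above.
enum class enum_status { up, leaving, down };

// Hypothetical, simplified stand-in for the per-replica bookkeeping.
struct Slave {
  int thread_id;
  enum_status status;
};

// The leaving thread announces its intent.
void request_leave(Slave &slave) {
  assert(slave.status == enum_status::up);
  slave.status = enum_status::leaving;
}

// The Ack Receiver acknowledges 'leaving' by moving it to 'down' and
// copies only entries that are still 'up' into the listener vector.
std::vector<int> build_listener(std::vector<Slave> &slaves) {
  std::vector<int> listener;
  for (Slave &slave : slaves) {
    if (slave.status == enum_status::leaving) {
      slave.status = enum_status::down;
    }
    if (slave.status == enum_status::up) {
      listener.push_back(slave.thread_id);
    }
  }
  return listener;
}

// The removing thread may erase the entry only after the acknowledgement.
bool may_remove(const Slave &slave) {
  return slave.status == enum_status::down;
}
```

Because 'down' is distinct from 'up', a repeated call to `build_listener`
keeps skipping an acknowledged entry instead of copying it back.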
Signed-off-by: Karolina Szczepankiewicz <karolina.szczepankiewicz@oracle.com>
Change-Id: Ic59191479473c86d4b97da060457930a52e79f0a