fix(emqx_cm): fix channel data registration race-condition

when clustered, an MQTT client process may get killed
(e.g. for holding the channel registration lock for too long).
if the channel data is inserted before the message for
channel process monitoring is cast, the process can die without
ever being monitored, so nothing cleans up after it and the
stale entries are left in the ets tables indefinitely.

this commit changes the order of the non-atomic operations:
it casts out the monitor request message before inserting
channel data.
Zaiming (Stone) Shi 2023-06-02 11:32:24 +02:00
parent 8d8efe449e
commit c75e9bbe0d
2 changed files with 7 additions and 1 deletion

@@ -176,11 +176,13 @@ insert_channel_info(ClientId, Info, Stats) ->
 %% Note that: It should be called on a lock transaction
 register_channel(ClientId, ChanPid, #{conn_mod := ConnMod}) when is_pid(ChanPid) ->
     Chan = {ClientId, ChanPid},
+    %% cast (for process monitor) before inserting ets tables
+    cast({registered, Chan}),
     true = ets:insert(?CHAN_TAB, Chan),
     true = ets:insert(?CHAN_CONN_TAB, {Chan, ConnMod}),
     ok = emqx_cm_registry:register_channel(Chan),
     mark_channel_connected(ChanPid),
-    cast({registered, Chan}).
+    ok.

 %% @doc Unregister a channel.
 -spec unregister_channel(emqx_types:clientid()) -> ok.

@@ -0,0 +1,4 @@
+Fix a race condition in channel info registration.
+
+Prior to this fix, when the system was under heavy load, a client could be disconnected (or have its session expired) yet still appear on the clients page of the dashboard.
+One of the possible causes is the race condition fixed in this PR: the connection is killed in the middle of channel data registration.
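
The ordering described in the commit message can be illustrated with a small standalone sketch. This is not the emqx_cm implementation: the module name `chan_reg_sketch`, its table, and the `demo/0` helper are all hypothetical, and it uses plain `erlang:monitor/2` in a `gen_server` as a stand-in for whatever monitoring the real manager does. It assumes, as in the patched function, that the process being registered is the one performing the inserts, so a kill at any point after the cast still produces a `'DOWN'` message that triggers cleanup.

```erlang
-module(chan_reg_sketch).
-behaviour(gen_server).

-export([start_link/0, register_channel/1, lookup/1, demo/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% Hypothetical table name, used only by this sketch.
-define(TAB, chan_reg_sketch_tab).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Must be called by the channel process itself: self() is the pid
%% that the manager will monitor.
register_channel(ClientId) ->
    Chan = {ClientId, self()},
    %% 1. Request monitoring first ...
    gen_server:cast(?MODULE, {registered, Chan}),
    %% 2. ... and only then publish the entry. If this process is killed
    %% between the two steps, there is nothing in the table to leak.
    true = ets:insert(?TAB, Chan),
    ok.

lookup(ClientId) ->
    ets:lookup(?TAB, ClientId).

init([]) ->
    ?TAB = ets:new(?TAB, [named_table, public, bag]),
    {ok, #{}}.

handle_call(_Req, _From, State) ->
    {reply, ignored, State}.

handle_cast({registered, {ClientId, Pid}}, State) ->
    %% Monitoring an already-dead pid still delivers a 'DOWN' message,
    %% so cleanup runs even if the channel died before this cast arrived.
    MRef = erlang:monitor(process, Pid),
    {noreply, State#{MRef => {ClientId, Pid}}}.

handle_info({'DOWN', MRef, process, _Pid, _Reason}, State) ->
    case maps:take(MRef, State) of
        {Chan, State1} ->
            true = ets:delete_object(?TAB, Chan),
            {noreply, State1};
        error ->
            {noreply, State}
    end;
handle_info(_Other, State) ->
    {noreply, State}.

%% Registers a channel, kills it, and waits for the entry to disappear.
demo() ->
    {ok, Mgr} = start_link(),
    Parent = self(),
    Chan = spawn(fun() ->
        ok = register_channel(<<"c1">>),
        Parent ! registered,
        receive stop -> ok end
    end),
    receive registered -> ok end,
    [{<<"c1">>, Chan}] = lookup(<<"c1">>),
    exit(Chan, kill),
    Result = wait_until_gone(<<"c1">>, 100),
    ok = gen_server:stop(Mgr),
    Result.

%% Polls because cleanup happens asynchronously in the manager.
wait_until_gone(_ClientId, 0) ->
    timeout;
wait_until_gone(ClientId, N) ->
    case lookup(ClientId) of
        [] -> ok;
        _ -> timer:sleep(10), wait_until_gone(ClientId, N - 1)
    end.
```

After compiling the module, `chan_reg_sketch:demo()` can be run from an Erlang shell. Swapping the two steps in `register_channel/1` reintroduces the window described above: a kill landing after the insert but before the cast would leave the entry in the table with no monitor to remove it.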