fix(live_conn): fix live connection count on race condition (5.0)
Port from #6406 to 5.0.
When multiple clients try to connect concurrently using the same
client ID, they all call `emqx_channel:ensure_connected`, increasing
the live connection count, but only one will successfully acquire the
lock for that client ID. This means that all other clients that
increased the live connection count will not get to call neither
`emqx_channel:ensure_disconnected` nor be monitored for `DOWN`
messages, effectively causing a count leak.
By moving the increment to `emqx_cm:register_channel`, which is only
called inside the lock, we can remove this leakage.
Also, during the handling of `DOWN` messages, we now iterate over all
channel PIDs returned by `eqmx_misc:drain_down`, since it could be
that one or more PIDs are not contained in the `pmon` state.
Port from #6406 to 5.0.
When multiple clients try to connect concurrently using the same
client ID, they all call `emqx_channel:ensure_connected`, increasing
the live connection count, but only one will successfully acquire the
lock for that client ID. This means that all other clients that
increased the live connection count will not get to call neither
`emqx_channel:ensure_disconnected` nor be monitored for `DOWN`
messages, effectively causing a count leak.
By moving the increment to `emqx_cm:register_channel`, which is only
called inside the lock, we can remove this leakage.
Also, during the handling of `DOWN` messages, we now iterate over all
channel PIDs returned by `eqmx_misc:drain_down`, since it could be
that one or more PIDs are not contained in the `pmon` state.
Sessions must not enqueue messages when another process is taking over
the client id, since it already passed on the message queue in the
session state.
Without this fix, messages arriving after `{takeover, 'begin'} to a
channel with no connection (i.e., a persistent session) would be lost.
fix(listeners): fix typo in listener type
`emqx_listeners:{current,max}_conns` were matching on type `tcl`.
However, this type doesn't exist (it's not defined in
`?TYPES_STRING`). Therefore, this clause would never match. It seems
that the intention was that it shouldbe `tcp`.
In particular, this should remove the flaky snabbkaffe failures in
persistent session SUITE where the snabbkaffe_nemesis is trying to
make an ets:lookup in a table that no longer exists.