Our MongoDB driver creates a new temporary connection, for every active
connection, to just do a single heartbeat test. There is configurable
delay between every heartbeat test. When the user has an EMQX cluster
with a MongoDB bridge (to a MongoDB replica set), there will be a lot of
connections. Furthermore, as MongoDB creates a log entry every time a
new connection is created, the log will be flooded with info about new
connection. One user have reported more than 1MB of log data in a 10
minute period.
This commit tries to fix this by increasing the default delay between
heartbeats. A better fix would be to change the MongoDB driver so that
it does not create a new connection just to do a heartbeat check, but
this is more complicated so we leave this to the future. We might also
swap out the current MongoDB driver to something better.
Fixes:
https://github.com/emqx/emqx/issues/9851
The resource manager may be busy at times, so this change ensures that
getting resource instance state will not block. Currently, no users of
`emqx_resource:get_instance/1` do seem to be relying on state being
"as-actual-as-possible" guarantee it was providing.
Also hexencode non-utf8 binaries. This is essentially an heuristic.
We don't know column types in runtime, and there's no simple way
to find them out. Since we're already doing full binary scan during
escaping it should be cheap to bail out on non-utf8 strings and
hexencode them instead.
Also introduce separate function to highlight that this escaping
is MySQL-specific.
So that mysql client won't attempt to prepare them automatically, thus
trashing the server's prepared statements table and making interaction
overall heavier.
This fixes a crash with an error in the log file (see below) that
happened when the MongoDB authorization module queried the database. The
reason is that the collection name that was sent to the mongodb
connection was an atom. This is fixed by making sure it is not an atom.
2023-03-08T17:16:34.215523+01:00 [error] msg: query_mongo_error, mfa:
emqx_authz_mongodb:authorize/4, line: 95, peername: 127.0.0.1:53212,
clientid: client123, collection: mqtt_acl, filter: #{username =>
<<"emqx_u">>}, reason: {resource_error,#{msg => #{error =>
{error,{error_cannot_parse_response,{op_msg_response,#{<<"code">> =>
73,<<"codeName">> => <<"InvalidNamespace">>,<<"errmsg">> => <<"Failed to
parse namespace element">>,<<"ok">> => 0.0}}}},id =>
<<"emqx_authz_mongodb:3">>,name => call_query,request =>
{find,mqtt_acl,#{username => <<"emqx_u">>},#{}},stacktrace =>
[{mc_connection_man,reply,1,[{file,"mc_connection_man.erl"},{line,123}],
...]}, reason => exception}}, resource_id: <<"emqx_authz_mongodb:3">>
Fixes: https://github.com/emqx/emqx/issues/9783
When using async mode with the webhook bridge, queued messages that are
not fully processed when the connection times out could be lost. This
commit fixes this by letting the bridge return a recoverable_error when
this happen. The message send will then be retried in sync mode by the
emqx_resource_buffer_worker.
Fixes: https://emqx.atlassian.net/browse/EMQX-8974
The current schema allows `infinity` for `request_timeout`, so we have
to take that into account. It's not currently possible to set
`batch_time = infinity`, so there's no need to treat that case.
To avoid message loss due to misconfigurations, we adjust `batch_time`
based on `request_timeout`. If `batch_time` > `request_timeout`, all
requests will timeout before being sent if the message rate is low.
Even worse if `pool_size` is high. We cap `batch_time` at
`request_timeout div 2` as a rule of thumb.
flush_time_interval is used to calculate statsd sampling rate:
rate = sample_time_interval / flush_time_interval
This means that flush_time_interval must always be greater than (or equal to)
sample_time_interval, otherwise, the sampling rate will be invalid (> 1).
Relates to EMQX-9055
emqx_config_handler:post_config_update/5 cb is called before an updated config is saved.
Thus, a process being restarted in that callback cannot get the latest config by calling
emqx_conf:get/2, because that update is not saved yet.
Relates to EMQX-9055
Prior to this change, when EMQX daemon mode failed to start
it's not quite easy for users to understand what went wrong.
All the know is the node did not start in time
and then instructed to boot the node in 'console' mode wishing
for some logs.
However, the node might actuay be running, causing 'console' mode
to fail with a different reason.
With this change, after a filure of daemon mode boot,
we issue a diagnosis.
1. if node can not be found from ps -ef, instruct the user
to find information in erlang.log.N
2. if the node is found running, but not responding to pings
instruct the user to check if the node name is
resolvable and reachable
3. if the node is responding to pings but emqx app is not
running, then it's likely a bug. so the user is advised
to report a github issue.
This commit adds a Clickhouse bridge to EMQX 5. The bridge is similar to
the Clickhouse bridge in the 4.4, but adds the possibility to use
different formats (such as JSON) for values to be inserted.