Port of https://github.com/emqx/emqx/pull/10154 for `release-50`
Fixes https://emqx.atlassian.net/browse/EMQX-9099
Originally, the `resume_interval`, which is what defines how often a
buffer worker will attempt to retry its inflight window, was set to
the same as the `health_check_interval`. This had the problem that,
with default values, `health_check_interval = request_timeout`. This
meant that, if a buffer worker with those configs were ever blocked,
all requests would have timed out by the time it retried them.
Here we change the default `resume_interval` to a reasonable value
dependent on `health_check_interval` and `request_timeout`, and also
expose that as a hidden parameter for fine tuning if necessary.
Fixes https://emqx.atlassian.net/browse/EMQX-9392
The returned information does not allow to diagnose the issue (i.e.: a
connection issue due to the wrong host and port, the wrong password
failing authn). However, such information is printed to the logs.
This changes the returned error to the API so that the user is hinted
at looking at the logs for further investigation of the error.
Fixes https://emqx.atlassian.net/browse/EMQX-9325
Currently, ingress bridges referenced in the `FROM` clause of rules
are not being accounted as dependencies.
When we try to delete an ingress bridge that's referenced in a rule
like `select * from "$bridges/mqtt:ingress"`, that bridge does not
trigger an UI warning about dependent actions.
Metrics should only be exposed via the /bridges/:id/metrics endpoint,
and not in other operations such as getting the list of all bridges, or
in the response when a bridge has been created. This commit removes all
traces of metrics for the non-dedicated API endpoints.
This was not caught by our tests because we always test bridge types
in isolation. So, if the config only contains ingress-only bridges,
the `on_message_publish` hook is never installed.
In a real system, if there are bridges of mixed types in the config,
the hook might be installed, and `emqx_bridge:get_matched_bridge_id`
would crash when iterating over the ingress bridges.
* Add option to start autocluster. This is useful for scenarios where
a cluster is already running and has some configurations set (via
config handler/cluster rpc) and later another node joins.
* Improve mnesia data directory isolation for nodes. A new dir is set
so that we may avoid intra and inter-suite flakiness due to dirty
tables and schema.
* The janitor is now called synchronously. This ensures the cleanup
is done before the test pid dies.