Fixes https://emqx.atlassian.net/browse/EMQX-9864
Setting a very large interval can cause `erlang:start_timer` to crash.
Also, setting auto_restart_interval or health_check_interval to "0s"
causes the state machine to be in loop as time 0 is handled separately:
| state_timeout() = timeout() | integer()
| (...)
| If Time is relative and 0 no timer is actually started, instead the the
| time-out event is enqueued to ensure that it gets processed before any
| not yet received external event.
from "https://www.erlang.org/doc/man/gen_statem.html#type-state_timeout"
Therefore, both fields are now validated against the range [1ms, 1h],
which doesn't cause above issues.
This is a performance improvement for webhook bridge.
Since this bridge is called using `async` callback mode, and `ehttpc`
frequently returns errors of the form `normal` and `{shutdown,
normal}` that are retried "for free" by `ehttpc`, we add this behavior
to async requests as well. Other errors are retried too, but they are
not "free": 3 attempts are made at a maximum.
This is important because, when using buffer workers, we should avoid
making them enter the `blocked` state, since that halts all progress
and makes throughput plummet.
* release-50:
fix(pulsar): use a binary duration as default `health_check_interval`
docs: add changelog entry
docs: clarify description of bridge username and password
chore: bump to v5.0.25
fix(limiter): adjust type for compatibility
fix(limiter): fix that update node-level limiter config will not working
chore: upgrade dashboard to v1.2.4-1 for ce
chore: upgarde rulesql to 0.1.6 to fix invaid utf8 input
chore: add changelog for 10659
fix: crash when sysmon.os.mem_check_interval = disabled
chore: bump influxdb version && update changes
refactor(influxdb): move influxdb bridge into its own app
chore: add listener default changelog
fix: ocsp cache SUITE failed
fix: ensure atom key for emqx_config:get
fix: only fill cerf_file default in server side
fix: authn init is empty
fix: bad listeners default ssl_options
Fixes https://emqx.atlassian.net/browse/EMQX-9902
When the buffer worker inflight window is full, we don’t need to set a
timer to flush the messages again because there’s no more room, and
one of the inflight windows will flush the buffer worker by calling
`flush_worker`.
Currently, we do set the timer on such situation, and this fact
combined with the default batch time of 0 yields a busy loop situation
where the CPU spins a lot while inflight messages do not return.
Some performance tests indicate that calling `telemetry` is costly in
hot paths. Since increasing a counter by 0 is a no-op, we should
avoid calling `telemetry` if the amount to increase is 0.
* release-50:
fix(limiter): fix an error when setting `max_conn_rate` in a listener
chore: bump erlcloud dependencies vsns
chore: rename dynamo template files
refactor(dynamo): move dynamo bridge into its own app
chore: update changes && bump app versions
fix: issues with the RabbitMQ config
refactor(pgsql): move pgsql && matrix && timescale bridges into their own app
fix: the iotdb password field so it has the password format
chore: update changes
refactor(tdengine): move tdengine bridge into its own app
feat: deprecate listeners's authn http api
(Almost?) fixes https://emqx.atlassian.net/browse/EMQX-9637
During the course of performance tests comparing the performance of
e5.0.3 and e4.4.16 regarding the webhook bridge in sync mode, we
observed that the throughput in e5.0.3 (sync) was much lower than in
e4.4.16: ~ 9 k msgs / s vs. ~ 50 k msgs / s, respectively.
Analyzing `observer_cli` output, we noticed that a lot of the time
both buffer workers and ehttpc processes was spent in `ets:info/2`.
That function was called to check the size of the inflight table when
updating metrics and checking if the inflight table was full. Other
uses of `ets:info/2` were contained inside the arguments to some
`?tp/2` macro usages (https://github.com/kafka4beam/snabbkaffe/pull/60).
By using a specific record to track the size of the table, we managed
to improve the bridge performance to ~ 45 k msgs / s in sync mode.
Fixes https://emqx.atlassian.net/browse/EMQX-9656
See also https://github.com/emqx/ehttpc/pull/45
This fixes a race condition where the remote server would close the
connection before or during requests, and, depending on timing, an
`{error, normal}` response would be returned. In those cases, we
should just retry the request without using up "retry credits".
Fixes https://emqx.atlassian.net/browse/EMQX-9635
During a sync call from process `A` to a buffer worker `B`, its call
to the underlying resource `C` can be very slow. In those cases, `A`
will receive a timeout response and expect no more messages from `B`
nor `C`. However, prior to this fix, if `B` is stuck in a long sync
call to `C` and then gets its response after `A` timed out, `B` would
still send the late response to `A`, polluting its mailbox.