Commit Graph

422 Commits

Author SHA1 Message Date
zhongwencool a953b951fe
Merge branch 'master' into sync-release-50-to-master 2023-05-12 18:01:58 +08:00
Thales Macedo Garitezi 64dc9ed46a perf(metrics): avoid increasing counters by 0
Some performance tests indicate that calling `telemetry` is costly in
hot paths.  Since increasing a counter by 0 is a no-op, we should
avoid calling `telemetry` if the amount to increase is 0.
2023-05-11 15:13:37 -03:00
Kjell Winblad 70cf1533db feat: add RabbitMQ bridge 2023-05-09 14:32:26 +02:00
Zaiming (Stone) Shi 13dcb5732f Merge remote-tracking branch 'origin/release-50' into 0508-prepare-for-e5.0.4 2023-05-08 21:29:35 +02:00
Thales Macedo Garitezi eba627b365 fix(buffer_worker): fix inflight count when updating inflight item 2023-05-08 09:27:51 -03:00
Zhongwen Deng 4f396a36a9 Merge remote-tracking branch 'upstream/master' into release-50 2023-05-08 14:58:03 +08:00
Thales Macedo Garitezi 8aa7c014e7 perf(buffer_worker): avoid calling `ets:info/2`
(Almost?) fixes https://emqx.atlassian.net/browse/EMQX-9637

During the course of performance tests comparing the performance of
e5.0.3 and e4.4.16 regarding the webhook bridge in sync mode, we
observed that the throughput in e5.0.3 (sync) was much lower than in
e4.4.16: ~ 9 k msgs / s vs. ~ 50 k msgs / s, respectively.

Analyzing `observer_cli` output, we noticed that a lot of the time
both buffer workers and ehttpc processes was spent in `ets:info/2`.
That function was called to check the size of the inflight table when
updating metrics and checking if the inflight table was full.  Other
uses of `ets:info/2` were contained inside the arguments to some
`?tp/2` macro usages (https://github.com/kafka4beam/snabbkaffe/pull/60).

By using a specific record to track the size of the table, we managed
to improve the bridge performance to ~ 45 k msgs / s in sync mode.
2023-05-02 17:05:32 -03:00
Andrew Mayorov 670709f746
feat(resource): ensure uniqueness through `gproc`
Also use it instead of a custom ETS table for simplicity and
better consistency. This has drawbacks though: expect slightly
increased load on gproc gen_server due to how `gproc:set_value/2`
works.
2023-05-02 17:29:22 +03:00
Andrew Mayorov 4575167607
feat(resource): drop `manager_id()` type 2023-05-02 17:29:20 +03:00
Andrew Mayorov aaef95b1da
feat(resman): stop adding uniqueness to manager ids
Before this change, a separate `manager_id` / `instance_id` was used
as resource manager id, which made connector interface somewhat
inconsistent: part of function calls to connector implementation
used instance id as first argument while the rest used resource id
itself.
2023-05-02 17:28:26 +03:00
Thales Macedo Garitezi 7853a4c36e chore: bump app vsns 2023-04-27 11:58:28 -03:00
Thales Macedo Garitezi 567413389c
Merge pull request #10519 from thalesmg/fix-flaky-res-test-v50
test(resource): fix flaky test
2023-04-27 09:33:40 -03:00
Thales Macedo Garitezi c53741a08c fix(buffer_worker): avoid sending late reply messages to callers
Fixes https://emqx.atlassian.net/browse/EMQX-9635

During a sync call from process `A` to a buffer worker `B`, its call
to the underlying resource `C` can be very slow.  In those cases, `A`
will receive a timeout response and expect no more messages from `B`
nor `C`.  However, prior to this fix, if `B` is stuck in a long sync
call to `C` and then gets its response after `A` timed out, `B` would
still send the late response to `A`, polluting its mailbox.
2023-04-26 13:18:28 -03:00
Thales Macedo Garitezi d78312e10e test(resource): fix flaky test 2023-04-26 09:25:33 -03:00
zhongwencool 9d893b49eb
Merge branch 'master' into sync-release-50-to-master 2023-04-26 10:54:46 +08:00
Thales Macedo Garitezi ad4be08bb2 feat: implement Pulsar Producer bridge (e5.0)
Fixes https://emqx.atlassian.net/browse/EMQX-8398
2023-04-24 10:28:26 -03:00
firest 7d2c336ab7 fix(resource): make sure resource will not crash when stopping 2023-04-23 15:31:08 +08:00
Serge Tupchii 423a30fbb3 fix(emqx_alarm): add safe call API to activate/deactivate alarms and use it in resource_manager
Don't let 'emqx_resource_manager' crash because of emqx_alarm timeouts.

Fixes: EMQX-9529/#10357
2023-04-20 17:15:13 +03:00
Serge Tupchii b5eda9f0d1 perf(emqx_resource): don't reactivate alarms on reoccurring errors
Avoid unnecessary calls to activate an alarm if it has been already activated.

Fixes: EMQX-9529/#10357
2023-04-20 16:37:33 +03:00
Thales Macedo Garitezi cb995e2033 fix(buffer_worker): avoid sending late reply messages to callers
Fixes https://emqx.atlassian.net/browse/EMQX-9635

During a sync call from process `A` to a buffer worker `B`, its call
to the underlying resource `C` can be very slow.  In those cases, `A`
will receive a timeout response and expect no more messages from `B`
nor `C`.  However, prior to this fix, if `B` is stuck in a long sync
call to `C` and then gets its response after `A` timed out, `B` would
still send the late response to `A`, polluting its mailbox.
2023-04-19 18:27:10 -03:00
Ivan Dyachkov dc78ecb41c chore: merge upstream/master 2023-04-18 17:33:32 +02:00
Andrew Mayorov 21e19a33ce
feat(respool): switch to `emqx_resource_pool`
Which was previously known as `emqx_plugin_libs_pool`. This is part
of the effort to get rid of `emqx_plugin_libs` application.
2023-04-18 12:51:14 +03:00
Ivan Dyachkov 9fc8a498f8 chore: bump apps versions 2023-04-17 09:09:08 +02:00
Stefan Strigler 7df0493312
Merge pull request #10390 from sstrigler/EMQX-9549-new-emqx-utils-app-to-collect-utility-modules
New emqx_utils app to collect utility modules
2023-04-14 20:33:11 +02:00
Thales Macedo Garitezi e073bc90bc refactor(buffer_worker): rename `s/queue/buffer/g` 2023-04-14 11:37:19 -03:00
Thales Macedo Garitezi 14ed4a7ada feat(buffer_worker): set default queue mode to `memory_only`
Fixes https://emqx.atlassian.net/browse/EMQX-9367

For better user experience and performance for the average bridge, we
should change the default queue mode to `memory_only`, as was the
behavior of most bridges in e4.x.  This leads to better performance
when message rate is high enough and the remote resource is not
keeping up with EMQX.

Also, we set the default segment size to equal max queue bytes.
2023-04-14 11:37:19 -03:00
Thales Macedo Garitezi 4de13d2800 feat(buffer_worker): change default max queue bytes to 256 MB 2023-04-14 09:31:33 -03:00
Stefan Strigler 9c11bfce80 refactor: rename emqx_misc to emqx_utils 2023-04-14 13:41:27 +02:00
Andrew Mayorov 5e92ba6fa9
Merge pull request #10359 from ft/EMQX-9136/no-ask-metrics
feat(resource): ask for metrics only when needed
2023-04-14 12:28:52 +03:00
Ivan Dyachkov bdffa925db chore: merge upstream/master release-50 2023-04-12 15:30:20 +02:00
Andrew Mayorov 9c9f39d0f7
feat(resman): also move out metrics collection for debugging
Now `emqx_resource:list_instances_verbose/0` will populate the metrics
for each instance, for the sake of simplicity.
2023-04-12 16:14:42 +03:00
Andrew Mayorov e70deae1c3
feat(resource): ask for metrics only when needed 2023-04-11 12:00:19 +03:00
Zaiming (Stone) Shi a9bf633e03
Merge pull request #10320 from zmstone/0403-sync-release-50-back-to-master
0403 sync release 50 back to master
2023-04-04 23:31:24 +02:00
Zaiming (Stone) Shi 68c15ffd48 Merge remote-tracking branch 'origin/release-50' into 0403-sync-release-50-back-to-master 2023-04-04 16:42:58 +02:00
Thales Macedo Garitezi 0b6fd7fe14 fix(buffer_worker): check request timeout and health check interval
Port of https://github.com/emqx/emqx/pull/10154 for `release-50`

Fixes https://emqx.atlassian.net/browse/EMQX-9099

Originally, the `resume_interval`, which is what defines how often a
buffer worker will attempt to retry its inflight window, was set to
the same as the `health_check_interval`.  This had the problem that,
with default values, `health_check_interval = request_timeout`.  This
meant that, if a buffer worker with those configs were ever blocked,
all requests would have timed out by the time it retried them.

Here we change the default `resume_interval` to a reasonable value
dependent on `health_check_interval` and `request_timeout`, and also
expose that as a hidden parameter for fine tuning if necessary.
2023-04-04 08:58:36 -03:00
Thales Macedo Garitezi f3ffc02bff feat(bridges): enable async query mode for all bridges with buffer workers
Fixes https://emqx.atlassian.net/browse/EMQX-9130

Since buffer workers always support async calls ("outer calls"), we
should decouple those two call modes (inner and outer), and avoid
exposing the inner call configuration to user to avoid complexity.

For bridges that currently only allow sync query modes, we should
allow them to be configured with async.  That means basically all
bridge types except Kafka Producer.
2023-04-03 14:49:51 -03:00
Zaiming (Stone) Shi 36000abf51 refactor: relocate i18n files for apps/emqx 2023-04-03 13:12:24 +02:00
zhongwencool d63680cf25
Merge pull request #10307 from emqx/release-50
Sync release-50 back to master
2023-04-02 11:36:41 +08:00
Thales Macedo Garitezi 246a792965
Merge pull request #10273 from thalesmg/refactor-kprod-start-error-msg-rv50
fix: return friendly message when kafka producer and consumer fails to start (rv5.0)
2023-03-31 16:25:26 -03:00
Thales Macedo Garitezi 5011486b18 fix(kafka_consumer): return better error messages when probing kafka consumer bridge
Fixes https://emqx.atlassian.net/browse/EMQX-9422
2023-03-31 11:33:15 -03:00
Zaiming (Stone) Shi bcde52383b docs: fix max batch size desc 2023-03-31 12:35:27 +02:00
Thales Macedo Garitezi 632bffd451 fix: return friendly message when kafka producer fails to start (rv5.0)
Fixes https://emqx.atlassian.net/browse/EMQX-9392

The returned information does not allow to diagnose the issue (i.e.: a
connection issue due to the wrong host and port, the wrong password
failing authn).  However, such information is printed to the logs.

This changes the returned error to the API so that the user is hinted
at looking at the logs for further investigation of the error.
2023-03-30 11:51:36 -03:00
Kjell Winblad 8e0d315b7b
Merge pull request #10197 from kjellwinblad/0321-fix-inflight-window-hand-over-to-kjell
fix: add inflight window setting to the clickhouse bridge
2023-03-29 09:38:24 +02:00
Zaiming (Stone) Shi d07987288a chore: add some example annotations for config importance level 2023-03-28 14:29:24 +02:00
Zaiming (Stone) Shi dd996ad1dc chore: bump app vsns 2023-03-24 21:47:15 +01:00
Thales Macedo Garitezi ff272a2071
Merge pull request #10206 from thalesmg/decouple-buffer-worker-query-call-mode-v50
feat(buffer_worker): decouple query mode from underlying connector call mode
2023-03-24 13:49:00 -03:00
Thales Macedo Garitezi f8d5d53908 feat(buffer_worker): decouple query mode from underlying connector call mode
Fixes https://emqx.atlassian.net/browse/EMQX-9129

Currently, if an user configures a bridge with query mode sync, then
all calls to the underlying driver/connector ("inner calls") will
always be synchronous, regardless of its support for async calls.

Since buffer workers always support async queries ("outer calls"), we
should decouple those two call modes (inner and outer), and avoid
exposing the inner call configuration to user to avoid complexity.

There are two situations when we want to force synchronous calls to
the underlying connector even if it supports async:

1) When using `simple_sync_query`, since we are bypassing the buffer
workers;
2) When retrying the inflight window, to avoid overwhelming the
driver.
2023-03-23 13:40:31 -03:00
Kjell Winblad 35474578ca refactor: rename async_inflight_window to inflight_window everywhere 2023-03-23 14:21:57 +01:00
Kjell Winblad 9d3f369cca
docs: fix spelling mistake
Co-authored-by: Thales Macedo Garitezi <thalesmg@gmail.com>
2023-03-23 14:09:57 +01:00
Thales Macedo Garitezi ddffba0355
Merge pull request #10154 from thalesmg/fix-buffer-worker-default-req-timeout
fix(buffer_worker): calculate default `resume_interval` based on `request_timeout` and `health_check_interval`
2023-03-22 20:21:04 -03:00