Commit Graph

390 Commits

Author SHA1 Message Date
Zaiming (Stone) Shi a9bf633e03
Merge pull request #10320 from zmstone/0403-sync-release-50-back-to-master
0403 sync release 50 back to master
2023-04-04 23:31:24 +02:00
Zaiming (Stone) Shi 68c15ffd48 Merge remote-tracking branch 'origin/release-50' into 0403-sync-release-50-back-to-master 2023-04-04 16:42:58 +02:00
Thales Macedo Garitezi 0b6fd7fe14 fix(buffer_worker): check request timeout and health check interval
Port of https://github.com/emqx/emqx/pull/10154 for `release-50`

Fixes https://emqx.atlassian.net/browse/EMQX-9099

Originally, the `resume_interval`, which is what defines how often a
buffer worker will attempt to retry its inflight window, was set to
the same as the `health_check_interval`.  This had the problem that,
with default values, `health_check_interval = request_timeout`.  This
meant that, if a buffer worker with those configs were ever blocked,
all requests would have timed out by the time it retried them.

Here we change the default `resume_interval` to a reasonable value
dependent on `health_check_interval` and `request_timeout`, and also
expose that as a hidden parameter for fine tuning if necessary.
2023-04-04 08:58:36 -03:00
Thales Macedo Garitezi f3ffc02bff feat(bridges): enable async query mode for all bridges with buffer workers
Fixes https://emqx.atlassian.net/browse/EMQX-9130

Since buffer workers always support async calls ("outer calls"), we
should decouple those two call modes (inner and outer), and avoid
exposing the inner call configuration to user to avoid complexity.

For bridges that currently only allow sync query modes, we should
allow them to be configured with async.  That means basically all
bridge types except Kafka Producer.
2023-04-03 14:49:51 -03:00
Zaiming (Stone) Shi 36000abf51 refactor: relocate i18n files for apps/emqx 2023-04-03 13:12:24 +02:00
zhongwencool d63680cf25
Merge pull request #10307 from emqx/release-50
Sync release-50 back to master
2023-04-02 11:36:41 +08:00
Thales Macedo Garitezi 246a792965
Merge pull request #10273 from thalesmg/refactor-kprod-start-error-msg-rv50
fix: return friendly message when kafka producer and consumer fails to start (rv5.0)
2023-03-31 16:25:26 -03:00
Thales Macedo Garitezi 5011486b18 fix(kafka_consumer): return better error messages when probing kafka consumer bridge
Fixes https://emqx.atlassian.net/browse/EMQX-9422
2023-03-31 11:33:15 -03:00
Zaiming (Stone) Shi bcde52383b docs: fix max batch size desc 2023-03-31 12:35:27 +02:00
Thales Macedo Garitezi 632bffd451 fix: return friendly message when kafka producer fails to start (rv5.0)
Fixes https://emqx.atlassian.net/browse/EMQX-9392

The returned information does not allow to diagnose the issue (i.e.: a
connection issue due to the wrong host and port, the wrong password
failing authn).  However, such information is printed to the logs.

This changes the returned error to the API so that the user is hinted
at looking at the logs for further investigation of the error.
2023-03-30 11:51:36 -03:00
Kjell Winblad 8e0d315b7b
Merge pull request #10197 from kjellwinblad/0321-fix-inflight-window-hand-over-to-kjell
fix: add inflight window setting to the clickhouse bridge
2023-03-29 09:38:24 +02:00
Zaiming (Stone) Shi d07987288a chore: add some example annotations for config importance level 2023-03-28 14:29:24 +02:00
Zaiming (Stone) Shi dd996ad1dc chore: bump app vsns 2023-03-24 21:47:15 +01:00
Thales Macedo Garitezi ff272a2071
Merge pull request #10206 from thalesmg/decouple-buffer-worker-query-call-mode-v50
feat(buffer_worker): decouple query mode from underlying connector call mode
2023-03-24 13:49:00 -03:00
Thales Macedo Garitezi f8d5d53908 feat(buffer_worker): decouple query mode from underlying connector call mode
Fixes https://emqx.atlassian.net/browse/EMQX-9129

Currently, if an user configures a bridge with query mode sync, then
all calls to the underlying driver/connector ("inner calls") will
always be synchronous, regardless of its support for async calls.

Since buffer workers always support async queries ("outer calls"), we
should decouple those two call modes (inner and outer), and avoid
exposing the inner call configuration to user to avoid complexity.

There are two situations when we want to force synchronous calls to
the underlying connector even if it supports async:

1) When using `simple_sync_query`, since we are bypassing the buffer
workers;
2) When retrying the inflight window, to avoid overwhelming the
driver.
2023-03-23 13:40:31 -03:00
Kjell Winblad 35474578ca refactor: rename async_inflight_window to inflight_window everywhere 2023-03-23 14:21:57 +01:00
Kjell Winblad 9d3f369cca
docs: fix spelling mistake
Co-authored-by: Thales Macedo Garitezi <thalesmg@gmail.com>
2023-03-23 14:09:57 +01:00
Thales Macedo Garitezi ddffba0355
Merge pull request #10154 from thalesmg/fix-buffer-worker-default-req-timeout
fix(buffer_worker): calculate default `resume_interval` based on `request_timeout` and `health_check_interval`
2023-03-22 20:21:04 -03:00
Thales Macedo Garitezi 8844b22c80
docs: improve descriptions
Co-authored-by: Zaiming (Stone) Shi <zmstone@gmail.com>
2023-03-22 15:32:09 -03:00
Thales Macedo Garitezi 127a075b66 test(dynamo): attempt to fix dynamo tests
Those tests in the `flaky` test are really flaky and require lots of
CI retries.

Apparently, the flakiness comes from race conditions from restarting
bridges with the same name too fast between test cases.  Previously,
all test cases were sharing the same bridge name (the module name).
2023-03-22 14:34:37 -03:00
Thales Macedo Garitezi 61cb03b45a fix(buffer_worker): change the default `resume_interval` value and expose it as hidden config
Also removes the previously added alarm for request timeout.

There are situations where having a short request timeout and a long
health check interval make sense, so we don't want to alarm the user
for those situations.  Instead, we automatically attempt to set a
reasonable `resume_interval` value.
2023-03-22 11:47:36 -03:00
Kjell Winblad 27b8445337 fix: add inflight window setting to the clickhouse bridge
This commit makes sure the inflight window setting is present for the
clickhouse bridge. It also changes emqx_resource_schema that previously
removed the inflight window setting from resources with query mode
`always_sync`. We don't need to do that because all bridges that uses
the buffer worker queue will get async call handling even if the bridge
don't support the async callback.

Co-authored-by: Zaiming (Stone) Shi <zmstone@gmail.com>
2023-03-21 17:14:03 +01:00
Stefan Strigler c1384b6e6e feat(emqx_resource): include error with alarm for resource_down 2023-03-21 15:02:29 +01:00
Stefan Strigler 53825b9aba fix(emqx_bridge): propagate connection error to resource status 2023-03-21 15:02:29 +01:00
Thales Macedo Garitezi 20414d7373 fix(buffer_worker): check request timeout and health check interval
Fixes https://emqx.atlassian.net/browse/EMQX-9099

The default value for `request_timeout` is 15 seconds, and the default
resume interval is also 15 seconds (the health check timeout, if
`resume_interval` is not explicitly given).  This means that, in
practice, if a buffer worker ever gets into the blocked state, then
almost all requests will timeout.

Proposed improvement:

- `request_timeout` should by default be twice as much as
health_check_interval.
- Emit a alarm if `request_timeout` is not greater than
`health_check_interval`.
2023-03-16 13:46:45 -03:00
Thales Macedo Garitezi d464e2aad5 refactor: rename test resource prefix
Co-authored-by: Zaiming (Stone) Shi <zmstone@gmail.com>
2023-03-16 13:43:01 -03:00
Thales Macedo Garitezi 03342923b9 fix(bridge): use the same dry run prefix
Kafka Producer and Consumer bridges rely on this prefix for detecting
a dry run and avoid leaking atoms.  At some point, this prefix was
changed, effectively disabling the check in Kafka Producer.
2023-03-16 13:43:01 -03:00
Thales Macedo Garitezi 91a57faa95
Merge pull request #10128 from thalesmg/ocsp-v50-mkII
feat: add ocsp stapling support to mqtt ssl listener (5.0)
2023-03-16 13:10:48 -03:00
Thales Macedo Garitezi 164440fe83 test(resource): fix flaky test
Sometimes this test might retry more times, so we check the prefix
of the trace only.
2023-03-15 14:25:55 -03:00
Andrew Mayorov a9bc8a4464
refactor(resman): rename `ets_lookup` → `lookup_cached`
That way we hide the impementation details + the interface becomes
cleaner and more obvious.
2023-03-15 19:17:30 +03:00
Andrew Mayorov 29907875bf
test(bufworker): set `batch_time` for batch-related testcases
By default it's `0` since e9d3fc51. This made a couple of tests prone
to flapping.
2023-03-15 19:17:30 +03:00
Andrew Mayorov e411c5d5f8
refactor(resman): work with state cache atomically
Also ensure that cache entries are always consistent with `Data`,
so that most of the code could rely on reading the cached entry
most of the time.
2023-03-15 19:17:30 +03:00
Andrew Mayorov cad6492c99
perf(bridge-api): ask bridge listings in parallel
Also rename response formatting functions to better clarify their
purpose.
2023-03-15 19:17:29 +03:00
Thales Macedo Garitezi 422597a441 test: fix flaky tests 2023-03-14 16:08:47 -03:00
Andrew Mayorov 686bf8255b
fix(bridge): reply `emqx_resource:get_instance/1` from cache
The resource manager may be busy at times, so this change ensures that
getting resource instance state will not block. Currently, no users of
`emqx_resource:get_instance/1` do seem to be relying on state being
"as-actual-as-possible" guarantee it was providing.
2023-03-13 14:35:08 +03:00
Andrew Mayorov a86d06f043
chore: bump app versions following last merge-back 2023-03-10 16:44:15 +03:00
Zaiming (Stone) Shi fe27604010 Merge remote-tracking branch 'origin/release-50' into 0308-merge-release-50-back-to-master 2023-03-08 16:46:45 +01:00
Thales Macedo Garitezi eef65fba60 fix(buffer_worker): handle `request_timeout = infinity` case
The current schema allows `infinity` for `request_timeout`, so we have
to take that into account.  It's not currently possible to set
`batch_time = infinity`, so there's no need to treat that case.
2023-03-06 15:31:28 -03:00
Thales Macedo Garitezi 18ab7ed197 chore: bump app vsns 2023-03-06 15:31:28 -03:00
Thales Macedo Garitezi 0e707e837f docs(buffer_worker): improve description of `request_timeout` 2023-03-06 15:31:28 -03:00
Thales Macedo Garitezi e9d3fc511f chore(buffer_worker): change default `batch_time` to 0 and improve docs 2023-03-06 15:31:28 -03:00
Thales Macedo Garitezi f95a30ae89 fix(webhook): convert `request_timeout`s in root and resource_opts 2023-03-06 10:12:38 -03:00
Thales Macedo Garitezi 167b7a212f refactor(buffer_worker): avoid starting 0-time timers 2023-03-06 10:12:38 -03:00
Thales Macedo Garitezi e9ffabf936 fix(buffer_worker): add batch time automatic adjustment
To avoid message loss due to misconfigurations, we adjust `batch_time`
based on `request_timeout`.  If `batch_time` > `request_timeout`, all
requests will timeout before being sent if the message rate is low.
Even worse if `pool_size` is high.  We cap `batch_time` at
`request_timeout div 2` as a rule of thumb.
2023-03-06 10:12:38 -03:00
Kjell Winblad 67acdf0888 feat: add clickhouse database bridge
This commit adds a Clickhouse bridge to EMQX 5. The bridge is similar to
the Clickhouse bridge in the 4.4, but adds the possibility to use
different formats (such as JSON) for values to be inserted.
2023-03-02 12:22:11 +01:00
Andrew Mayorov c883e4b36a
test: drop custom `loop_wait` in favor of snabkaffe's `?retry` 2023-02-24 18:16:35 +03:00
Andrew Mayorov 2b4e49e7df
fix(bufworker): handle replies of simple async queries
Before that change, simple queries were treated as "retries"
essentially, thus skipping all the reply processing there is.
2023-02-24 15:06:49 +03:00
Zaiming (Stone) Shi c97d17cc91 test: refactor to loop wait for counters 2023-02-24 09:02:03 +01:00
Zaiming (Stone) Shi a10dbba084 refactor(buffer_worker): less defensive on inflight counter decrement 2023-02-23 21:23:10 +01:00
Zaiming (Stone) Shi 7a6465e2cf fix(buffer_worker): ensure flush timer reset in blocked state 2023-02-23 21:06:38 +01:00