yuanbiao/emqx - emqx

Commit Graph

Author	SHA1	Message	Date
Zaiming (Stone) Shi	a9bf633e03	Merge pull request #10320 from zmstone/0403-sync-release-50-back-to-master 0403 sync release 50 back to master	2023-04-04 23:31:24 +02:00
Zaiming (Stone) Shi	68c15ffd48	Merge remote-tracking branch 'origin/release-50' into 0403-sync-release-50-back-to-master	2023-04-04 16:42:58 +02:00
Thales Macedo Garitezi	0b6fd7fe14	fix(buffer_worker): check request timeout and health check interval Port of https://github.com/emqx/emqx/pull/10154 for `release-50` Fixes https://emqx.atlassian.net/browse/EMQX-9099 Originally, the `resume_interval`, which is what defines how often a buffer worker will attempt to retry its inflight window, was set to the same as the `health_check_interval`. This had the problem that, with default values, `health_check_interval = request_timeout`. This meant that, if a buffer worker with those configs were ever blocked, all requests would have timed out by the time it retried them. Here we change the default `resume_interval` to a reasonable value dependent on `health_check_interval` and `request_timeout`, and also expose that as a hidden parameter for fine tuning if necessary.	2023-04-04 08:58:36 -03:00
Thales Macedo Garitezi	f3ffc02bff	feat(bridges): enable async query mode for all bridges with buffer workers Fixes https://emqx.atlassian.net/browse/EMQX-9130 Since buffer workers always support async calls ("outer calls"), we should decouple those two call modes (inner and outer), and avoid exposing the inner call configuration to the user to avoid complexity. For bridges that currently only allow sync query modes, we should allow them to be configured with async. That means basically all bridge types except Kafka Producer.	2023-04-03 14:49:51 -03:00
Zaiming (Stone) Shi	36000abf51	refactor: relocate i18n files for apps/emqx	2023-04-03 13:12:24 +02:00
zhongwencool	d63680cf25	Merge pull request #10307 from emqx/release-50 Sync release-50 back to master	2023-04-02 11:36:41 +08:00
Thales Macedo Garitezi	246a792965	Merge pull request #10273 from thalesmg/refactor-kprod-start-error-msg-rv50 fix: return friendly message when kafka producer and consumer fails to start (rv5.0)	2023-03-31 16:25:26 -03:00
Thales Macedo Garitezi	5011486b18	fix(kafka_consumer): return better error messages when probing kafka consumer bridge Fixes https://emqx.atlassian.net/browse/EMQX-9422	2023-03-31 11:33:15 -03:00
Zaiming (Stone) Shi	bcde52383b	docs: fix max batch size desc	2023-03-31 12:35:27 +02:00
Thales Macedo Garitezi	632bffd451	fix: return friendly message when kafka producer fails to start (rv5.0) Fixes https://emqx.atlassian.net/browse/EMQX-9392 The returned information does not allow diagnosing the issue (e.g., a connection issue due to the wrong host and port, or a wrong password failing authn). However, such information is printed to the logs. This changes the error returned to the API so that the user is pointed to the logs for further investigation of the error.	2023-03-30 11:51:36 -03:00
Kjell Winblad	8e0d315b7b	Merge pull request #10197 from kjellwinblad/0321-fix-inflight-window-hand-over-to-kjell fix: add inflight window setting to the clickhouse bridge	2023-03-29 09:38:24 +02:00
Zaiming (Stone) Shi	d07987288a	chore: add some example annotations for config importance level	2023-03-28 14:29:24 +02:00
Zaiming (Stone) Shi	dd996ad1dc	chore: bump app vsns	2023-03-24 21:47:15 +01:00
Thales Macedo Garitezi	ff272a2071	Merge pull request #10206 from thalesmg/decouple-buffer-worker-query-call-mode-v50 feat(buffer_worker): decouple query mode from underlying connector call mode	2023-03-24 13:49:00 -03:00
Thales Macedo Garitezi	f8d5d53908	feat(buffer_worker): decouple query mode from underlying connector call mode Fixes https://emqx.atlassian.net/browse/EMQX-9129 Currently, if a user configures a bridge with query mode sync, then all calls to the underlying driver/connector ("inner calls") will always be synchronous, regardless of its support for async calls. Since buffer workers always support async queries ("outer calls"), we should decouple those two call modes (inner and outer), and avoid exposing the inner call configuration to the user to avoid complexity. There are two situations when we want to force synchronous calls to the underlying connector even if it supports async: 1) When using `simple_sync_query`, since we are bypassing the buffer workers; 2) When retrying the inflight window, to avoid overwhelming the driver.	2023-03-23 13:40:31 -03:00
Kjell Winblad	35474578ca	refactor: rename async_inflight_window to inflight_window everywhere	2023-03-23 14:21:57 +01:00
Kjell Winblad	9d3f369cca	docs: fix spelling mistake Co-authored-by: Thales Macedo Garitezi <thalesmg@gmail.com>	2023-03-23 14:09:57 +01:00
Thales Macedo Garitezi	ddffba0355	Merge pull request #10154 from thalesmg/fix-buffer-worker-default-req-timeout fix(buffer_worker): calculate default `resume_interval` based on `request_timeout` and `health_check_interval`	2023-03-22 20:21:04 -03:00
Thales Macedo Garitezi	8844b22c80	docs: improve descriptions Co-authored-by: Zaiming (Stone) Shi <zmstone@gmail.com>	2023-03-22 15:32:09 -03:00
Thales Macedo Garitezi	127a075b66	test(dynamo): attempt to fix dynamo tests Those tests in the `flaky` test are really flaky and require lots of CI retries. Apparently, the flakiness comes from race conditions from restarting bridges with the same name too fast between test cases. Previously, all test cases were sharing the same bridge name (the module name).	2023-03-22 14:34:37 -03:00
Thales Macedo Garitezi	61cb03b45a	fix(buffer_worker): change the default `resume_interval` value and expose it as hidden config Also removes the previously added alarm for request timeout. There are situations where having a short request timeout and a long health check interval make sense, so we don't want to alarm the user for those situations. Instead, we automatically attempt to set a reasonable `resume_interval` value.	2023-03-22 11:47:36 -03:00
Kjell Winblad	27b8445337	fix: add inflight window setting to the clickhouse bridge This commit makes sure the inflight window setting is present for the clickhouse bridge. It also changes emqx_resource_schema, which previously removed the inflight window setting from resources with query mode `always_sync`. We don't need to do that because all bridges that use the buffer worker queue will get async call handling even if the bridge doesn't support the async callback. Co-authored-by: Zaiming (Stone) Shi <zmstone@gmail.com>	2023-03-21 17:14:03 +01:00
Stefan Strigler	c1384b6e6e	feat(emqx_resource): include error with alarm for resource_down	2023-03-21 15:02:29 +01:00
Stefan Strigler	53825b9aba	fix(emqx_bridge): propagate connection error to resource status	2023-03-21 15:02:29 +01:00
Thales Macedo Garitezi	20414d7373	fix(buffer_worker): check request timeout and health check interval Fixes https://emqx.atlassian.net/browse/EMQX-9099 The default value for `request_timeout` is 15 seconds, and the default resume interval is also 15 seconds (the health check timeout, if `resume_interval` is not explicitly given). This means that, in practice, if a buffer worker ever gets into the blocked state, then almost all requests will time out. Proposed improvement: - `request_timeout` should by default be twice as much as `health_check_interval`. - Emit an alarm if `request_timeout` is not greater than `health_check_interval`.	2023-03-16 13:46:45 -03:00
Thales Macedo Garitezi	d464e2aad5	refactor: rename test resource prefix Co-authored-by: Zaiming (Stone) Shi <zmstone@gmail.com>	2023-03-16 13:43:01 -03:00
Thales Macedo Garitezi	03342923b9	fix(bridge): use the same dry run prefix Kafka Producer and Consumer bridges rely on this prefix for detecting a dry run and avoid leaking atoms. At some point, this prefix was changed, effectively disabling the check in Kafka Producer.	2023-03-16 13:43:01 -03:00
Thales Macedo Garitezi	91a57faa95	Merge pull request #10128 from thalesmg/ocsp-v50-mkII feat: add ocsp stapling support to mqtt ssl listener (5.0)	2023-03-16 13:10:48 -03:00
Thales Macedo Garitezi	164440fe83	test(resource): fix flaky test Sometimes this test might retry more times, so we check the prefix of the trace only.	2023-03-15 14:25:55 -03:00
Andrew Mayorov	a9bc8a4464	refactor(resman): rename `ets_lookup` → `lookup_cached` That way we hide the implementation details and the interface becomes cleaner and more obvious.	2023-03-15 19:17:30 +03:00
Andrew Mayorov	29907875bf	test(bufworker): set `batch_time` for batch-related testcases By default it's `0` since `e9d3fc51`. This made a couple of tests prone to flapping.	2023-03-15 19:17:30 +03:00
Andrew Mayorov	e411c5d5f8	refactor(resman): work with state cache atomically Also ensure that cache entries are always consistent with `Data`, so that most of the code could rely on reading the cached entry most of the time.	2023-03-15 19:17:30 +03:00
Andrew Mayorov	cad6492c99	perf(bridge-api): ask bridge listings in parallel Also rename response formatting functions to better clarify their purpose.	2023-03-15 19:17:29 +03:00
Thales Macedo Garitezi	422597a441	test: fix flaky tests	2023-03-14 16:08:47 -03:00
Andrew Mayorov	686bf8255b	fix(bridge): reply `emqx_resource:get_instance/1` from cache The resource manager may be busy at times, so this change ensures that getting resource instance state will not block. Currently, no users of `emqx_resource:get_instance/1` seem to rely on the "as-actual-as-possible" state guarantee it was providing.	2023-03-13 14:35:08 +03:00
Andrew Mayorov	a86d06f043	chore: bump app versions following last merge-back	2023-03-10 16:44:15 +03:00
Zaiming (Stone) Shi	fe27604010	Merge remote-tracking branch 'origin/release-50' into 0308-merge-release-50-back-to-master	2023-03-08 16:46:45 +01:00
Thales Macedo Garitezi	eef65fba60	fix(buffer_worker): handle `request_timeout = infinity` case The current schema allows `infinity` for `request_timeout`, so we have to take that into account. It's not currently possible to set `batch_time = infinity`, so there's no need to treat that case.	2023-03-06 15:31:28 -03:00
Thales Macedo Garitezi	18ab7ed197	chore: bump app vsns	2023-03-06 15:31:28 -03:00
Thales Macedo Garitezi	0e707e837f	docs(buffer_worker): improve description of `request_timeout`	2023-03-06 15:31:28 -03:00
Thales Macedo Garitezi	e9d3fc511f	chore(buffer_worker): change default `batch_time` to 0 and improve docs	2023-03-06 15:31:28 -03:00
Thales Macedo Garitezi	f95a30ae89	fix(webhook): convert `request_timeout`s in root and resource_opts	2023-03-06 10:12:38 -03:00
Thales Macedo Garitezi	167b7a212f	refactor(buffer_worker): avoid starting 0-time timers	2023-03-06 10:12:38 -03:00
Thales Macedo Garitezi	e9ffabf936	fix(buffer_worker): add batch time automatic adjustment To avoid message loss due to misconfigurations, we adjust `batch_time` based on `request_timeout`. If `batch_time` > `request_timeout`, all requests will time out before being sent if the message rate is low. This is even worse if `pool_size` is high. We cap `batch_time` at `request_timeout div 2` as a rule of thumb.	2023-03-06 10:12:38 -03:00
Kjell Winblad	67acdf0888	feat: add clickhouse database bridge This commit adds a Clickhouse bridge to EMQX 5. The bridge is similar to the Clickhouse bridge in the 4.4, but adds the possibility to use different formats (such as JSON) for values to be inserted.	2023-03-02 12:22:11 +01:00
Andrew Mayorov	c883e4b36a	test: drop custom `loop_wait` in favor of snabkaffe's `?retry`	2023-02-24 18:16:35 +03:00
Andrew Mayorov	2b4e49e7df	fix(bufworker): handle replies of simple async queries Before that change, simple queries were treated as "retries" essentially, thus skipping all the reply processing there is.	2023-02-24 15:06:49 +03:00
Zaiming (Stone) Shi	c97d17cc91	test: refactor to loop wait for counters	2023-02-24 09:02:03 +01:00
Zaiming (Stone) Shi	a10dbba084	refactor(buffer_worker): less defensive on inflight counter decrement	2023-02-23 21:23:10 +01:00
Zaiming (Stone) Shi	7a6465e2cf	fix(buffer_worker): ensure flush timer reset in blocked state	2023-02-23 21:06:38 +01:00
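
The `batch_time` adjustment described in commits e9ffabf936 and eef65fba60 can be sketched roughly as follows. This is a minimal illustration in Python, not the EMQX implementation (which is written in Erlang): the function name `adjust_batch_time` and the `INFINITY` sentinel are assumptions made for this example, and only the rule itself (cap `batch_time` at `request_timeout div 2`, and leave it alone when `request_timeout = infinity`) comes from the commit messages.

```python
from typing import Union

# Hypothetical stand-in for the Erlang atom `infinity` allowed by the schema.
INFINITY = "infinity"


def adjust_batch_time(batch_time_ms: int,
                      request_timeout_ms: Union[int, str]) -> int:
    """Return a batch time that cannot outlive the request timeout.

    Sketch of the rule of thumb from the commit message: cap `batch_time`
    at `request_timeout div 2`.
    """
    if request_timeout_ms == INFINITY:
        # Requests never expire, so any batch time is acceptable as-is.
        return batch_time_ms
    # Integer division mirrors Erlang's `div`.
    cap_ms = request_timeout_ms // 2
    return min(batch_time_ms, cap_ms)


if __name__ == "__main__":
    # A 20 s batch time with a 15 s request timeout gets capped.
    print(adjust_batch_time(20_000, 15_000))
    # A batch time below the cap is left unchanged.
    print(adjust_batch_time(10, 15_000))
    # An infinite request timeout imposes no cap.
    print(adjust_batch_time(20_000, INFINITY))
```

With a cap in place, a low message rate can no longer leave requests sitting in a batch longer than their own timeout, which is the message-loss scenario the commit describes.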

390 Commits