yuanbiao/emqx - emqx

Commit Graph

Author	SHA1	Message	Date
Thales Macedo Garitezi	6be8ff378e	fix(buffer_worker): make buffer worker enter `blocked` state when async worker dies Fixes https://emqx.atlassian.net/browse/EMQX-10074 Otherwise, requests from those async workers, now retriable, might not be retried until the buffer worker blocks for other reasons, which might take a long time.	2023-05-30 15:34:22 -03:00
Andrew Mayorov	a5fc26736d	refactor(mqttconn): split ingress/egress into 2 separate pools Each with a more refined set of responsibilities, at the cost of slight code duplication. Also provide two different config fields for each pool size.	2023-05-30 17:21:44 +03:00
Thales Macedo Garitezi	75fcac9711	Merge pull request #10826 from thalesmg/test-partial-batch-expired-inflight-v50 test(buffer_worker): add assertion for inflight count after batch expiration	2023-05-30 09:05:59 -03:00
Zaiming (Stone) Shi	91cdc69976	Merge pull request #10867 from zmstone/0530-merge-release-50-to-master 0530 merge release 50 to master	2023-05-30 09:54:57 +02:00
Zaiming (Stone) Shi	9529919046	chore: bump app versions	2023-05-30 08:08:29 +02:00
Thales Macedo Garitezi	67e182e0c9	Merge pull request #10813 from thalesmg/refactor-kafka-on-stop-v50 feat(kafka): ensure allocated resources are removed on failures	2023-05-29 16:49:29 -03:00
Zaiming (Stone) Shi	36e268c933	chore: bump app versions	2023-05-26 16:05:37 +02:00
Zaiming (Stone) Shi	cc5b4d3748	Merge remote-tracking branch 'origin/release-50' into 0526-ci-delete-otp-24-from-standalone-app-test	2023-05-26 15:58:16 +02:00
Thales Macedo Garitezi	32e6213ce3	fix(resource_manager_sup): use `one_for_one` instead of `simple_one_for_one` Using `simple_one_for_one` has a potential race condition issue where we read the PID of the resource manager before trying to remove a resource, and then that PID changes because it was either dead at first, or it crashed and changed, and later we use this stale PID to try to remove it from the supervisor. Under such circumstances, the restarting child might linger in the supervisor, leaking resources. By using the resource ID itself as a child ID (and using `one_for_one` restart strategy), we ensure the child is truly removed.	2023-05-25 18:07:43 -03:00
Thales Macedo Garitezi	42b37690c7	refactor(pulsar): use macros for allocatable resources	2023-05-25 16:38:09 -03:00
Thales Macedo Garitezi	db60dcbada	test(buffer_worker): add assertion for inflight count after batch expiration Fixes https://emqx.atlassian.net/browse/EMQX-9829	2023-05-25 16:11:37 -03:00
Thales Macedo Garitezi	18d57ba3eb	Merge pull request #10812 from thalesmg/test-flakiness-20230524 test: attempts to reduce flakiness (pgsql, cassandra)	2023-05-25 09:29:13 -03:00
JianBo He	de7f1c8aec	test: add tests for auto_restart_interval	2023-05-25 17:15:19 +08:00
JianBo He	71b636e321	fix: fix auto_restart_interval checker	2023-05-25 12:04:23 +08:00
Paulo Zulato	122ebcac24	fix: add user-friendly message when interval is out of range	2023-05-24 15:46:00 -03:00
Thales Macedo Garitezi	7f88521836	test(pgsql): reduce flakiness Depending on timing, `t_write_timeout` was getting stuck while checking the resource health, and the previous request timeout options were making a response to never be sent if that process took too long.	2023-05-24 15:41:25 -03:00
Thales Macedo Garitezi	fd2940cd77	feat(pulsar): ensure allocated resources are removed on failures (v5.0) Fixes https://emqx.atlassian.net/browse/EMQX-9937	2023-05-24 12:29:00 -03:00
Zaiming (Stone) Shi	732a7be187	Merge remote-tracking branch 'origin/release-50'	2023-05-22 17:46:54 +02:00
Thales Macedo Garitezi	0559d6f639	refactor(buffer_worker): use static fn for bumping counters	2023-05-22 09:12:08 -03:00
Thales Macedo Garitezi	c74c93388e	refactor: rename some variables and sum type constructors for clarity	2023-05-22 09:11:23 -03:00
Thales Macedo Garitezi	7d798c10e9	perf(buffer_worker): flush metrics periodically inside buffer worker process Fixes https://emqx.atlassian.net/browse/EMQX-9905 Since calling `telemetry` is costly in a hot path, we instead collect metrics inside the buffer workers state and periodically flush them, rather than immediately as events happen.	2023-05-22 09:11:23 -03:00
Andrew Mayorov	ba6b208df2	fix(clickhouse): start app in tests Otherwise, depending on the test execution order, tests might sometimes fail. Moreover, ensure that applications describe their dependencies correctly and avoid starting irrelevant apps in tests.	2023-05-19 23:08:40 +03:00
Zaiming (Stone) Shi	cb76e5a241	docs: add changelog for 10755	2023-05-19 20:41:26 +02:00
Zaiming (Stone) Shi	0d8ffc0d59	fix(resource-manager): ensure no false creation Update is implemented as remove + create. If a delete call is made while the create is in progress, the remove call is likely to time out too. This causes the following creation to falsely succeed, because there is already a running child under the supervisor. As a result, the resource is permanently removed after resource_manager eventually handles the remove call.	2023-05-19 18:55:16 +02:00
Zaiming (Stone) Shi	f5e5c59763	refactor(resource-manager-sup): do not force kill resource manager The shutdown timeout is now set to infinity so it will never force kill a resource manager; otherwise there will be resource leaks.	2023-05-19 18:55:16 +02:00
Zaiming (Stone) Shi	21de0f8274	fix(buffer-worker-sup): fast stop The timeout shutdown in child spec may significantly slow down the deletion of a resource. This commit changes the shutdown to brutal kill. Also, the pool worker removal code has been deleted because it's not necessary, since the entire pool is going to be force-deleted later anyway.	2023-05-19 18:55:16 +02:00
firest	baeb96a6e4	chore: update changes	2023-05-19 15:36:18 +08:00
firest	0eea8438bf	fix(resource): make some logging of the resource manager more secure	2023-05-19 15:28:19 +08:00
Paulo Zulato	5d289ade56	fix: validate range for some bridge options Fixes https://emqx.atlassian.net/browse/EMQX-9864 Setting a very large interval can cause `erlang:start_timer` to crash. Also, setting auto_restart_interval or health_check_interval to "0s" causes the state machine to be in loop as time 0 is handled separately: \| state_timeout() = timeout() \| integer() \| (...) \| If Time is relative and 0 no timer is actually started, instead the \| time-out event is enqueued to ensure that it gets processed before any \| not yet received external event. from "https://www.erlang.org/doc/man/gen_statem.html#type-state_timeout" Therefore, both fields are now validated against the range [1ms, 1h], which doesn't cause above issues.	2023-05-18 10:10:58 -03:00
Thales Macedo Garitezi	447b76464b	Merge branch 'release-50' into merge-r50-into-v50-a	2023-05-17 14:50:18 -03:00
Thales Macedo Garitezi	85089a3210	fix(buffer_worker): correctly flush the buffer workers when inflight table room is made The previous commit uncovered another bug that was hidden by it: `maybe_flush_after_async_reply` was sending a message to the wrong PID. It was sending a message to `self()` meaning to target a buffer worker, but `self()` in that context is never the buffer worker, it's the connector's worker. This change also revealed a race condition where the buffer workers could stop flushing messages. So we piggy-backed on the atomic update of the table size count to check if the buffer worker should be poked to continue flushing. This allows us to get rid of `maybe_flush_after_async_reply` altogether.	2023-05-16 17:15:42 -03:00
Thales Macedo Garitezi	657df05ad9	fix(buffer_worker): avoid setting flush timer when inflight is full Fixes https://emqx.atlassian.net/browse/EMQX-9902 When the buffer worker inflight window is full, we don’t need to set a timer to flush the messages again because there’s no more room, and one of the inflight windows will flush the buffer worker by calling `flush_worker`. Currently, we do set the timer on such situation, and this fact combined with the default batch time of 0 yields a busy loop situation where the CPU spins a lot while inflight messages do not return.	2023-05-16 11:28:58 -03:00
zhongwencool	a953b951fe	Merge branch 'master' into sync-release-50-to-master	2023-05-12 18:01:58 +08:00
Thales Macedo Garitezi	64dc9ed46a	perf(metrics): avoid increasing counters by 0 Some performance tests indicate that calling `telemetry` is costly in hot paths. Since increasing a counter by 0 is a no-op, we should avoid calling `telemetry` if the amount to increase is 0.	2023-05-11 15:13:37 -03:00
Kjell Winblad	70cf1533db	feat: add RabbitMQ bridge	2023-05-09 14:32:26 +02:00
Zaiming (Stone) Shi	13dcb5732f	Merge remote-tracking branch 'origin/release-50' into 0508-prepare-for-e5.0.4	2023-05-08 21:29:35 +02:00
Thales Macedo Garitezi	eba627b365	fix(buffer_worker): fix inflight count when updating inflight item	2023-05-08 09:27:51 -03:00
Zhongwen Deng	4f396a36a9	Merge remote-tracking branch 'upstream/master' into release-50	2023-05-08 14:58:03 +08:00
Thales Macedo Garitezi	8aa7c014e7	perf(buffer_worker): avoid calling `ets:info/2` (Almost?) fixes https://emqx.atlassian.net/browse/EMQX-9637 During the course of performance tests comparing the performance of e5.0.3 and e4.4.16 regarding the webhook bridge in sync mode, we observed that the throughput in e5.0.3 (sync) was much lower than in e4.4.16: ~ 9 k msgs / s vs. ~ 50 k msgs / s, respectively. Analyzing `observer_cli` output, we noticed that a lot of the time in both buffer workers and ehttpc processes was spent in `ets:info/2`. That function was called to check the size of the inflight table when updating metrics and checking if the inflight table was full. Other uses of `ets:info/2` were contained inside the arguments to some `?tp/2` macro usages (https://github.com/kafka4beam/snabbkaffe/pull/60). By using a specific record to track the size of the table, we managed to improve the bridge performance to ~ 45 k msgs / s in sync mode.	2023-05-02 17:05:32 -03:00
Andrew Mayorov	670709f746	feat(resource): ensure uniqueness through `gproc` Also use it instead of a custom ETS table for simplicity and better consistency. This has drawbacks though: expect slightly increased load on gproc gen_server due to how `gproc:set_value/2` works.	2023-05-02 17:29:22 +03:00
Andrew Mayorov	4575167607	feat(resource): drop `manager_id()` type	2023-05-02 17:29:20 +03:00
Andrew Mayorov	aaef95b1da	feat(resman): stop adding uniqueness to manager ids Before this change, a separate `manager_id` / `instance_id` was used as resource manager id, which made connector interface somewhat inconsistent: part of function calls to connector implementation used instance id as first argument while the rest used resource id itself.	2023-05-02 17:28:26 +03:00
Thales Macedo Garitezi	7853a4c36e	chore: bump app vsns	2023-04-27 11:58:28 -03:00
Thales Macedo Garitezi	567413389c	Merge pull request #10519 from thalesmg/fix-flaky-res-test-v50 test(resource): fix flaky test	2023-04-27 09:33:40 -03:00
Thales Macedo Garitezi	c53741a08c	fix(buffer_worker): avoid sending late reply messages to callers Fixes https://emqx.atlassian.net/browse/EMQX-9635 During a sync call from process `A` to a buffer worker `B`, its call to the underlying resource `C` can be very slow. In those cases, `A` will receive a timeout response and expect no more messages from `B` nor `C`. However, prior to this fix, if `B` is stuck in a long sync call to `C` and then gets its response after `A` timed out, `B` would still send the late response to `A`, polluting its mailbox.	2023-04-26 13:18:28 -03:00
Thales Macedo Garitezi	d78312e10e	test(resource): fix flaky test	2023-04-26 09:25:33 -03:00
zhongwencool	9d893b49eb	Merge branch 'master' into sync-release-50-to-master	2023-04-26 10:54:46 +08:00
Thales Macedo Garitezi	ad4be08bb2	feat: implement Pulsar Producer bridge (e5.0) Fixes https://emqx.atlassian.net/browse/EMQX-8398	2023-04-24 10:28:26 -03:00
firest	7d2c336ab7	fix(resource): make sure resource will not crash when stopping	2023-04-23 15:31:08 +08:00
Serge Tupchii	423a30fbb3	fix(emqx_alarm): add safe call API to activate/deactivate alarms and use it in resource_manager Don't let 'emqx_resource_manager' crash because of emqx_alarm timeouts. Fixes: EMQX-9529/#10357	2023-04-20 17:15:13 +03:00
Serge Tupchii	b5eda9f0d1	perf(emqx_resource): don't reactivate alarms on reoccurring errors Avoid unnecessary calls to activate an alarm if it has been already activated. Fixes: EMQX-9529/#10357	2023-04-20 16:37:33 +03:00
Thales Macedo Garitezi	cb995e2033	fix(buffer_worker): avoid sending late reply messages to callers Fixes https://emqx.atlassian.net/browse/EMQX-9635 During a sync call from process `A` to a buffer worker `B`, its call to the underlying resource `C` can be very slow. In those cases, `A` will receive a timeout response and expect no more messages from `B` nor `C`. However, prior to this fix, if `B` is stuck in a long sync call to `C` and then gets its response after `A` timed out, `B` would still send the late response to `A`, polluting its mailbox.	2023-04-19 18:27:10 -03:00
Ivan Dyachkov	dc78ecb41c	chore: merge upstream/master	2023-04-18 17:33:32 +02:00
Andrew Mayorov	21e19a33ce	feat(respool): switch to `emqx_resource_pool` Which was previously known as `emqx_plugin_libs_pool`. This is part of the effort to get rid of `emqx_plugin_libs` application.	2023-04-18 12:51:14 +03:00
Ivan Dyachkov	9fc8a498f8	chore: bump apps versions	2023-04-17 09:09:08 +02:00
Stefan Strigler	7df0493312	Merge pull request #10390 from sstrigler/EMQX-9549-new-emqx-utils-app-to-collect-utility-modules New emqx_utils app to collect utility modules	2023-04-14 20:33:11 +02:00
Thales Macedo Garitezi	e073bc90bc	refactor(buffer_worker): rename `s/queue/buffer/g`	2023-04-14 11:37:19 -03:00
Thales Macedo Garitezi	14ed4a7ada	feat(buffer_worker): set default queue mode to `memory_only` Fixes https://emqx.atlassian.net/browse/EMQX-9367 For better user experience and performance for the average bridge, we should change the default queue mode to `memory_only`, as was the behavior of most bridges in e4.x. This leads to better performance when message rate is high enough and the remote resource is not keeping up with EMQX. Also, we set the default segment size to equal max queue bytes.	2023-04-14 11:37:19 -03:00
Thales Macedo Garitezi	4de13d2800	feat(buffer_worker): change default max queue bytes to 256 MB	2023-04-14 09:31:33 -03:00
Stefan Strigler	9c11bfce80	refactor: rename emqx_misc to emqx_utils	2023-04-14 13:41:27 +02:00
Andrew Mayorov	5e92ba6fa9	Merge pull request #10359 from ft/EMQX-9136/no-ask-metrics feat(resource): ask for metrics only when needed	2023-04-14 12:28:52 +03:00
Ivan Dyachkov	bdffa925db	chore: merge upstream/master release-50	2023-04-12 15:30:20 +02:00
Andrew Mayorov	9c9f39d0f7	feat(resman): also move out metrics collection for debugging Now `emqx_resource:list_instances_verbose/0` will populate the metrics for each instance, for the sake of simplicity.	2023-04-12 16:14:42 +03:00
Andrew Mayorov	e70deae1c3	feat(resource): ask for metrics only when needed	2023-04-11 12:00:19 +03:00
Zaiming (Stone) Shi	a9bf633e03	Merge pull request #10320 from zmstone/0403-sync-release-50-back-to-master 0403 sync release 50 back to master	2023-04-04 23:31:24 +02:00
Zaiming (Stone) Shi	68c15ffd48	Merge remote-tracking branch 'origin/release-50' into 0403-sync-release-50-back-to-master	2023-04-04 16:42:58 +02:00
Thales Macedo Garitezi	0b6fd7fe14	fix(buffer_worker): check request timeout and health check interval Port of https://github.com/emqx/emqx/pull/10154 for `release-50` Fixes https://emqx.atlassian.net/browse/EMQX-9099 Originally, the `resume_interval`, which is what defines how often a buffer worker will attempt to retry its inflight window, was set to the same as the `health_check_interval`. This had the problem that, with default values, `health_check_interval = request_timeout`. This meant that, if a buffer worker with those configs were ever blocked, all requests would have timed out by the time it retried them. Here we change the default `resume_interval` to a reasonable value dependent on `health_check_interval` and `request_timeout`, and also expose that as a hidden parameter for fine tuning if necessary.	2023-04-04 08:58:36 -03:00
Thales Macedo Garitezi	f3ffc02bff	feat(bridges): enable async query mode for all bridges with buffer workers Fixes https://emqx.atlassian.net/browse/EMQX-9130 Since buffer workers always support async calls ("outer calls"), we should decouple those two call modes (inner and outer), and avoid exposing the inner call configuration to user to avoid complexity. For bridges that currently only allow sync query modes, we should allow them to be configured with async. That means basically all bridge types except Kafka Producer.	2023-04-03 14:49:51 -03:00
Zaiming (Stone) Shi	36000abf51	refactor: relocate i18n files for apps/emqx	2023-04-03 13:12:24 +02:00
zhongwencool	d63680cf25	Merge pull request #10307 from emqx/release-50 Sync release-50 back to master	2023-04-02 11:36:41 +08:00
Thales Macedo Garitezi	246a792965	Merge pull request #10273 from thalesmg/refactor-kprod-start-error-msg-rv50 fix: return friendly message when kafka producer and consumer fails to start (rv5.0)	2023-03-31 16:25:26 -03:00
Thales Macedo Garitezi	5011486b18	fix(kafka_consumer): return better error messages when probing kafka consumer bridge Fixes https://emqx.atlassian.net/browse/EMQX-9422	2023-03-31 11:33:15 -03:00
Zaiming (Stone) Shi	bcde52383b	docs: fix max batch size desc	2023-03-31 12:35:27 +02:00
Thales Macedo Garitezi	632bffd451	fix: return friendly message when kafka producer fails to start (rv5.0) Fixes https://emqx.atlassian.net/browse/EMQX-9392 The returned information does not allow to diagnose the issue (i.e.: a connection issue due to the wrong host and port, the wrong password failing authn). However, such information is printed to the logs. This changes the returned error to the API so that the user is hinted at looking at the logs for further investigation of the error.	2023-03-30 11:51:36 -03:00
Kjell Winblad	8e0d315b7b	Merge pull request #10197 from kjellwinblad/0321-fix-inflight-window-hand-over-to-kjell fix: add inflight window setting to the clickhouse bridge	2023-03-29 09:38:24 +02:00
Zaiming (Stone) Shi	d07987288a	chore: add some example annotations for config importance level	2023-03-28 14:29:24 +02:00
Zaiming (Stone) Shi	dd996ad1dc	chore: bump app vsns	2023-03-24 21:47:15 +01:00
Thales Macedo Garitezi	ff272a2071	Merge pull request #10206 from thalesmg/decouple-buffer-worker-query-call-mode-v50 feat(buffer_worker): decouple query mode from underlying connector call mode	2023-03-24 13:49:00 -03:00
Thales Macedo Garitezi	f8d5d53908	feat(buffer_worker): decouple query mode from underlying connector call mode Fixes https://emqx.atlassian.net/browse/EMQX-9129 Currently, if a user configures a bridge with query mode sync, then all calls to the underlying driver/connector ("inner calls") will always be synchronous, regardless of its support for async calls. Since buffer workers always support async queries ("outer calls"), we should decouple those two call modes (inner and outer), and avoid exposing the inner call configuration to user to avoid complexity. There are two situations when we want to force synchronous calls to the underlying connector even if it supports async: 1) When using `simple_sync_query`, since we are bypassing the buffer workers; 2) When retrying the inflight window, to avoid overwhelming the driver.	2023-03-23 13:40:31 -03:00
Kjell Winblad	35474578ca	refactor: rename async_inflight_window to inflight_window everywhere	2023-03-23 14:21:57 +01:00
Kjell Winblad	9d3f369cca	docs: fix spelling mistake Co-authored-by: Thales Macedo Garitezi <thalesmg@gmail.com>	2023-03-23 14:09:57 +01:00
Thales Macedo Garitezi	ddffba0355	Merge pull request #10154 from thalesmg/fix-buffer-worker-default-req-timeout fix(buffer_worker): calculate default `resume_interval` based on `request_timeout` and `health_check_interval`	2023-03-22 20:21:04 -03:00
Thales Macedo Garitezi	8844b22c80	docs: improve descriptions Co-authored-by: Zaiming (Stone) Shi <zmstone@gmail.com>	2023-03-22 15:32:09 -03:00
Thales Macedo Garitezi	127a075b66	test(dynamo): attempt to fix dynamo tests Those tests in the `flaky` test are really flaky and require lots of CI retries. Apparently, the flakiness comes from race conditions from restarting bridges with the same name too fast between test cases. Previously, all test cases were sharing the same bridge name (the module name).	2023-03-22 14:34:37 -03:00
Thales Macedo Garitezi	61cb03b45a	fix(buffer_worker): change the default `resume_interval` value and expose it as hidden config Also removes the previously added alarm for request timeout. There are situations where having a short request timeout and a long health check interval make sense, so we don't want to alarm the user for those situations. Instead, we automatically attempt to set a reasonable `resume_interval` value.	2023-03-22 11:47:36 -03:00
Kjell Winblad	27b8445337	fix: add inflight window setting to the clickhouse bridge This commit makes sure the inflight window setting is present for the clickhouse bridge. It also changes emqx_resource_schema that previously removed the inflight window setting from resources with query mode `always_sync`. We don't need to do that because all bridges that use the buffer worker queue will get async call handling even if the bridge doesn't support the async callback. Co-authored-by: Zaiming (Stone) Shi <zmstone@gmail.com>	2023-03-21 17:14:03 +01:00
Stefan Strigler	c1384b6e6e	feat(emqx_resource): include error with alarm for resource_down	2023-03-21 15:02:29 +01:00
Stefan Strigler	53825b9aba	fix(emqx_bridge): propagate connection error to resource status	2023-03-21 15:02:29 +01:00
Thales Macedo Garitezi	20414d7373	fix(buffer_worker): check request timeout and health check interval Fixes https://emqx.atlassian.net/browse/EMQX-9099 The default value for `request_timeout` is 15 seconds, and the default resume interval is also 15 seconds (the health check timeout, if `resume_interval` is not explicitly given). This means that, in practice, if a buffer worker ever gets into the blocked state, then almost all requests will time out. Proposed improvement: - `request_timeout` should by default be twice as much as health_check_interval. - Emit an alarm if `request_timeout` is not greater than `health_check_interval`.	2023-03-16 13:46:45 -03:00
Thales Macedo Garitezi	d464e2aad5	refactor: rename test resource prefix Co-authored-by: Zaiming (Stone) Shi <zmstone@gmail.com>	2023-03-16 13:43:01 -03:00
Thales Macedo Garitezi	03342923b9	fix(bridge): use the same dry run prefix Kafka Producer and Consumer bridges rely on this prefix for detecting a dry run and avoid leaking atoms. At some point, this prefix was changed, effectively disabling the check in Kafka Producer.	2023-03-16 13:43:01 -03:00
Thales Macedo Garitezi	91a57faa95	Merge pull request #10128 from thalesmg/ocsp-v50-mkII feat: add ocsp stapling support to mqtt ssl listener (5.0)	2023-03-16 13:10:48 -03:00
Thales Macedo Garitezi	164440fe83	test(resource): fix flaky test Sometimes this test might retry more times, so we check the prefix of the trace only.	2023-03-15 14:25:55 -03:00
Andrew Mayorov	a9bc8a4464	refactor(resman): rename `ets_lookup` → `lookup_cached` That way we hide the implementation details + the interface becomes cleaner and more obvious.	2023-03-15 19:17:30 +03:00
Andrew Mayorov	29907875bf	test(bufworker): set `batch_time` for batch-related testcases By default it's `0` since `e9d3fc51`. This made a couple of tests prone to flapping.	2023-03-15 19:17:30 +03:00
Andrew Mayorov	e411c5d5f8	refactor(resman): work with state cache atomically Also ensure that cache entries are always consistent with `Data`, so that most of the code could rely on reading the cached entry most of the time.	2023-03-15 19:17:30 +03:00
Andrew Mayorov	cad6492c99	perf(bridge-api): ask bridge listings in parallel Also rename response formatting functions to better clarify their purpose.	2023-03-15 19:17:29 +03:00
Thales Macedo Garitezi	422597a441	test: fix flaky tests	2023-03-14 16:08:47 -03:00
Andrew Mayorov	686bf8255b	fix(bridge): reply `emqx_resource:get_instance/1` from cache The resource manager may be busy at times, so this change ensures that getting resource instance state will not block. Currently, no users of `emqx_resource:get_instance/1` do seem to be relying on state being "as-actual-as-possible" guarantee it was providing.	2023-03-13 14:35:08 +03:00
Andrew Mayorov	a86d06f043	chore: bump app versions following last merge-back	2023-03-10 16:44:15 +03:00
Zaiming (Stone) Shi	fe27604010	Merge remote-tracking branch 'origin/release-50' into 0308-merge-release-50-back-to-master	2023-03-08 16:46:45 +01:00
Thales Macedo Garitezi	eef65fba60	fix(buffer_worker): handle `request_timeout = infinity` case The current schema allows `infinity` for `request_timeout`, so we have to take that into account. It's not currently possible to set `batch_time = infinity`, so there's no need to treat that case.	2023-03-06 15:31:28 -03:00
Thales Macedo Garitezi	18ab7ed197	chore: bump app vsns	2023-03-06 15:31:28 -03:00
Thales Macedo Garitezi	0e707e837f	docs(buffer_worker): improve description of `request_timeout`	2023-03-06 15:31:28 -03:00
Thales Macedo Garitezi	e9d3fc511f	chore(buffer_worker): change default `batch_time` to 0 and improve docs	2023-03-06 15:31:28 -03:00
Thales Macedo Garitezi	f95a30ae89	fix(webhook): convert `request_timeout`s in root and resource_opts	2023-03-06 10:12:38 -03:00
Thales Macedo Garitezi	167b7a212f	refactor(buffer_worker): avoid starting 0-time timers	2023-03-06 10:12:38 -03:00
Thales Macedo Garitezi	e9ffabf936	fix(buffer_worker): add batch time automatic adjustment To avoid message loss due to misconfigurations, we adjust `batch_time` based on `request_timeout`. If `batch_time` > `request_timeout`, all requests will timeout before being sent if the message rate is low. Even worse if `pool_size` is high. We cap `batch_time` at `request_timeout div 2` as a rule of thumb.	2023-03-06 10:12:38 -03:00
Kjell Winblad	67acdf0888	feat: add clickhouse database bridge This commit adds a Clickhouse bridge to EMQX 5. The bridge is similar to the Clickhouse bridge in the 4.4, but adds the possibility to use different formats (such as JSON) for values to be inserted.	2023-03-02 12:22:11 +01:00
Andrew Mayorov	c883e4b36a	test: drop custom `loop_wait` in favor of snabbkaffe's `?retry`	2023-02-24 18:16:35 +03:00
Andrew Mayorov	2b4e49e7df	fix(bufworker): handle replies of simple async queries Before that change, simple queries were treated as "retries" essentially, thus skipping all the reply processing there is.	2023-02-24 15:06:49 +03:00
Zaiming (Stone) Shi	c97d17cc91	test: refactor to loop wait for counters	2023-02-24 09:02:03 +01:00
Zaiming (Stone) Shi	a10dbba084	refactor(buffer_worker): less defensive on inflight counter decrement	2023-02-23 21:23:10 +01:00
Zaiming (Stone) Shi	7a6465e2cf	fix(buffer_worker): ensure flush timer reset in blocked state	2023-02-23 21:06:38 +01:00
Zaiming (Stone) Shi	3a6dbbdd05	refactor(buffer_worker): ensure flush message is never missed	2023-02-23 20:11:00 +01:00
Zaiming (Stone) Shi	dbfdeec5e9	fix(buffer_worker): log unknown async replies	2023-02-23 12:55:49 +01:00
Zaiming (Stone) Shi	356a94af30	fix(buffer_worker): ensure async flush message is sent This is a new issue introduced in the previous fix commits: after handling the partial expiry correctly, the IsFullBefore check is no longer the state before the reply is received but the state after a partially-expired batch is shrunk. The fix is simple: move the check to the entry-point of where async reply callback enters, then send an async 'flush' notification regardless of the handling result.	2023-02-23 09:47:34 +01:00
Zaiming (Stone) Shi	713220f88b	refactor(buffer_worker): more generic process for all_expired	2023-02-23 00:04:20 +01:00
Zaiming (Stone) Shi	036f69cd6e	test: ensure batch size > 1 is covered in expiration test	2023-02-22 23:26:04 +01:00
Zaiming (Stone) Shi	bf8becd521	test: make sure gauge return to 0 in test cases	2023-02-22 23:07:12 +01:00
Zaiming (Stone) Shi	fc614e16e5	fix(bridge): update inflight items after partial expiry	2023-02-22 22:05:56 +01:00
Zaiming (Stone) Shi	bb13d0708f	fix(bridge): fix dropped counter and inflight gauge Prior to this fix there were two metrics issues: 1. if all requests in a batch had expired when a reply was received, 'late_reply' was only bumped by 1 instead of the batch size; 2. when a batch was partially delivered (or expired), the dropped requests were not decremented from the inflight size gauge	2023-02-22 13:20:58 +01:00
Erik Timan	056bc71af2	chore: bump VSN version	2023-02-16 15:05:38 +01:00
Erik Timan	2442a4dea7	test(emqx_resource): add regression test for recursive flushing	2023-02-16 14:17:16 +01:00
Erik Timan	dcf70e0e68	refactor(emqx_resource): add more trace points for flushing	2023-02-16 14:17:16 +01:00
Zaiming (Stone) Shi	fb61c2b266	perf: avoid getting metrics (gen_server:call) for each resource lookup	2023-02-10 19:40:37 +01:00
Zaiming (Stone) Shi	42dfaf3ef2	Merge pull request #9910 from sstrigler/EMQX-8861-improve-bridge-restart-button-behaviour EMQX 8861 improve bridge restart button behaviour	2023-02-09 18:00:48 +01:00
Andrew Mayorov	81b1bab11e	chore: bump `emqx_resource` version to 0.1.7 Also add the changelog entry.	2023-02-08 14:21:30 +03:00
Andrew Mayorov	c6fc0ec8cd	fix(bufworker): do not avoid retry if inflight table is full Otherwise there's no other piece of code that would retry the inflight queries in that case.	2023-02-08 14:08:04 +03:00
Andrew Mayorov	d8d06a260f	test(buffer): add test on inflight overflow w/ async queries This testcase should verify that the buffer will retry all inflight queries failed with recoverable errors + flush all outstanding queries. Co-authored-by: ieQu1 <99872536+ieQu1@users.noreply.github.com>	2023-02-08 14:08:04 +03:00
Stefan Strigler	86f3f5787f	feat: allow to manually re-connect disconnected bridge	2023-02-07 11:58:30 +01:00
Zaiming (Stone) Shi	7ea140599a	Merge pull request #9894 from id/ci-always-run-static-checks ci: always run static_checks	2023-02-02 16:33:19 +01:00
Zaiming (Stone) Shi	feca4cc0a5	Merge pull request #9892 from zmstone/0202-docs-cosmetic 0202 docs cosmetic	2023-02-02 15:43:58 +01:00
Zaiming (Stone) Shi	58627b7958	chore(emqx_resource_manager): ignore unused return value for dialyzer	2023-02-02 14:11:12 +01:00
Zaiming (Stone) Shi	c0d478bd41	fix(buffer_worker): type spec	2023-02-02 14:11:12 +01:00
zhongwencool	ee852d8204	Merge pull request #9886 from zhongwencool/mongo-connection-default-async fix: remove async mode from mongodb/redis/mysql/pgsql bridge	2023-02-02 21:08:01 +08:00
Zaiming (Stone) Shi	d5c482b0b0	docs: remove timer unit from description the user input has time unit. e.g. "5s" for 5 seconds etc.	2023-02-02 13:49:20 +01:00
Zaiming (Stone) Shi	9864587389	fix: send to buffer-supported connector even when disconnected	2023-02-02 12:04:17 +01:00
Zaiming (Stone) Shi	13ef30c46c	Merge pull request #9884 from savonarola/resource-fixes fix(resources): fix resource lifecycle	2023-02-02 12:02:34 +01:00
Zhongwen Deng	1c9035d24c	test: remove async from redis ct	2023-02-02 17:37:18 +08:00
Zhongwen Deng	22cc1cc745	fix: make spell_check happy	2023-02-02 17:37:18 +08:00
Zhongwen Deng	f8936013b7	chore: replace async with sync	2023-02-02 17:37:18 +08:00
Zhongwen Deng	22c3f50020	fix: add query_mode_sync_only for mysql pgsql redis mongodb bridge	2023-02-02 17:37:18 +08:00
Andrew Mayorov	ca5c192f4b	Merge pull request #9882 from fwup/fix/no-mqtt-bridge-middleman refactor(mqtt-worker): avoid unnecessary abstraction	2023-02-02 13:11:31 +04:00
Zhongwen Deng	1a90c1654c	chore: bad typo	2023-02-02 11:43:04 +08:00
Ilya Averyanov	14f528cc86	fix(resources): fix resource lifecycle * do not resume all buffer workers on successful healthcheck * do not pass undefined state to resource healthcheck callback	2023-02-01 18:26:13 +02:00
Andrew Mayorov	5fd7f65a1f	test(bufworker): make testcase simpler to follow The confusion was due to the fact that subsequent query was missing `async_reply_fun` and thus, was not accumulating in the results.	2023-02-01 16:52:47 +03:00
Andrew Mayorov	ff473e0f1b	test(bufworker): fix testcase flapping due to data races	2023-02-01 12:57:46 +03:00
Zaiming (Stone) Shi	b3ad9e97d2	Merge pull request #9870 from keynslug/fix/mqtt-connection-loss-feedback feat(mqtt-bridge): avoid middleman process	2023-01-31 19:12:18 +01:00
Andrew Mayorov	c76311c9c3	fix(buffer): count inflight batches properly	2023-01-31 18:30:42 +03:00


554 Commits