Also removes the previously added alarm for request timeout.
There are situations where having a short request timeout and a long
health check interval makes sense, so we don't want to alarm the user
in those situations. Instead, we automatically attempt to set a
reasonable `resume_interval` value.
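Illustratively, the derived value might look like the sketch below;
the formula and function name are assumptions for illustration, not
necessarily what ships:

```erlang
%% Hedged sketch: derive a resume interval that gives a blocked buffer
%% worker a few resume attempts within one request timeout.
default_resume_interval(infinity, HealthCheckInterval) ->
    HealthCheckInterval;
default_resume_interval(RequestTimeout, HealthCheckInterval) ->
    max(1, min(HealthCheckInterval, RequestTimeout div 3)).
```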
This commit makes sure the inflight window setting is present for the
Clickhouse bridge. It also changes `emqx_resource_schema`, which
previously removed the inflight window setting from resources with
query mode `always_sync`. We don't need to do that, because all
bridges that use the buffer worker queue get async call handling even
if the bridge doesn't support the async callback.
Co-authored-by: Zaiming (Stone) Shi <zmstone@gmail.com>
Fixes https://emqx.atlassian.net/browse/EMQX-9099
The default value for `request_timeout` is 15 seconds, and the default
resume interval is also 15 seconds (the health check interval, if
`resume_interval` is not explicitly given). This means that, in
practice, if a buffer worker ever gets into the blocked state, then
almost all requests will time out.
Proposed improvement:
- `request_timeout` should by default be twice the
  `health_check_interval`.
- Emit an alarm if `request_timeout` is not greater than
  `health_check_interval`.
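A minimal sketch of the proposal, with hypothetical function names;
note that `request_timeout` may be `infinity`, which is trivially
greater than any interval:

```erlang
default_request_timeout(HealthCheckInterval) ->
    2 * HealthCheckInterval.

%% Returns 'alarm' when the configuration is considered unsafe; the
%% caller would raise the actual alarm.
maybe_alarm_request_timeout(RequestTimeout, HealthCheckInterval)
  when is_integer(RequestTimeout),
       RequestTimeout =< HealthCheckInterval ->
    alarm;
maybe_alarm_request_timeout(_RequestTimeout, _HealthCheckInterval) ->
    ok.
```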
The Kafka Producer and Consumer bridges rely on this prefix to detect
a dry run and avoid leaking atoms. At some point, this prefix was
changed, effectively disabling the check in the Kafka Producer.
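The check amounts to a prefix match on the resource ID, roughly as
sketched below; the `<<"probe-">>` prefix is illustrative only, the
real constant lives in the resource code:

```erlang
%% Hedged sketch: a dry run is recognized purely by its resource ID
%% prefix, so no new atoms need to be created.
is_dry_run(ResourceId) when is_binary(ResourceId) ->
    string:prefix(ResourceId, <<"probe-">>) =/= nomatch.
```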
The resource manager may be busy at times, so this change ensures that
getting the resource instance state will not block. Currently, no
users of `emqx_resource:get_instance/1` seem to rely on the
"as-fresh-as-possible" state guarantee it was providing.
The current schema allows `infinity` for `request_timeout`, so we have
to take that into account. It's not currently possible to set
`batch_time = infinity`, so there's no need to treat that case.
To avoid message loss due to misconfiguration, we adjust `batch_time`
based on `request_timeout`. If `batch_time` > `request_timeout`, all
requests will time out before being sent when the message rate is low;
it's even worse if `pool_size` is high. We cap `batch_time` at
`request_timeout div 2` as a rule of thumb.
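The adjustment boils down to the sketch below (function name
hypothetical); `infinity` passes through unchanged since it imposes no
deadline:

```erlang
%% Hedged sketch: cap batch_time so a batch always has a chance to be
%% flushed well within the request timeout.
adjust_batch_time(BatchTime, infinity) ->
    BatchTime;
adjust_batch_time(BatchTime, RequestTimeout) ->
    min(BatchTime, RequestTimeout div 2).
```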
This commit adds a Clickhouse bridge to EMQX 5. The bridge is similar
to the Clickhouse bridge in EMQX 4.4, but adds the ability to use
different formats (such as JSON) for the values to be inserted.
This is a new issue introduced in the previous fix commits:
after handling partial expiry correctly, the
`IsFullBefore` check no longer reflects the state before the reply
is received, but the state after a partially expired batch
has been shrunk.
The fix is simple: move the check to the entry point
where the async reply callback enters, then send an async
'flush' notification regardless of the handling result.
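In outline, the entry point looks like the sketch below; the function
names are hypothetical:

```erlang
%% Hedged sketch: capture fullness on entry, before the inflight table
%% can shrink, and notify the buffer worker no matter how the reply
%% itself was handled.
handle_async_reply(ReplyContext, Result) ->
    IsFullBefore = is_inflight_full(ReplyContext),
    _ = do_handle_async_reply(ReplyContext, Result),
    IsFullBefore andalso flush_worker(ReplyContext),
    ok.
```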
Prior to this fix there were two metrics issues:
1. If all requests in a batch had expired by the time a reply was
   received, 'late_reply' was bumped by 1 instead of by the batch size.
2. When a batch was partially delivered (or expired), the dropped
   requests were not decremented from the inflight size gauge.
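The corrected accounting, sketched with hypothetical helper names and
assumed metric arities:

```erlang
%% Hedged sketch: bump 'late_reply' by the number of expired requests
%% in the batch, and shrink the inflight gauge by the same amount.
account_expired(Id, WorkerRef, Batch, InflightCount) ->
    NumExpired = length([Req || Req <- Batch, is_expired(Req)]),
    ok = emqx_resource_metrics:late_reply_inc(Id, NumExpired),
    ok = emqx_resource_metrics:inflight_set(
        Id, WorkerRef, InflightCount - NumExpired
    ).
```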
This testcase should verify that the buffer will retry all inflight
queries that failed with recoverable errors, and flush all outstanding
queries.
Co-authored-by: ieQu1 <99872536+ieQu1@users.noreply.github.com>
The request body can potentially be very large.
The reply context is sent to the async call handler and kept
in its memory until the async reply is received from the bridge's
target service.
This commit minimizes the size of the reply context
by replacing the request body with `[]`.
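Shape of the change, with the context layout as an assumption:

```erlang
%% Hedged sketch: keep everything the reply path needs, but swap the
%% potentially large body out for [].
minimize_reply_context(#{request := Request} = Context) ->
    Context#{request := minimize_request(Request)}.

minimize_request(Request) when is_map(Request) ->
    Request#{body => []};
minimize_request(Request) ->
    Request.
```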
This reduces the log level from error to warning in places that are
connected to the InfluxDB bridge. Transient errors for external
resources should not produce error-level logs.
Some telemetry events from wolff are discarded:
* dropped:
  this is double counted in wolff;
  we now only subscribe to the dropped_queue_full event
* retried_failed:
  it has a different meaning in wolff:
  in wolff, it means the 2nd (or later) produce attempt;
  in EMQX, it means the request eventually failed after some retries
* retried_success:
  since we are going to handle the success counters in the callbacks,
  having this reported from wolff would only make things
  harder to understand
* failed:
  wolff never fails (unless it drops, which is a different counter)
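The resulting subscription might look like the sketch below; the
handler id and the exact event list are illustrative:

```erlang
%% Hedged sketch: attach only to the wolff telemetry events we still
%% consume, so the discarded ones never reach our handler.
subscribe_wolff_telemetry() ->
    ok = telemetry:attach_many(
        <<"emqx-bridge-kafka-producer">>,
        [
            [wolff, dropped_queue_full],
            [wolff, queuing],
            [wolff, inflight]
        ],
        fun ?MODULE:handle_telemetry_event/4,
        undefined
    ).
```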
With this, we avoid performing work or replying to callers that are no
longer waiting on a result.
Also introduces two new counters:
- `dropped.expired` :: happens when a request expires before being
sent downstream
- `late_reply` :: when a response is received from downstream, but the
  caller is no longer waiting for a reply because the request has
  expired, and the caller might even have retried it.
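Roughly where each counter is bumped, with hypothetical helper names
and assumed metric arities:

```erlang
%% Hedged sketch: expired requests are dropped before the downstream
%% call; replies that arrive after expiry are counted but never
%% delivered to the (gone) caller.
maybe_send(Id, Query) ->
    case is_expired(Query) of
        true ->
            emqx_resource_metrics:dropped_expired_inc(Id, 1),
            expired;
        false ->
            send_downstream(Query)
    end.

handle_reply(Id, Query, Result) ->
    case is_expired(Query) of
        true ->
            emqx_resource_metrics:late_reply_inc(Id, 1),
            ok;
        false ->
            reply_caller(Query, Result)
    end.
```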