yuanbiao/emqx - emqx

Commit Graph

Author	SHA1	Message	Date
Thales Macedo Garitezi	ca435975de	fix(webhook): treat http status code 429 as recoverable	2023-06-30 09:46:03 -03:00
Thales Macedo Garitezi	59b109eb5c	fix(webhook): treat 404 and other error replies as errors in async requests Fixes https://emqx.atlassian.net/browse/EMQX-10405 The problem here was that, for async requests, ehttpc responses of the form `{ok, 4__, _, _}` and similar were being treated as successes.	2023-06-29 15:45:23 -03:00
zhongwencool	093cdab838	chore: to_integer to make sure integer is converted	2023-06-20 08:39:23 +08:00
zhongwencool	07172e42f0	test: integer CI check failed	2023-06-20 08:39:23 +08:00
Stefan Strigler	0d6d441f4c	test(emqx_connector): start/stop test for webhook bridge	2023-06-14 09:56:50 +02:00
Stefan Strigler	b2a5065641	fix(emqx_connector): report errors in on_start handler	2023-06-13 16:57:08 +02:00
Thales Macedo Garitezi	99796224d8	refactor(resource): rename `request_timeout` -> `request_ttl` See https://emqx.atlassian.net/wiki/spaces/P/pages/612368639/open+e5.1+remove+auto+restart+interval+from+buffer+worker+resource+options	2023-06-01 13:01:53 -03:00
Thales Macedo Garitezi	10425eb925	feat(resource): deprecate `auto_restart_interval` in favor of `health_check_interval` See: https://emqx.atlassian.net/wiki/spaces/P/pages/612368639/open+e5.1+remove+auto+restart+interval+from+buffer+worker+resource+options Current problem: In 5.0.x, we have two timer options that control the state changing of buffer worker resources: auto_restart_interval and health_check_interval. - auto_restart_interval controls how often the resource attempts to transition from disconnected to connected. - health_check_interval controls how often the resource is checked and potentially moved from connected to disconnected or connecting. The existence of two independent timers for very similar purposes is confusing to users, QA and even developers. Also, an intimately related configuration is request_timeout, which can interact badly with auto_restart_interval if the latter is poorly configured: requests may always expire if request_timeout < auto_restart_interval and if the resource enters the disconnected state. For health_check_interval, we attempt to derive a sane default that gives requests a chance to retry (if request timeout is finite, then the resource retries requests with a period of min(health_check_interval, request_timeout / 3). Another problem with the separate auto_restart_interval is that its default value (60 s) is too high when compared to the default request timeout and health check, leading to the problems described above if not tuned. Proposed solution: We propose to drop auto_restart_interval in favor of health_check_interval, which will be used for both disconnected -> connected and connected -> {disconnected, connecting} transition checks. With that, the resource will attempt to reconnect at the same interval as the health check, which currently is 15 s. Also, as two smaller changes to accompany this one: - Increase the default request_timeout from 15 s to 45 s. - Rename request_timeout to request_ttl.	2023-06-01 11:20:06 -03:00
Thales Macedo Garitezi	a7b41e1cdf	perf(webhook): add retry attempts for async This is a performance improvement for webhook bridge. Since this bridge is called using `async` callback mode, and `ehttpc` frequently returns errors of the form `normal` and `{shutdown, normal}` that are retried "for free" by `ehttpc`, we add this behavior to async requests as well. Other errors are retried too, but they are not "free": 3 attempts are made at a maximum. This is important because, when using buffer workers, we should avoid making them enter the `blocked` state, since that halts all progress and makes throughput plummet.	2023-05-17 09:20:50 -03:00
Thales Macedo Garitezi	e073bc90bc	refactor(buffer_worker): rename `s/queue/buffer/g`	2023-04-14 11:37:19 -03:00
Kjell Winblad	35474578ca	refactor: rename async_inflight_window to inflight_window everywhere	2023-03-23 14:21:57 +01:00
Thales Macedo Garitezi	66eb4ef069	test: fix inter-suite flakiness	2023-03-16 13:43:01 -03:00
Thales Macedo Garitezi	03b95073fc	test: fix inter-suite flakiness	2023-03-15 14:25:41 -03:00
Zaiming (Stone) Shi	26b29185b2	test(emqx_bridge_webhook_SUITE): fix flakyness in test web server	2023-03-07 20:57:38 +01:00
Kjell Winblad	163b33ab28	test: remove unnecessary dependencies of ee apps	2023-03-07 20:57:38 +01:00
Kjell Winblad	ca947e3e70	fix: lost messages when HTTP connection times out When using async mode with the webhook bridge, queued messages that are not fully processed when the connection times out could be lost. This commit fixes this by letting the bridge return a recoverable_error when this happen. The message send will then be retried in sync mode by the emqx_resource_buffer_worker. Fixes: https://emqx.atlassian.net/browse/EMQX-8974	2023-03-07 20:57:19 +01:00

16 Commits
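
The timer arithmetic described in commit 10425eb925 can be illustrated with a minimal sketch. This is not EMQX's Erlang implementation: the function names `retry_period_ms` and `may_expire_while_disconnected` are hypothetical, and only the formula `min(health_check_interval, request_timeout / 3)`, the expiry condition, and the default values (15 s, 45 s, 60 s) come from the commit message.

```python
from typing import Optional


def retry_period_ms(health_check_interval_ms: int,
                    request_ttl_ms: Optional[int]) -> int:
    """Hypothetical helper: how often a buffered request is retried.

    Follows the rule quoted in the commit message: when the request TTL is
    finite, retry every min(health_check_interval, request_ttl / 3).
    None stands for an infinite TTL (an assumption of this sketch).
    """
    if request_ttl_ms is None:
        return health_check_interval_ms
    return min(health_check_interval_ms, request_ttl_ms // 3)


def may_expire_while_disconnected(request_ttl_ms: int,
                                  reconnect_interval_ms: int) -> bool:
    """Hypothetical helper: the failure mode described in the commit.

    If the resource is disconnected and only tries to reconnect every
    reconnect_interval_ms, a request with a shorter TTL can expire before
    the next reconnect attempt.
    """
    return request_ttl_ms < reconnect_interval_ms


# Old defaults from the commit message: request_timeout 15 s,
# auto_restart_interval 60 s, health_check_interval 15 s.
old_risk = may_expire_while_disconnected(15_000, 60_000)

# New defaults: request_ttl 45 s, reconnects driven by the 15 s health check.
new_risk = may_expire_while_disconnected(45_000, 15_000)
new_period = retry_period_ms(15_000, 45_000)
```

Under these assumptions the old defaults hit the expiry condition, while the new defaults do not, and a 45 s TTL leaves room for about three retries at the 15 s health check interval.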