yuanbiao/emqx - emqx

Commit Graph

Author	SHA1	Message	Date
Zaiming (Stone) Shi	e8ccdb8d0f	Merge pull request #10998 from zmstone/0609-no-batch-for-mongodb fix(mongodb): hide batch_size for mongodb resource	2023-06-11 21:26:12 +02:00
Zaiming (Stone) Shi	ddef751527	fix(mongodb): hide batch_size for mongodb resource MongoDB connector currently does not support batching so the batch_size option has no effect. However we cannot remove the field, so we choose to hide it from schema	2023-06-11 11:08:58 +02:00
Kjell Winblad	4215da12f0	Merge pull request #10970 from kjellwinblad/kjell/feat/kafka_add_async_param/EMQX-8631 feat: add sync/async option to the Kafka producer bridge	2023-06-09 12:51:23 +02:00
Kjell Winblad	2671e8ecf9	fix: dialyzer type problem	2023-06-09 11:00:05 +02:00
firest	86a7b2d69a	fix(resource): improve log security when resource creation fails	2023-06-09 11:43:42 +08:00
Kjell Winblad	cb3a5fdbd4	style: only callback modules should do dynamic calls	2023-06-08 16:27:04 +02:00
Kjell Winblad	ed9e29e769	refactor: refactor query_mode detection code This commit refactors the query_mode resource detection code according to a suggestion from @zmstone. This commit should not contain any functional change except for a change of the Kafka producer bridge config.	2023-06-08 16:26:55 +02:00
Kjell Winblad	47fa17b3c1	feat: add sync/async option to the Kafka producer bridge This commit makes it possible to configure if a Kafka bridge should work in query mode sync or async by setting the resource_opts.query_mode configuration option. Fixes: https://emqx.atlassian.net/browse/EMQX-8631	2023-06-08 13:16:06 +02:00
Thales Macedo Garitezi	cc8631223e	fix(schema): avoid `function_clause` error when compacting errors Fixes https://emqx.atlassian.net/browse/EMQX-10168 The bridge probe API displayed the typecheck errors for the new timeout duration types correctly, but when an user tried to create the bridge anyway a `function_clause` error was raised when trying to compact hocon errors: ``` 09:47:19.045 [warning] [exception: :error, path: '/bridges', reason: {:case_clause, {:error, {:config_update_crashed, :function_clause}}}, stacktrace: [{:emqx_bridge_api, :create_or_update_bridge, 4, [file: '/home/thales/dev/emqx/emqx/apps/emqx_bridge/src/emqx_bridge_api.erl', line: 602]}, {:minirest_handler, :apply_callback, 3, [file: '/home/thales/dev/emqx/emqx/deps/minirest/src/minirest_handler.erl', line: 111]}, {:minirest_handler, :handle, 2, [file: '/home/thales/dev/emqx/emqx/deps/minirest/src/minirest_handler.erl', line: 44]}, {:minirest_handler, :init, 2, [file: '/home/thales/dev/emqx/emqx/deps/minirest/src/minirest_handler.erl', line: 27]}, {:cowboy_handler, :execute, 2, [file: '/home/thales/dev/emqx/emqx/deps/cowboy/src/cowboy_handler.erl', line: 41]}, {:cowboy_stream_h, :execute, 3, [file: '/home/thales/dev/emqx/emqx/deps/cowboy/src/cowboy_stream_h.erl', line: 318]}, {:cowboy_stream_h, :request_process, 3, [file: '/home/thales/dev/emqx/emqx/deps/cowboy/src/cowboy_stream_h.erl', line: 302]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 240]}]] ``` This fixes the issue so that both APIs return more friendly error messages.	2023-06-07 10:34:58 -03:00
Thales Macedo Garitezi	75df622426	Merge pull request #10930 from thalesmg/max-int-timeout-v50 check maximum timeout values in schema (5.1)	2023-06-05 12:51:36 -03:00
Thales Macedo Garitezi	46393343e2	chore: use `timeout_duration` types for timer fields Fixes https://emqx.atlassian.net/browse/EMQX-10020	2023-06-05 11:46:38 -03:00
lafirest	d51c658a30	Merge pull request #10908 from lafirest/feat/rocketmq_on_stop feat(rocketmq): refactored bridge to avoid leaking resources during crashes at creation	2023-06-05 15:00:32 +08:00
firest	5921ed3d2e	fix(rocketmq): improve function name	2023-06-05 10:43:28 +08:00
Thales Macedo Garitezi	0790c88aaf	refactor: use default's type as first union member Co-authored-by: Zaiming (Stone) Shi <zmstone@gmail.com>	2023-06-02 09:08:11 -03:00
Thales Macedo Garitezi	99796224d8	refactor(resource): rename `request_timeout` -> `request_ttl` See https://emqx.atlassian.net/wiki/spaces/P/pages/612368639/open+e5.1+remove+auto+restart+interval+from+buffer+worker+resource+options	2023-06-01 13:01:53 -03:00
Thales Macedo Garitezi	f42ccb6262	feat(resource): increase default request timeout to 45 s See https://emqx.atlassian.net/wiki/spaces/P/pages/612368639/open+e5.1+remove+auto+restart+interval+from+buffer+worker+resource+options	2023-06-01 11:20:06 -03:00
Thales Macedo Garitezi	10425eb925	feat(resource): deprecate `auto_restart_interval` in favor of `health_check_interval` See: https://emqx.atlassian.net/wiki/spaces/P/pages/612368639/open+e5.1+remove+auto+restart+interval+from+buffer+worker+resource+options Current problem: In 5.0.x, we have two timer options that control the state changing of buffer worker resources: auto_restart_interval and health_check_interval. - auto_restart_interval controls how often the resource attempts to transition from disconnected to connected. - health_check_interval controls how often the resource is checked and potentially moved from connected to disconnected or connecting. The existence of two independent timers for very similar purposes is confusing to users, QA and even developers. Also, an intimately related configuration is request_timeout, which can interact badly with auto_restart_interval if the latter is poorly configured: requests may always expire if request_timeout < auto_restart_interval and if the resource enters the disconnected state. For health_check_interval, we attempt to derive a sane default that gives requests a chance to retry (if request timeout is finite, then the resource retries requests with a period of min(health_check_interval, request_timeout / 3)). Another problem with the separate auto_restart_interval is that its default value (60 s) is too high when compared to the default request timeout and health check, leading to the problems described above if not tuned. Proposed solution: We propose to drop auto_restart_interval in favor of health_check_interval, which will be used for both disconnected -> connected and connected -> {disconnected, connecting} transition checks. With that, the resource will attempt to reconnect at the same interval as the health check, which currently is 15 s. Also, as two smaller changes to accompany this one: - Increase the default request_timeout from 15 s to 45 s. - Rename request_timeout to request_ttl.	2023-06-01 11:20:06 -03:00
firest	232ef23a48	feat(rocketmq): refactored bridge to avoid leaking resources during crashes at creation	2023-06-01 18:49:45 +08:00
Thales Macedo Garitezi	a7f4f81c38	Merge pull request #10887 from thalesmg/fix-async-worker-down-buffer-worker-20230530-v50 fix: block buffer workers so they may retry requests	2023-05-30 17:39:18 -03:00
Andrew Mayorov	a2688325e5	Merge pull request #10754 from fix/EMQX-10056/mqtt feat(mqttconn): employ ecpool instead of single worker	2023-05-30 23:28:10 +03:00
Thales Macedo Garitezi	8c565abc84	test(cassandra): fix flaky test	2023-05-30 15:42:53 -03:00
Thales Macedo Garitezi	6be8ff378e	fix(buffer_worker): make buffer worker enter `blocked` state when async worker dies Fixes https://emqx.atlassian.net/browse/EMQX-10074 Otherwise, requests from those async workers, now retriable, might not be retried until the buffer worker blocks for other reasons, which might take a long time.	2023-05-30 15:34:22 -03:00
Andrew Mayorov	a5fc26736d	refactor(mqttconn): split ingress/egress into 2 separate pools Each with a more refined set of responsibilities, at the cost of slight code duplication. Also provide two different config fields for each pool size.	2023-05-30 17:21:44 +03:00
Thales Macedo Garitezi	75fcac9711	Merge pull request #10826 from thalesmg/test-partial-batch-expired-inflight-v50 test(buffer_worker): add assertion for inflight count after batch expiration	2023-05-30 09:05:59 -03:00
Zaiming (Stone) Shi	91cdc69976	Merge pull request #10867 from zmstone/0530-merge-release-50-to-master 0530 merge release 50 to master	2023-05-30 09:54:57 +02:00
Zaiming (Stone) Shi	9529919046	chore: bump app versions	2023-05-30 08:08:29 +02:00
Thales Macedo Garitezi	67e182e0c9	Merge pull request #10813 from thalesmg/refactor-kafka-on-stop-v50 feat(kafka): ensure allocated resources are removed on failures	2023-05-29 16:49:29 -03:00
Zaiming (Stone) Shi	36e268c933	chore: bump app versions	2023-05-26 16:05:37 +02:00
Zaiming (Stone) Shi	cc5b4d3748	Merge remote-tracking branch 'origin/release-50' into 0526-ci-delete-otp-24-from-standalone-app-test	2023-05-26 15:58:16 +02:00
Thales Macedo Garitezi	32e6213ce3	fix(resource_manager_sup): use `one_for_one` instead of `simple_one_for_one` Using `simple_one_for_one` has a potential race condition issue where we read the PID of the resource manager before trying to remove a resource, and then that PID changes because it was either dead at first, or it crashed and changed, and later we use this stale PID to try to remove it from the supervisor. Under such circumstances, the restarting child might linger in the supervisor, leaking resources. By using the resource ID itself as a child ID (and using `one_for_one` restart strategy), we ensure the child is truly removed.	2023-05-25 18:07:43 -03:00
Thales Macedo Garitezi	42b37690c7	refactor(pulsar): use macros for allocatable resources	2023-05-25 16:38:09 -03:00
Thales Macedo Garitezi	db60dcbada	test(buffer_worker): add assertion for inflight count after batch expiration Fixes https://emqx.atlassian.net/browse/EMQX-9829	2023-05-25 16:11:37 -03:00
Thales Macedo Garitezi	18d57ba3eb	Merge pull request #10812 from thalesmg/test-flakiness-20230524 test: attempts to reduce flakiness (pgsql, cassandra)	2023-05-25 09:29:13 -03:00
JianBo He	de7f1c8aec	test: add tests for auto_restart_interval	2023-05-25 17:15:19 +08:00
JianBo He	71b636e321	fix: fix auto_restart_interval checker	2023-05-25 12:04:23 +08:00
Paulo Zulato	122ebcac24	fix: add user-friendly message when interval is out of range	2023-05-24 15:46:00 -03:00
Thales Macedo Garitezi	7f88521836	test(pgsql): reduce flakiness Depending on timing, `t_write_timeout` was getting stuck while checking the resource health, and the previous request timeout options were making a response to never be sent if that process took too long.	2023-05-24 15:41:25 -03:00
Thales Macedo Garitezi	fd2940cd77	feat(pulsar): ensure allocated resources are removed on failures (v5.0) Fixes https://emqx.atlassian.net/browse/EMQX-9937	2023-05-24 12:29:00 -03:00
Zaiming (Stone) Shi	732a7be187	Merge remote-tracking branch 'origin/release-50'	2023-05-22 17:46:54 +02:00
Thales Macedo Garitezi	0559d6f639	refactor(buffer_worker): use static fn for bumping counters	2023-05-22 09:12:08 -03:00
Thales Macedo Garitezi	c74c93388e	refactor: rename some variables and sum type constructors for clarity	2023-05-22 09:11:23 -03:00
Thales Macedo Garitezi	7d798c10e9	perf(buffer_worker): flush metrics periodically inside buffer worker process Fixes https://emqx.atlassian.net/browse/EMQX-9905 Since calling `telemetry` is costly in a hot path, we instead collect metrics inside the buffer workers state and periodically flush them, rather than immediately as events happen.	2023-05-22 09:11:23 -03:00
Andrew Mayorov	ba6b208df2	fix(clickhouse): start app in tests Otherwise, depending on the test execution order, tests might sometimes fail. Moreover, ensure that applications describe their dependencies correctly and avoid starting irrelevant apps in tests.	2023-05-19 23:08:40 +03:00
Zaiming (Stone) Shi	cb76e5a241	docs: add changelog for 10755	2023-05-19 20:41:26 +02:00
Zaiming (Stone) Shi	0d8ffc0d59	fix(resource-manager): ensure no false creation Update is implemented as remove + create. If a delete call is made while the create is in progress, the remove call is likely to time out too. This causes the following creation to falsely succeed, because there is already a running child under the supervisor. As a result, the resource is permanently removed after resource_manager eventually handles the remove call.	2023-05-19 18:55:16 +02:00
Zaiming (Stone) Shi	f5e5c59763	refactor(resource-manager-sup): do not force kill resource manager The shutdown timeout is now set to infinity so it will never force kill a resource manager; otherwise there will be resource leaks.	2023-05-19 18:55:16 +02:00
Zaiming (Stone) Shi	21de0f8274	fix(buffer-worker-sup): fast stop The timeout shutdown in the child spec may significantly slow down the deletion of a resource. This commit changes the shutdown to brutal kill. Also, the pool worker removal code has been deleted because it is not necessary, since the entire pool is going to be force-deleted later anyway.	2023-05-19 18:55:16 +02:00
firest	baeb96a6e4	chore: update changes	2023-05-19 15:36:18 +08:00
firest	0eea8438bf	fix(resource): make some logging of the resource manager more secure	2023-05-19 15:28:19 +08:00
Paulo Zulato	5d289ade56	fix: validate range for some bridge options Fixes https://emqx.atlassian.net/browse/EMQX-9864 Setting a very large interval can cause `erlang:start_timer` to crash. Also, setting auto_restart_interval or health_check_interval to "0s" causes the state machine to loop, as time 0 is handled separately: \| state_timeout() = timeout() \| integer() \| (...) \| If Time is relative and 0 no timer is actually started, instead the the \| time-out event is enqueued to ensure that it gets processed before any \| not yet received external event. from "https://www.erlang.org/doc/man/gen_statem.html#type-state_timeout" Therefore, both fields are now validated against the range [1ms, 1h], which avoids the above issues.	2023-05-18 10:10:58 -03:00


475 Commits