yuanbiao/emqx - emqx

Commit Graph

Author	SHA1	Message	Date
Zaiming (Stone) Shi	3261a12140	fix(emqx_resource): do not allow leading _ or - as resource name	2023-11-22 10:58:54 +01:00
Thales Macedo Garitezi	9e1796ec4f	feat(gcp_pubsub_producer): migrate GCP PubSub producer to actions Fixes https://emqx.atlassian.net/browse/EMQX-11157	2023-11-21 14:22:42 -03:00
Thales Macedo Garitezi	b92821188b	fix(kafka_producer): make status `connecting` while the client fails to connect Fixes https://emqx.atlassian.net/browse/EMQX-11408 To make it consistent with the previous bridge behavior. Also, introduces macros for resource status to avoid problems with typos.	2023-11-16 14:50:23 -03:00
Thales Macedo Garitezi	d6e9bbb95c	fix(connector): validate connector name before converting ssl certs Fixes https://emqx.atlassian.net/browse/EMQX-11336 See also: https://github.com/emqx/emqx/pull/11540	2023-11-14 09:29:59 -03:00
Ivan Dyachkov	a49aea3b56	chore: bump app versions	2023-11-14 09:27:04 +01:00
Kjell Winblad	cd5b1f9b96	docs(bridge_V2): type specs for operations	2023-11-14 09:20:46 +01:00
Thales Macedo Garitezi	b255836cbd	Merge pull request #11890 from thalesmg/fix-kafka-unhealthy-r53-20231106 fix(resource): take error from action/connector before attempting query	2023-11-07 12:38:57 -03:00
Thales Macedo Garitezi	7dcdbc9e51	fix(resource): take error from action/connector before attempting query Fixes https://emqx.atlassian.net/browse/EMQX-11284 Fixes https://emqx.atlassian.net/browse/EMQX-11298	2023-11-07 10:04:04 -03:00
Thales Macedo Garitezi	2b8cf50a1d	chore: rename `bridges_v2` -> `actions` in the public facing APIs Fixes https://emqx.atlassian.net/browse/EMQX-11330 After feedback from Product team, we should rename `bridges_v2` to `actions` everywhere. We'll start with the public facing APIs. - HTTP API - Hocon schema root key	2023-11-06 15:37:07 -03:00
Zaiming (Stone) Shi	600747b7e5	fix(bridge): do not allow dot in bridge name also validate name at the API entry	2023-11-03 20:44:57 +01:00
Kjell Winblad	357b664c8d	fix(bridge_v2): more fixes thanks to PR comments from @thalesmg	2023-11-01 15:27:54 +01:00
Kjell Winblad	96d6c6db49	test(bridge_v2): emqx_bridge_v2_kafka_producer_SUITE fix after API change	2023-11-01 15:27:53 +01:00
Kjell Winblad	edb1d37e67	chore(bridge_v2): make fixes thanks to PR comments from @thalesmg	2023-11-01 15:27:53 +01:00
Kjell Winblad	95f3b94ac3	fix(bridge_v2): channels should not be removed when status is connecting This fixes so that channels are not removed from the resource state when their status is connecting. This is needed for Kafka since Kafka's message buffer is stored in the resource state. Fixes: https://emqx.atlassian.net/browse/EMQX-11270	2023-11-01 15:27:53 +01:00
Kjell Winblad	9dc3a169b3	feat: split bridges into a connector part and a bridge part Co-authored-by: Thales Macedo Garitezi <thalesmg@gmail.com> Co-authored-by: Stefan Strigler <stefan.strigler@emqx.io> Co-authored-by: Zaiming (Stone) Shi <zmstone@gmail.com> Several bridges should be able to share a connector pool defined by a single connector. The connectors should be possible to enable and disable similar to how one can disable and enable bridges. There should also be an API for checking the status of a connector and for add/edit/delete connectors similar to the current bridge API. Issues: https://emqx.atlassian.net/browse/EMQX-10805	2023-10-30 14:48:47 +01:00
Thales Macedo Garitezi	cf2075d7d8	chore: remove mention of `is_buffer_supported` from typespec	2023-10-10 09:49:18 -03:00
Thales Macedo Garitezi	d6781efee2	fix(resource): change how buffer workers are started	2023-10-09 15:02:25 -03:00
Thales Macedo Garitezi	902b1d6ec5	fix(pulsar_producer): use `simple_async_internal_buffer` query mode for Pulsar Since it has internal buffering, it necessitates the same fix as Kafka producer.	2023-10-09 15:02:25 -03:00
Thales Macedo Garitezi	eebfb44f72	fix(resource): create `simple_async_internal_buffer` query mode for bridges with internal buffering Since authn/authz backends also use simple async/sync queries, we may want to avoid them calling the connector when it's not connected.	2023-10-09 15:02:25 -03:00
Thales Macedo Garitezi	79cf0a2ced	fix(kafka_producer): correctly handle metrics for connector that have internal buffers Fixes https://emqx.atlassian.net/browse/EMQX-11086 There’s currently a metric inconsistency due to the internal buffering nature of Kafka Producer (wolff). We use simple_sync_query to call the Kafka Producer bridge. If that times out, the call is accounted as failed, even though the message is buffered in wolff and later sent successfully.	2023-10-09 15:02:25 -03:00
Thales Macedo Garitezi	34186fcc74	fix(kafka_producer): send messages to wolff producer to buffer even when connector is in `connecting` state Fixes https://emqx.atlassian.net/browse/EMQX-11085 Messages would not be sent to wolff if the connection was down, so they were effectively lost.	2023-10-06 11:43:29 -03:00
Zaiming (Stone) Shi	c2d750aa09	fix(resource): redact query args in exception log	2023-09-29 09:20:42 +02:00
firest	dca8fdb17f	fix(resource): respect the start_timeout	2023-09-28 16:36:41 +08:00
Ivan Dyachkov	dafd7c6085	chore: bump apps versions	2023-09-21 10:58:42 +02:00
zhongwencool	123d31fa7d	Merge pull request #11640 from zhongwencool/ensure-destory-resource fix: always return ok when remove local resource	2023-09-21 09:21:45 +08:00
zhongwencool	dd687d9582	fix: dialyzer warning	2023-09-20 22:41:26 +08:00
zhongwencool	2f1fa2e961	chore: unified slog message formatting to improve logging consistency	2023-09-20 18:13:00 +08:00
zhongwencool	c26a18e949	fix: always return ok when remove local resource	2023-09-20 18:02:42 +08:00
Paulo Zulato	dfcede8794	fix: increment matched counter when bridge is unhealthy Fixes https://emqx.atlassian.net/browse/EMQX-10767	2023-08-30 10:52:53 -03:00
Paulo Zulato	cc3ba18734	fix: increment dropped message counter when bridge is unhealthy Fixes https://emqx.atlassian.net/browse/EMQX-10767	2023-08-28 19:47:11 -03:00
Paulo Zulato	42877e282d	fix: flatten error message on resource validator Fixes https://emqx.atlassian.net/browse/EMQX-10864	2023-08-25 13:53:52 -03:00
Thales Macedo Garitezi	ebecbd1545	fix(bridge): make dryrun health check timeout more malleable Fixes https://emqx.atlassian.net/browse/EMQX-10773 - Makes the timeout for probing a bridge more malleable to account for differences between each database. - Increases GCP PubSub Consumer default health check timeout to account for GCP slowness/throttling.	2023-08-17 09:21:19 -03:00
Ivan Dyachkov	cbfca8c043	chore: merge master into release-51	2023-07-27 15:19:57 +02:00
Thales Macedo Garitezi	3b1e436d3f	refactor: use `emqx_pool:async_submit` to avoid excessive spawning	2023-07-20 10:17:43 -03:00
Thales Macedo Garitezi	eb41b77de4	fix(rule_metrics): notify rule metrics of late replies and expired requests Fixes https://emqx.atlassian.net/browse/EMQX-10600	2023-07-19 11:39:28 -03:00
Thales Macedo Garitezi	01b143c5ad	fix(resource): don't destruct error tuple Otherwise, `emqx_resource:query` won't correctly deem the resource to be unhealthy when there's an extra message.	2023-07-13 16:12:33 -03:00
zhongwencool	b5cc8fb3c3	fix: start_after_created's default value	2023-07-07 16:39:26 +08:00
Stefan Strigler	07cf250093	Merge pull request #11126 from sstrigler/EMQX-8842-fix-rule-metrics fix(emqx_rule_engine): set inc_action_metrics as async_reply_fun	2023-06-30 20:07:23 +02:00
Stefan Strigler	321fd53132	fix: use ReplyTo in QUERY for async	2023-06-29 16:09:45 +02:00
Stefan Strigler	40dd34a704	fix: use reply_to instead of async_reply_fun	2023-06-29 16:09:45 +02:00
Stefan Strigler	1363108678	fix: fix simple_sync_query	2023-06-29 16:09:45 +02:00
Stefan Strigler	2274a192cc	fix(emqx_resource): call async reply fun in simple_aysnc_query	2023-06-29 16:09:45 +02:00
Stefan Strigler	ae636a52d7	fix(emqx_rule_engine): set inc_action_metrics as async_reply_fun	2023-06-29 16:09:45 +02:00
Thales Macedo Garitezi	30e0b4be54	test(gcp_pubsub_consumer): add more tests and improve bridge Fixes https://emqx.atlassian.net/browse/EMQX-10309	2023-06-28 14:08:40 -03:00
Thales Macedo Garitezi	c4fc0e767e	feat: allow specifying more helpful messages for unhealthy targets	2023-06-27 17:13:43 -03:00
Thales Macedo Garitezi	2f00cf7f84	Merge pull request #11107 from thalesmg/fix-mongo-health-check-reason-master fix(mongo): return health check failure reason	2023-06-22 09:30:34 -03:00
Thales Macedo Garitezi	7ef03d9e1f	Merge pull request #11090 from thalesmg/gcp-pubsub-consumer feat(gcp_pubsub_consumer): implement GCP PubSub Consumer bridge	2023-06-22 09:17:45 -03:00
Zaiming (Stone) Shi	c58a98954b	Merge remote-tracking branch 'origin/master' into 0621-merge-release-51-to-master	2023-06-22 11:05:51 +02:00
Paulo Zulato	62d3766726	Merge pull request #10645 from paulozulato/data-bridge-target-unavailable Data bridge target unavailable	2023-06-21 18:19:23 -03:00
Andrew Mayorov	62b832be45	Merge pull request #11118 from fix/EMQX-9964/bump-hocon chore: bump hocon to 0.39.10	2023-06-21 23:13:35 +02:00
Andrew Mayorov	86d787eced	chore: bump hocon to 0.39.10 Which comes with a fix for slightly more user-friendly validation error messages.	2023-06-21 21:25:43 +02:00
Thales Macedo Garitezi	18f0510353	fix(mongo): return health check failure reason Fixes https://emqx.atlassian.net/browse/EMQX-10335	2023-06-21 15:09:37 -03:00
Thales Macedo Garitezi	decfd6df2b	feat(buffer_worker): log expired message count Fixes https://emqx.atlassian.net/browse/EMQX-10165 ``` iex(emqx@127.0.0.1)38> 2023-06-21T11:09:35.569404-03:00 [info] msg: buffer_worker_dropped_expired_messages, mfa: emqx_resource_buffer_worker:log_expired_messge_count/1, line: 982, expired_count: 900, resource_id: <<"bridge:webhook:webhook">>, worker_index: 3 ```	2023-06-21 14:38:51 -03:00
Zaiming (Stone) Shi	7cf8a6c892	chore: bump app vsns	2023-06-21 16:36:51 +02:00
Thales Macedo Garitezi	1d791d7a8c	fix(resource): validate maximum worker pool size Fixes https://emqx.atlassian.net/browse/EMQX-10297	2023-06-20 14:26:42 -03:00
Thales Macedo Garitezi	13746c2cdf	fix(resource): check status when (re)starting a resource Fixes https://emqx.atlassian.net/browse/EMQX-10290	2023-06-19 18:01:02 -03:00
Thales Macedo Garitezi	7f850f7499	fix(resource): fix `query_mode/0` type and usage	2023-06-19 15:59:00 -03:00
Paulo Zulato	9454af9a8b	feat(postgresql): check whether target table exists Fixes https://emqx.atlassian.net/browse/EMQX-9026	2023-06-19 11:12:10 -03:00
Zaiming (Stone) Shi	e8ccdb8d0f	Merge pull request #10998 from zmstone/0609-no-batch-for-mongodb fix(mongodb): hide batch_size for mongodb resource	2023-06-11 21:26:12 +02:00
Zaiming (Stone) Shi	ddef751527	fix(mongodb): hide batch_size for mongodb resource MongoDB connector currently does not support batching so the batch_size option has no effect. However we cannot remove the field, so we choose to hide it from schema	2023-06-11 11:08:58 +02:00
Kjell Winblad	4215da12f0	Merge pull request #10970 from kjellwinblad/kjell/feat/kafka_add_async_param/EMQX-8631 feat: add sync/async option to the Kafka producer bridge	2023-06-09 12:51:23 +02:00
Kjell Winblad	2671e8ecf9	fix: dialyzer type problem	2023-06-09 11:00:05 +02:00
firest	86a7b2d69a	fix(resource): improve log security when resource creation fails	2023-06-09 11:43:42 +08:00
Kjell Winblad	cb3a5fdbd4	style: only callback modules should do dynamic calls	2023-06-08 16:27:04 +02:00
Kjell Winblad	ed9e29e769	refactor: refacor query_mode detection code This commit refactor the query_mode resource detection code according to a suggestion from @zmstone. This commit should not contain any functional change except for a change of the Kafka producer bridge config.	2023-06-08 16:26:55 +02:00
Kjell Winblad	47fa17b3c1	feat: add sync/async option to the Kafka producer bridge This commit makes it possible to configure if a Kafka bridge should work in query mode sync or async by setting the resource_opts.query_mode configuration option. Fixes: https://emqx.atlassian.net/browse/EMQX-8631	2023-06-08 13:16:06 +02:00
Thales Macedo Garitezi	cc8631223e	fix(schema): avoid `function_clause` error when compacting errors Fixes https://emqx.atlassian.net/browse/EMQX-10168 The bridge probe API displayed the typecheck errors for the new timeout duration types correctly, but when an user tried to create the bridge anyway a `function_clause` error was raised when trying to compact hocon errors: ``` 09:47:19.045 [warning] [exception: :error, path: '/bridges', reason: {:case_clause, {:error, {:config_update_crashed, :function_clause}}}, stacktrace: [{:emqx_bridge_api, :create_or_update_bridge, 4, [file: '/home/thales/dev/emqx/emqx/apps/emqx_bridge/src/emqx_bridge_api.erl', line: 602]}, {:minirest_handler, :apply_callback, 3, [file: '/home/thales/dev/emqx/emqx/deps/minirest/src/minirest_handler.erl', line: 111]}, {:minirest_handler, :handle, 2, [file: '/home/thales/dev/emqx/emqx/deps/minirest/src/minirest_handler.erl', line: 44]}, {:minirest_handler, :init, 2, [file: '/home/thales/dev/emqx/emqx/deps/minirest/src/minirest_handler.erl', line: 27]}, {:cowboy_handler, :execute, 2, [file: '/home/thales/dev/emqx/emqx/deps/cowboy/src/cowboy_handler.erl', line: 41]}, {:cowboy_stream_h, :execute, 3, [file: '/home/thales/dev/emqx/emqx/deps/cowboy/src/cowboy_stream_h.erl', line: 318]}, {:cowboy_stream_h, :request_process, 3, [file: '/home/thales/dev/emqx/emqx/deps/cowboy/src/cowboy_stream_h.erl', line: 302]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 240]}]] ``` This fixes the issue so that both APIs return more friendly error messages.	2023-06-07 10:34:58 -03:00
Thales Macedo Garitezi	75df622426	Merge pull request #10930 from thalesmg/max-int-timeout-v50 check maximum timeout values in schema (5.1)	2023-06-05 12:51:36 -03:00
Thales Macedo Garitezi	46393343e2	chore: use `timeout_duration` types for timer fields Fixes https://emqx.atlassian.net/browse/EMQX-10020	2023-06-05 11:46:38 -03:00
lafirest	d51c658a30	Merge pull request #10908 from lafirest/feat/rocketmq_on_stop feat(rocketmq): refactored bridge to avoid leaking resources during crashes at creation	2023-06-05 15:00:32 +08:00
firest	5921ed3d2e	fix(rocketmq): improve function name	2023-06-05 10:43:28 +08:00
Thales Macedo Garitezi	0790c88aaf	refactor: use default's type as first union member Co-authored-by: Zaiming (Stone) Shi <zmstone@gmail.com>	2023-06-02 09:08:11 -03:00
Thales Macedo Garitezi	99796224d8	refactor(resource): rename `request_timeout` -> `request_ttl` See https://emqx.atlassian.net/wiki/spaces/P/pages/612368639/open+e5.1+remove+auto+restart+interval+from+buffer+worker+resource+options	2023-06-01 13:01:53 -03:00
Thales Macedo Garitezi	f42ccb6262	feat(resource): increase default request timeout to 45 s See https://emqx.atlassian.net/wiki/spaces/P/pages/612368639/open+e5.1+remove+auto+restart+interval+from+buffer+worker+resource+options	2023-06-01 11:20:06 -03:00
Thales Macedo Garitezi	10425eb925	feat(resource): deprecate `auto_restart_interval` in favor of `health_check_interval` See: https://emqx.atlassian.net/wiki/spaces/P/pages/612368639/open+e5.1+remove+auto+restart+interval+from+buffer+worker+resource+options Current problem: In 5.0.x, we have two timer options that control the state changing of buffer worker resources: auto_restart_interval and health_check_interval. - auto_restart_interval controls how often the resource attempts to transition from disconnected to connected. - health_check_interval controls how often the resource is checked and potentially moved from connected to disconnected or connecting. The existence of two independent timers for very similar purposes is confusing to users, QA and even developers. Also, an intimately related configuration is request_timeout, which can interact badly with auto_restart_interval if the latter is poorly configured: requests may always expire if request_timeout < auto_restart_interval and if the resource enters the disconnected state. For health_check_interval, we attempt to derive a sane default that gives requests a chance to retry (if request timeout is finite, then the resource retries requests with a period of min(health_check_interval, request_timeout / 3). Another problem with the separate auto_restart_interval is that its default value (60 s) is too high when compared to the default request timeout and health check, leading to the problems described above if not tuned. Proposed solution: We propose to drop auto_restart_interval in favor of health_check_interval, which will be used for both disconnected -> connected and connected -> {disconnected, connecting} transition checks. With that, the resource will attempt to reconnect at the same interval as the health check, which currently is 15 s. Also, as two smaller changes to accompany this one: - Increase the default request_timeout from 15 s to 45 s. - Rename request_timeout to request_ttl.	2023-06-01 11:20:06 -03:00
firest	232ef23a48	feat(rocketmq): refactored bridge to avoid leaking resources during crashes at creation	2023-06-01 18:49:45 +08:00
Thales Macedo Garitezi	a7f4f81c38	Merge pull request #10887 from thalesmg/fix-async-worker-down-buffer-worker-20230530-v50 fix: block buffer workers so they may retry requests	2023-05-30 17:39:18 -03:00
Andrew Mayorov	a2688325e5	Merge pull request #10754 from fix/EMQX-10056/mqtt feat(mqttconn): employ ecpool instead of single worker	2023-05-30 23:28:10 +03:00
Thales Macedo Garitezi	8c565abc84	test(cassandra): fix flaky test	2023-05-30 15:42:53 -03:00
Thales Macedo Garitezi	6be8ff378e	fix(buffer_worker): make buffer worker enter `blocked` state when async worker dies Fixes https://emqx.atlassian.net/browse/EMQX-10074 Otherwise, requests from those async workers, now retriable, might not be retried until the buffer worker blocks for other reasons, which might take a long time.	2023-05-30 15:34:22 -03:00
Andrew Mayorov	a5fc26736d	refactor(mqttconn): split ingress/egress into 2 separate pools Each with a more refined set of responsibilities, at the cost of slight code duplication. Also provide two different config fields for each pool size.	2023-05-30 17:21:44 +03:00
Thales Macedo Garitezi	75fcac9711	Merge pull request #10826 from thalesmg/test-partial-batch-expired-inflight-v50 test(buffer_worker): add assertion for inflight count after batch expiration	2023-05-30 09:05:59 -03:00
Zaiming (Stone) Shi	91cdc69976	Merge pull request #10867 from zmstone/0530-merge-release-50-to-master 0530 merge release 50 to master	2023-05-30 09:54:57 +02:00
Zaiming (Stone) Shi	9529919046	chore: bump app versions	2023-05-30 08:08:29 +02:00
Thales Macedo Garitezi	67e182e0c9	Merge pull request #10813 from thalesmg/refactor-kafka-on-stop-v50 feat(kafka): ensure allocated resources are removed on failures	2023-05-29 16:49:29 -03:00
Zaiming (Stone) Shi	36e268c933	chore: bump app versions	2023-05-26 16:05:37 +02:00
Zaiming (Stone) Shi	cc5b4d3748	Merge remote-tracking branch 'origin/release-50' into 0526-ci-delete-otp-24-from-standalone-app-test	2023-05-26 15:58:16 +02:00
Thales Macedo Garitezi	32e6213ce3	fix(resource_manager_sup): use `one_for_one` instead of `simple_one_for_one` Using `simple_one_for_one` has a potential race condition issue where we read the PID of the resource manager before trying to remove a resource, and then that PID changes because it was either dead at first, or it crashed and changed, and later we use this stale PID to try to remove it from the supervisor. Under such circumstances, the restarting child might linger in the supervisor, leaking resources. By using the resource ID itself as a child ID (and using `one_for_one` restart strategy), we ensure the child is truly removed.	2023-05-25 18:07:43 -03:00
Thales Macedo Garitezi	42b37690c7	refactor(pulsar): use macros for allocatable resources	2023-05-25 16:38:09 -03:00
Thales Macedo Garitezi	db60dcbada	test(buffer_worker): add assertion for inflight count after batch expiration Fixes https://emqx.atlassian.net/browse/EMQX-9829	2023-05-25 16:11:37 -03:00
Thales Macedo Garitezi	18d57ba3eb	Merge pull request #10812 from thalesmg/test-flakiness-20230524 test: attempts to reduce flakiness (pgsql, cassandra)	2023-05-25 09:29:13 -03:00
JianBo He	de7f1c8aec	test: add tests for auto_restart_interval	2023-05-25 17:15:19 +08:00
JianBo He	71b636e321	fix: fix auto_restart_interval checker	2023-05-25 12:04:23 +08:00
Paulo Zulato	122ebcac24	fix: add user-friendly message when interval is out of range	2023-05-24 15:46:00 -03:00
Thales Macedo Garitezi	7f88521836	test(pgsql): reduce flakiness Depending on timing, `t_write_timeout` was getting stuck while checking the resource health, and the previous request timeout options were making a response to never be sent if that process took too long.	2023-05-24 15:41:25 -03:00
Thales Macedo Garitezi	fd2940cd77	feat(pulsar): ensure allocated resources are removed on failures (v5.0) Fixes https://emqx.atlassian.net/browse/EMQX-9937	2023-05-24 12:29:00 -03:00
Zaiming (Stone) Shi	732a7be187	Merge remote-tracking branch 'origin/release-50'	2023-05-22 17:46:54 +02:00
Thales Macedo Garitezi	0559d6f639	refactor(buffer_worker): use static fn for bumping counters	2023-05-22 09:12:08 -03:00
Thales Macedo Garitezi	c74c93388e	refactor: rename some variables and sum type constructors for clarity	2023-05-22 09:11:23 -03:00
Thales Macedo Garitezi	7d798c10e9	perf(buffer_worker): flush metrics periodically inside buffer worker process Fixes https://emqx.atlassian.net/browse/EMQX-9905 Since calling `telemetry` is costly in a hot path, we instead collect metrics inside the buffer workers state and periodically flush them, rather than immediately as events happen.	2023-05-22 09:11:23 -03:00
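
Note on commit 10425eb925: its message says that, for a finite request timeout, requests are retried with a period of min(health_check_interval, request_timeout / 3). The arithmetic can be sketched as below. This is only an illustration in Python, not EMQX code: the function name `retry_period_ms` is hypothetical, and falling back to the health check interval for an infinite timeout is an assumption, since the commit message does not cover that case.

```python
from typing import Optional


def retry_period_ms(health_check_interval_ms: int,
                    request_timeout_ms: Optional[int]) -> int:
    """Hypothetical helper: retry period as described in commit 10425eb925.

    request_timeout_ms=None stands for an infinite timeout. Using the
    health check interval in that case is an assumption of this sketch.
    """
    if request_timeout_ms is None:
        return health_check_interval_ms
    # Finite timeout: min(health_check_interval, request_timeout / 3),
    # so a request gets roughly three attempts before it expires.
    return min(health_check_interval_ms, request_timeout_ms // 3)


# Defaults named in the commit message: 15 s health check, 45 s request timeout.
new_default = retry_period_ms(15_000, 45_000)
# Previous default request timeout of 15 s with the same health check interval.
old_default = retry_period_ms(15_000, 15_000)
```

With the 45 s default the retry period equals the 15 s health check interval. With the earlier 15 s timeout it would have been 5 s.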

583 Commits