Zaiming (Stone) Shi
3261a12140
fix(emqx_resource): do not allow leading _ or - as resource name
2023-11-22 10:58:54 +01:00
Thales Macedo Garitezi
9e1796ec4f
feat(gcp_pubsub_producer): migrate GCP PubSub producer to actions
...
Fixes https://emqx.atlassian.net/browse/EMQX-11157
2023-11-21 14:22:42 -03:00
Thales Macedo Garitezi
b92821188b
fix(kafka_producer): make status `connecting` while the client fails to connect
...
Fixes https://emqx.atlassian.net/browse/EMQX-11408
To make it consistent with the previous bridge behavior.
Also, introduces macros for resource status to avoid problems with typos.
2023-11-16 14:50:23 -03:00
Thales Macedo Garitezi
d6e9bbb95c
fix(connector): validate connector name before converting ssl certs
...
Fixes https://emqx.atlassian.net/browse/EMQX-11336
See also: https://github.com/emqx/emqx/pull/11540
2023-11-14 09:29:59 -03:00
Ivan Dyachkov
a49aea3b56
chore: bump app versions
2023-11-14 09:27:04 +01:00
Kjell Winblad
cd5b1f9b96
docs(bridge_V2): type specs for operations
2023-11-14 09:20:46 +01:00
Thales Macedo Garitezi
b255836cbd
Merge pull request #11890 from thalesmg/fix-kafka-unhealthy-r53-20231106
...
fix(resource): take error from action/connector before attempting query
2023-11-07 12:38:57 -03:00
Thales Macedo Garitezi
7dcdbc9e51
fix(resource): take error from action/connector before attempting query
...
Fixes https://emqx.atlassian.net/browse/EMQX-11284
Fixes https://emqx.atlassian.net/browse/EMQX-11298
2023-11-07 10:04:04 -03:00
Thales Macedo Garitezi
2b8cf50a1d
chore: rename `bridges_v2` -> `actions` in the public facing APIs
...
Fixes https://emqx.atlassian.net/browse/EMQX-11330
After feedback from Product team, we should rename `bridges_v2` to `actions` everywhere.
We'll start with the public facing APIs.
- HTTP API
- Hocon schema root key
2023-11-06 15:37:07 -03:00
Zaiming (Stone) Shi
600747b7e5
fix(bridge): do not allow dot in bridge name
...
also validate name at the API entry
2023-11-03 20:44:57 +01:00
Kjell Winblad
357b664c8d
fix(bridge_v2): more fixes thanks to PR comments from @thalesmg
2023-11-01 15:27:54 +01:00
Kjell Winblad
96d6c6db49
test(bridge_v2): emqx_bridge_v2_kafka_producer_SUITE fix after API change
2023-11-01 15:27:53 +01:00
Kjell Winblad
edb1d37e67
chore(bridge_v2): make fixes thanks to PR comments from @thalesmg
2023-11-01 15:27:53 +01:00
Kjell Winblad
95f3b94ac3
fix(bridge_v2): channels should not be removed when status is connecting
...
This fixes so that channels are not removed from the resource state when
their status is connecting. This is needed for Kafka since Kafka's message
buffer is stored in the resource state.
Fixes:
https://emqx.atlassian.net/browse/EMQX-11270
2023-11-01 15:27:53 +01:00
Kjell Winblad
9dc3a169b3
feat: split bridges into a connector part and a bridge part
...
Co-authored-by: Thales Macedo Garitezi <thalesmg@gmail.com>
Co-authored-by: Stefan Strigler <stefan.strigler@emqx.io>
Co-authored-by: Zaiming (Stone) Shi <zmstone@gmail.com>
Several bridges should be able to share a connector pool defined by a
single connector. The connectors should be possible to enable and
disable similar to how one can disable and enable bridges. There should
also be an API for checking the status of a connector and for
add/edit/delete connectors similar to the current bridge API.
Issues:
https://emqx.atlassian.net/browse/EMQX-10805
2023-10-30 14:48:47 +01:00
Thales Macedo Garitezi
cf2075d7d8
chore: remove mention of `is_buffer_supported` from typespec
2023-10-10 09:49:18 -03:00
Thales Macedo Garitezi
d6781efee2
fix(resource): change how buffer workers are started
2023-10-09 15:02:25 -03:00
Thales Macedo Garitezi
902b1d6ec5
fix(pulsar_producer): use `simple_async_internal_buffer` query mode for Pulsar
...
Since it has internal buffering, it necessitates the same fix as Kafka producer.
2023-10-09 15:02:25 -03:00
Thales Macedo Garitezi
eebfb44f72
fix(resource): create `simple_async_internal_buffer` query mode for bridges with internal buffering
...
Since authn/authz backends also use simple async/sync queries, we may want to avoid them
calling the connector when it's not connected.
2023-10-09 15:02:25 -03:00
Thales Macedo Garitezi
79cf0a2ced
fix(kafka_producer): correctly handle metrics for connector that have internal buffers
...
Fixes https://emqx.atlassian.net/browse/EMQX-11086
There’s currently a metric inconsistency due to the internal buffering nature of Kafka
Producer (wolff).
We use simple_sync_query to call the Kafka Producer bridge. If that times out, the call
is accounted as failed, even though the message is buffered in wolff and later sent
successfully.
2023-10-09 15:02:25 -03:00
Thales Macedo Garitezi
34186fcc74
fix(kafka_producer): send messages to wolff producer to buffer even when connector is in `connecting` state
...
Fixes https://emqx.atlassian.net/browse/EMQX-11085
Messages would not be sent to wolff if the connection was down, so they were effectively lost.
2023-10-06 11:43:29 -03:00
Zaiming (Stone) Shi
c2d750aa09
fix(resource): redact query args in exception log
2023-09-29 09:20:42 +02:00
firest
dca8fdb17f
fix(resource): respect the start_timeout
2023-09-28 16:36:41 +08:00
Ivan Dyachkov
dafd7c6085
chore: bump apps versions
2023-09-21 10:58:42 +02:00
zhongwencool
123d31fa7d
Merge pull request #11640 from zhongwencool/ensure-destory-resource
...
fix: always return ok when remove local resource
2023-09-21 09:21:45 +08:00
zhongwencool
dd687d9582
fix: dialyzer warning
2023-09-20 22:41:26 +08:00
zhongwencool
2f1fa2e961
chore: unified slog message formatting to improve logging consistency
2023-09-20 18:13:00 +08:00
zhongwencool
c26a18e949
fix: always return ok when remove local resource
2023-09-20 18:02:42 +08:00
Paulo Zulato
dfcede8794
fix: increment matched counter when bridge is unhealthy
...
Fixes https://emqx.atlassian.net/browse/EMQX-10767
2023-08-30 10:52:53 -03:00
Paulo Zulato
cc3ba18734
fix: increment dropped message counter when bridge is unhealthy
...
Fixes https://emqx.atlassian.net/browse/EMQX-10767
2023-08-28 19:47:11 -03:00
Paulo Zulato
42877e282d
fix: flatten error message on resource validator
...
Fixes https://emqx.atlassian.net/browse/EMQX-10864
2023-08-25 13:53:52 -03:00
Thales Macedo Garitezi
ebecbd1545
fix(bridge): make dryrun health check timeout more malleable
...
Fixes https://emqx.atlassian.net/browse/EMQX-10773
- Makes the timeout for probing a bridge more malleable to account for differences between
each database.
- Increases GCP PubSub Consumer default health check timeout to account for GCP
slowness/throttling.
2023-08-17 09:21:19 -03:00
Ivan Dyachkov
cbfca8c043
chore: merge master into release-51
2023-07-27 15:19:57 +02:00
Thales Macedo Garitezi
3b1e436d3f
refactor: use `emqx_pool:async_submit` to avoid excessive spawning
2023-07-20 10:17:43 -03:00
Thales Macedo Garitezi
eb41b77de4
fix(rule_metrics): notify rule metrics of late replies and expired requests
...
Fixes https://emqx.atlassian.net/browse/EMQX-10600
2023-07-19 11:39:28 -03:00
Thales Macedo Garitezi
01b143c5ad
fix(resource): don't destruct error tuple
...
Otherwise, `emqx_resource:query` won't correctly deem the resource to
be unhealthy when there's an extra message.
2023-07-13 16:12:33 -03:00
zhongwencool
b5cc8fb3c3
fix: start_after_created's default value
2023-07-07 16:39:26 +08:00
Stefan Strigler
07cf250093
Merge pull request #11126 from sstrigler/EMQX-8842-fix-rule-metrics
...
fix(emqx_rule_engine): set inc_action_metrics as async_reply_fun
2023-06-30 20:07:23 +02:00
Stefan Strigler
321fd53132
fix: use ReplyTo in QUERY for async
2023-06-29 16:09:45 +02:00
Stefan Strigler
40dd34a704
fix: use reply_to instead of async_reply_fun
2023-06-29 16:09:45 +02:00
Stefan Strigler
1363108678
fix: fix simple_sync_query
2023-06-29 16:09:45 +02:00
Stefan Strigler
2274a192cc
fix(emqx_resource): call async reply fun in simple_aysnc_query
2023-06-29 16:09:45 +02:00
Stefan Strigler
ae636a52d7
fix(emqx_rule_engine): set inc_action_metrics as async_reply_fun
2023-06-29 16:09:45 +02:00
Thales Macedo Garitezi
30e0b4be54
test(gcp_pubsub_consumer): add more tests and improve bridge
...
Fixes https://emqx.atlassian.net/browse/EMQX-10309
2023-06-28 14:08:40 -03:00
Thales Macedo Garitezi
c4fc0e767e
feat: allow specifying more helpful messages for unhealthy targets
2023-06-27 17:13:43 -03:00
Thales Macedo Garitezi
2f00cf7f84
Merge pull request #11107 from thalesmg/fix-mongo-health-check-reason-master
...
fix(mongo): return health check failure reason
2023-06-22 09:30:34 -03:00
Thales Macedo Garitezi
7ef03d9e1f
Merge pull request #11090 from thalesmg/gcp-pubsub-consumer
...
feat(gcp_pubsub_consumer): implement GCP PubSub Consumer bridge
2023-06-22 09:17:45 -03:00
Zaiming (Stone) Shi
c58a98954b
Merge remote-tracking branch 'origin/master' into 0621-merge-release-51-to-master
2023-06-22 11:05:51 +02:00
Paulo Zulato
62d3766726
Merge pull request #10645 from paulozulato/data-bridge-target-unavailable
...
Data bridge target unavailable
2023-06-21 18:19:23 -03:00
Andrew Mayorov
62b832be45
Merge pull request #11118 from fix/EMQX-9964/bump-hocon
...
chore: bump hocon to 0.39.10
2023-06-21 23:13:35 +02:00
Andrew Mayorov
86d787eced
chore: bump hocon to 0.39.10
...
Which comes with a fix for slightly more user-friendly validation error
messages.
2023-06-21 21:25:43 +02:00
Thales Macedo Garitezi
18f0510353
fix(mongo): return health check failure reason
...
Fixes https://emqx.atlassian.net/browse/EMQX-10335
2023-06-21 15:09:37 -03:00
Thales Macedo Garitezi
decfd6df2b
feat(buffer_worker): log expired message count
...
Fixes https://emqx.atlassian.net/browse/EMQX-10165
```
iex(emqx@127.0.0.1)38> 2023-06-21T11:09:35.569404-03:00 [info] msg: buffer_worker_dropped_expired_messages, mfa: emqx_resource_buffer_worker:log_expired_messge_count/1, line: 982, expired_count: 900, resource_id: <<"bridge:webhook:webhook">>, worker_index: 3
```
2023-06-21 14:38:51 -03:00
Zaiming (Stone) Shi
7cf8a6c892
chore: bump app vsns
2023-06-21 16:36:51 +02:00
Thales Macedo Garitezi
1d791d7a8c
fix(resource): validate maximum worker pool size
...
Fixes https://emqx.atlassian.net/browse/EMQX-10297
2023-06-20 14:26:42 -03:00
Thales Macedo Garitezi
13746c2cdf
fix(resource): check status when (re)starting a resource
...
Fixes https://emqx.atlassian.net/browse/EMQX-10290
2023-06-19 18:01:02 -03:00
Thales Macedo Garitezi
7f850f7499
fix(resource): fix `query_mode/0` type and usage
2023-06-19 15:59:00 -03:00
Paulo Zulato
9454af9a8b
feat(postgresql): check whether target table exists
...
Fixes https://emqx.atlassian.net/browse/EMQX-9026
2023-06-19 11:12:10 -03:00
Zaiming (Stone) Shi
e8ccdb8d0f
Merge pull request #10998 from zmstone/0609-no-batch-for-mongodb
...
fix(mongodb): hide batch_size for mongodb resource
2023-06-11 21:26:12 +02:00
Zaiming (Stone) Shi
ddef751527
fix(mongodb): hide batch_size for mongodb resource
...
MongoDB connector currently does not support batching
so the batch_size option has no effect.
However we cannot remove the field, so we choose to hide it from
schema
2023-06-11 11:08:58 +02:00
Kjell Winblad
4215da12f0
Merge pull request #10970 from kjellwinblad/kjell/feat/kafka_add_async_param/EMQX-8631
...
feat: add sync/async option to the Kafka producer bridge
2023-06-09 12:51:23 +02:00
Kjell Winblad
2671e8ecf9
fix: dialyzer type problem
2023-06-09 11:00:05 +02:00
firest
86a7b2d69a
fix(resource): improve log security when resource creation fails
2023-06-09 11:43:42 +08:00
Kjell Winblad
cb3a5fdbd4
style: only callback modules should do dynamic calls
2023-06-08 16:27:04 +02:00
Kjell Winblad
ed9e29e769
refactor: refacor query_mode detection code
...
This commit refactor the query_mode resource detection code according to
a suggestion from @zmstone. This commit should not contain any
functional change except for a change of the Kafka producer bridge
config.
2023-06-08 16:26:55 +02:00
Kjell Winblad
47fa17b3c1
feat: add sync/async option to the Kafka producer bridge
...
This commit makes it possible to configure if a Kafka bridge should work
in query mode sync or async by setting the resource_opts.query_mode
configuration option.
Fixes:
https://emqx.atlassian.net/browse/EMQX-8631
2023-06-08 13:16:06 +02:00
Thales Macedo Garitezi
cc8631223e
fix(schema): avoid `function_clause` error when compacting errors
...
Fixes https://emqx.atlassian.net/browse/EMQX-10168
The bridge probe API displayed the typecheck errors for the new
timeout duration types correctly, but when an user tried to create the
bridge anyway a `function_clause` error was raised when trying to
compact hocon errors:
```
09:47:19.045 [warning] [exception: :error, path: '/bridges', reason: {:case_clause, {:error, {:config_update_crashed, :function_clause}}}, stacktrace: [{:emqx_bridge_api, :create_or_update_bridge, 4, [file: '/home/thales/dev/emqx/emqx/apps/emqx_bridge/src/emqx_bridge_api.erl', line: 602]}, {:minirest_handler, :apply_callback, 3, [file: '/home/thales/dev/emqx/emqx/deps/minirest/src/minirest_handler.erl', line: 111]}, {:minirest_handler, :handle, 2, [file: '/home/thales/dev/emqx/emqx/deps/minirest/src/minirest_handler.erl', line: 44]}, {:minirest_handler, :init, 2, [file: '/home/thales/dev/emqx/emqx/deps/minirest/src/minirest_handler.erl', line: 27]}, {:cowboy_handler, :execute, 2, [file: '/home/thales/dev/emqx/emqx/deps/cowboy/src/cowboy_handler.erl', line: 41]}, {:cowboy_stream_h, :execute, 3, [file: '/home/thales/dev/emqx/emqx/deps/cowboy/src/cowboy_stream_h.erl', line: 318]}, {:cowboy_stream_h, :request_process, 3, [file: '/home/thales/dev/emqx/emqx/deps/cowboy/src/cowboy_stream_h.erl', line: 302]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 240]}]]
```
This fixes the issue so that both APIs return more friendly error messages.
2023-06-07 10:34:58 -03:00
Thales Macedo Garitezi
75df622426
Merge pull request #10930 from thalesmg/max-int-timeout-v50
...
check maximum timeout values in schema (5.1)
2023-06-05 12:51:36 -03:00
Thales Macedo Garitezi
46393343e2
chore: use `timeout_duration` types for timer fields
...
Fixes https://emqx.atlassian.net/browse/EMQX-10020
2023-06-05 11:46:38 -03:00
lafirest
d51c658a30
Merge pull request #10908 from lafirest/feat/rocketmq_on_stop
...
feat(rocketmq): refactored bridge to avoid leaking resources during crashes at creation
2023-06-05 15:00:32 +08:00
firest
5921ed3d2e
fix(rocketmq): improve function name
2023-06-05 10:43:28 +08:00
Thales Macedo Garitezi
0790c88aaf
refactor: use default's type as first union member
...
Co-authored-by: Zaiming (Stone) Shi <zmstone@gmail.com>
2023-06-02 09:08:11 -03:00
Thales Macedo Garitezi
99796224d8
refactor(resource): rename `request_timeout` -> `request_ttl`
...
See
https://emqx.atlassian.net/wiki/spaces/P/pages/612368639/open+e5.1+remove+auto+restart+interval+from+buffer+worker+resource+options
2023-06-01 13:01:53 -03:00
Thales Macedo Garitezi
f42ccb6262
feat(resource): increase default request timeout to 45 s
...
See
https://emqx.atlassian.net/wiki/spaces/P/pages/612368639/open+e5.1+remove+auto+restart+interval+from+buffer+worker+resource+options
2023-06-01 11:20:06 -03:00
Thales Macedo Garitezi
10425eb925
feat(resource): deprecate `auto_restart_interval` in favor of `health_check_interval`
...
See:
https://emqx.atlassian.net/wiki/spaces/P/pages/612368639/open+e5.1+remove+auto+restart+interval+from+buffer+worker+resource+options
Current problem:
In 5.0.x, we have two timer options that control the state changing of buffer worker
resources: auto_restart_interval and health_check_interval.
- auto_restart_interval controls how often the resource attempts to transition from
disconnected to connected.
- health_check_interval controls how often the resource is checked and potentially moved
from connected to disconnected or connecting.
The existence of two independent timers for very similar purposes is confusing to users,
QA and even developers. Also, an intimately related configuration is request_timeout,
which can interact badly with auto_restart_interval if the latter is poorly configured:
requests may always expire if request_timeout < auto_restart_interval and if the resource
enters the disconnected state. For health_check_interval, we attempt to derive a sane
default that gives requests a chance to retry (if request timeout is finite, then the
resource retries requests with a period of min(health_check_interval, request_timeout /
3).
Another problem with the separate auto_restart_interval is that its default value (60 s)
is too high when compared to the default request timeout and health check, leading to the
problems described above if not tuned.
Proposed solution:
We propose to drop auto_restart_interval in favor of health_check_interval, which will be
used for both disconnected -> connected and connected -> {disconnected, connecting}
transition checks. With that, the resource will attempt to reconnect at the same interval
as the health check, which currently is 15 s.
Also, as two smaller changes to accompany this one:
- Increase the default request_timeout from 15 s to 45 s.
- Rename request_timeout to request_ttl.
2023-06-01 11:20:06 -03:00
firest
232ef23a48
feat(rocketmq): refactored bridge to avoid leaking resources during crashes at creation
2023-06-01 18:49:45 +08:00
Thales Macedo Garitezi
a7f4f81c38
Merge pull request #10887 from thalesmg/fix-async-worker-down-buffer-worker-20230530-v50
...
fix: block buffer workers so they may retry requests
2023-05-30 17:39:18 -03:00
Andrew Mayorov
a2688325e5
Merge pull request #10754 from fix/EMQX-10056/mqtt
...
feat(mqttconn): employ ecpool instead of single worker
2023-05-30 23:28:10 +03:00
Thales Macedo Garitezi
8c565abc84
test(cassandra): fix flaky test
2023-05-30 15:42:53 -03:00
Thales Macedo Garitezi
6be8ff378e
fix(buffer_worker): make buffer worker enter `blocked` state when async worker dies
...
Fixes https://emqx.atlassian.net/browse/EMQX-10074
Otherwise, requests from those async workers, now retriable, might not
be retried until the buffer worker blocks for other reasons, which
might take a long time.
2023-05-30 15:34:22 -03:00
Andrew Mayorov
a5fc26736d
refactor(mqttconn): split ingress/egress into 2 separate pools
...
Each with a more refined set of responsibilities, at the cost of slight
code duplication. Also provide two different config fields for each pool
size.
2023-05-30 17:21:44 +03:00
Thales Macedo Garitezi
75fcac9711
Merge pull request #10826 from thalesmg/test-partial-batch-expired-inflight-v50
...
test(buffer_worker): add assertion for inflight count after batch expiration
2023-05-30 09:05:59 -03:00
Zaiming (Stone) Shi
91cdc69976
Merge pull request #10867 from zmstone/0530-merge-release-50-to-master
...
0530 merge release 50 to master
2023-05-30 09:54:57 +02:00
Zaiming (Stone) Shi
9529919046
chore: bump app versions
2023-05-30 08:08:29 +02:00
Thales Macedo Garitezi
67e182e0c9
Merge pull request #10813 from thalesmg/refactor-kafka-on-stop-v50
...
feat(kafka): ensure allocated resources are removed on failures
2023-05-29 16:49:29 -03:00
Zaiming (Stone) Shi
36e268c933
chore: bump app versions
2023-05-26 16:05:37 +02:00
Zaiming (Stone) Shi
cc5b4d3748
Merge remote-tracking branch 'origin/release-50' into 0526-ci-delete-otp-24-from-standalone-app-test
2023-05-26 15:58:16 +02:00
Thales Macedo Garitezi
32e6213ce3
fix(resource_manager_sup): use `one_for_one` instead of `simple_one_for_one`
...
Using `simple_one_for_one` has a potential race condition issue where
we read the PID of the resource manager before trying to remove a
resource, and then that PID changes because it was either dead at
first, or it crashed and changed, and later we use this stale PID to
try to remove it from the supervisor. Under such circumstances, the
restarting child might linger in the supervisor, leaking resources.
By using the resource ID itself as a child ID (and using `one_for_one`
restart strategy), we ensure the child is truly removed.
2023-05-25 18:07:43 -03:00
Thales Macedo Garitezi
42b37690c7
refactor(pulsar): use macros for allocatable resources
2023-05-25 16:38:09 -03:00
Thales Macedo Garitezi
db60dcbada
test(buffer_worker): add assertion for inflight count after batch expiration
...
Fixes https://emqx.atlassian.net/browse/EMQX-9829
2023-05-25 16:11:37 -03:00
Thales Macedo Garitezi
18d57ba3eb
Merge pull request #10812 from thalesmg/test-flakiness-20230524
...
test: attempts to reduce flakiness (pgsql, cassandra)
2023-05-25 09:29:13 -03:00
JianBo He
de7f1c8aec
test: add tests for auto_restart_interval
2023-05-25 17:15:19 +08:00
JianBo He
71b636e321
fix: fix auto_restart_interval checker
2023-05-25 12:04:23 +08:00
Paulo Zulato
122ebcac24
fix: add user-friendly message when interval is out of range
2023-05-24 15:46:00 -03:00
Thales Macedo Garitezi
7f88521836
test(pgsql): reduce flakiness
...
Depending on timing, `t_write_timeout` was getting stuck while
checking the resource health, and the previous request timeout options
were making a response to never be sent if that process took too long.
2023-05-24 15:41:25 -03:00
Thales Macedo Garitezi
fd2940cd77
feat(pulsar): ensure allocated resources are removed on failures (v5.0)
...
Fixes https://emqx.atlassian.net/browse/EMQX-9937
2023-05-24 12:29:00 -03:00
Zaiming (Stone) Shi
732a7be187
Merge remote-tracking branch 'origin/release-50'
2023-05-22 17:46:54 +02:00
Thales Macedo Garitezi
0559d6f639
refactor(buffer_worker): use static fn for bumping counters
2023-05-22 09:12:08 -03:00
Thales Macedo Garitezi
c74c93388e
refactor: rename some variables and sum type constructors for clarity
2023-05-22 09:11:23 -03:00
Thales Macedo Garitezi
7d798c10e9
perf(buffer_worker): flush metrics periodically inside buffer worker process
...
Fixes https://emqx.atlassian.net/browse/EMQX-9905
Since calling `telemetry` is costly in a hot path, we instead collect
metrics inside the buffer workers state and periodically flush them,
rather than immediately as events happen.
2023-05-22 09:11:23 -03:00