Commit Graph

75 Commits

Author SHA1 Message Date
Thales Macedo Garitezi eacd803a37 test(pulsar): fix flaky test 2024-03-06 12:07:02 -03:00
zhongwencool 3814203fa2 test(pulsar): add pulsar action v2 testcase 2024-02-29 16:23:37 +08:00
zhongwencool e9e1daf962 chore: update some pulsar's desc 2024-02-29 16:23:37 +08:00
zhongwencool 650f9659a4 fix: create pulsar producer when on_add_channel 2024-02-29 16:23:37 +08:00
zhongwencool 7f1b4cef27 feat: pulsar bridge v2 2024-02-29 16:23:37 +08:00
Andrew Mayorov 28d664bae2
test(pulsar): simplify the test suite 2024-02-27 23:51:54 +01:00
Zaiming (Stone) Shi 46877e979b chore: update copyright-year 2024-02-23 08:21:06 +01:00
Zaiming (Stone) Shi 82403167c2 chore: update BSL license change date 2024-01-29 16:47:31 +01:00
Zaiming (Stone) Shi bfc02d1ccf test(pulsar): fix pulsar consumer ssl opts 2023-12-14 22:21:31 +01:00
Zaiming (Stone) Shi ddbb8560fa fix(dialyzer): batch 2 2023-12-08 17:59:55 +01:00
Zaiming (Stone) Shi 22f7cc1622 test: replace 'slave' and 'ct_slave' with 'peer' 2023-12-01 08:07:09 +01:00
Zaiming (Stone) Shi 14644988e0 chore: change triple-quotes to single-quotes 2023-11-29 16:15:18 +01:00
Thales Macedo Garitezi 6c9417efe0 Merge remote-tracking branch 'origin/release-53' into sync-r53-m-20231122 2023-11-22 12:02:34 -03:00
Thales Macedo Garitezi 11ec1a30a0 test(flaky): fix flaky pulsar test 2023-11-21 16:00:19 -03:00
Ivan Dyachkov 7c0e345d3a Merge remote-tracking branch 'upstream/release-54' 2023-11-14 19:38:21 +01:00
Andrew Mayorov 2449d54b1f
feat(pulsar): accept wrapped secrets as passwords 2023-11-14 16:05:52 +07:00
Zaiming (Stone) Shi 855b3c5b29 test: ensure atom exists 2023-11-10 13:41:51 +01:00
Kjell Winblad 9dc3a169b3 feat: split bridges into a connector part and a bridge part
Co-authored-by: Thales Macedo Garitezi <thalesmg@gmail.com>
Co-authored-by: Stefan Strigler <stefan.strigler@emqx.io>
Co-authored-by: Zaiming (Stone) Shi <zmstone@gmail.com>

Several bridges should be able to share a connector pool defined by a
single connector. The connectors should be possible to enable and
disable similar to how one can disable and enable bridges. There should
also be an API for checking the status of a connector and for
add/edit/delete connectors similar to the current bridge API.

Issues:
https://emqx.atlassian.net/browse/EMQX-10805
2023-10-30 14:48:47 +01:00
Thales Macedo Garitezi 902b1d6ec5 fix(pulsar_producer): use `simple_async_internal_buffer` query mode for Pulsar
Since it has internal buffering, it necessitates the same fix as Kafka producer.
2023-10-09 15:02:25 -03:00
Zaiming (Stone) Shi ea8d54fd8b test: ensure atom exists in test module 2023-09-27 12:58:06 +02:00
Zaiming (Stone) Shi 7c2f87fabe test: merge broker and router boot modules 2023-09-06 21:36:16 +02:00
Thales Macedo Garitezi 926eb4e3dd test: rm unused var warning 2023-08-14 10:33:24 -03:00
Thales Macedo Garitezi 29e706c83d refactor: move catch to dry run fn 2023-08-07 13:08:34 -03:00
Thales Macedo Garitezi 5c8dc092a1 fix(http_bridge): don't attempt to convert headers to atoms
Fixes https://emqx.atlassian.net/browse/EMQX-10653
2023-08-07 13:08:34 -03:00
Thales Macedo Garitezi 6cd503865b fix(machine_boot): ensure `emqx_bridge` starts after its companion apps
We need to reverse the dependency of `emqx_bridge` and `emqx_bridge_*`, because the former
loads and starts bridges during its application startup.  If the individual bridge
application being loaded has not started with its dependencies, the supervision tree will
not be ready for that.
2023-07-20 13:11:44 -03:00
Thales Macedo Garitezi b9b11d8f4d fix(machine_boot): use shared list of reboot apps and add bridges to reboot list 2023-07-19 20:15:42 -03:00
JimMoen 30c931ae62
fix: pulsar flaky cluster tests 2023-07-07 12:25:37 +08:00
JimMoen b089fba100
refactor: rm ee_bridge and ee_connector application 2023-07-07 12:25:37 +08:00
Thales Macedo Garitezi b609792a90 fix(pulsar_producer): do not return `disconnected` when checking status (r5.1)
Fixes https://emqx.atlassian.net/browse/EMQX-10278

Since Pulsar client has its own replayq that lives outside the management of the buffer
workers, we must not return disconnected status for such bridge: otherwise, the resource
manager will eventually kill the producers and data may be lost.
2023-06-13 11:44:45 -03:00
Thales Macedo Garitezi c11011d857
Merge pull request #11024 from thalesmg/pulsar-cosmetic-connecting-check-r51
feat(pulsar): retry health check a bit before returning (r5.1)
2023-06-12 13:00:54 -03:00
Thales Macedo Garitezi 741c1f091e fix(pulsar): update pulsar -> 0.8.3
Fixes https://emqx.atlassian.net/browse/EMQX-10229

See https://github.com/emqx/pulsar-client-erl/pull/59

Fixes a case_clause error that could arise from race conditions.
2023-06-12 10:29:40 -03:00
Thales Macedo Garitezi db5d14d5bf feat(pulsar): retry health check a bit before returning (r5.1)
Fixes https://emqx.atlassian.net/browse/EMQX-10228

This is a cosmetic fix for the Pulsar Producer bridge health check status.

Pulsar connection process is asynchronous, therefore, when a bridge of this type is
created or updated (which is the same as stopping and re-creating it), the immediate
status will be connecting because it’s indeed still connecting.  The bridge will connect
very soon afterwards (assuming there are no true network/config issue), but having to
refresh the UI to see the new status and/or seeing the resource alarm might annoy users.

This workaround adds a few retries to account for that effect to reduce the probability of
seeing the `connecting` state on such happy-paths.
2023-06-12 10:26:07 -03:00
Zaiming (Stone) Shi 97850de524 Merge remote-tracking branch 'origin/release-51' into 0610-merge-release-51-to-master 2023-06-10 12:23:55 +02:00
Andrew Mayorov b930f4cc73
Merge pull request #10987 from keynslug/fix/EMQX-9257/sep-placeholder
refactor: tear `emqx_plugin_libs` application apart
2023-06-09 17:02:14 +02:00
Andrew Mayorov a51baaa206
refactor(pluglib): move conversion utils to `emqx_utils_conv` 2023-06-09 14:44:37 +03:00
Andrew Mayorov d6c1ee183f
refactor(pluglib): move `emqx_placeholder` to utils app
Also make user that existing code calls it directly.
2023-06-09 14:44:36 +03:00
Kjell Winblad 1c7834e056 fix: fixes due to comments from @zmstone 2023-06-08 16:47:02 +02:00
Kjell Winblad ed9e29e769 refactor: refacor query_mode detection code
This commit refactor the query_mode resource detection code according to
a suggestion from @zmstone. This commit should not contain any
functional change except for a change of the Kafka producer bridge
config.
2023-06-08 16:26:55 +02:00
Thales Macedo Garitezi 97a9bb484a test(pulsar_producer): attempt to fix flaky test 2023-06-06 09:32:38 -03:00
Thales Macedo Garitezi 46393343e2 chore: use `timeout_duration` types for timer fields
Fixes https://emqx.atlassian.net/browse/EMQX-10020
2023-06-05 11:46:38 -03:00
Thales Macedo Garitezi 3e4790edd4 test(pulsar_producer): fix flaky test 2023-06-01 13:01:58 -03:00
Thales Macedo Garitezi 10425eb925 feat(resource): deprecate `auto_restart_interval` in favor of `health_check_interval`
See:
https://emqx.atlassian.net/wiki/spaces/P/pages/612368639/open+e5.1+remove+auto+restart+interval+from+buffer+worker+resource+options

Current problem:

In 5.0.x, we have two timer options that control the state changing of buffer worker
resources: auto_restart_interval and health_check_interval.

- auto_restart_interval controls how often the resource attempts to transition from
disconnected to connected.

- health_check_interval controls how often the resource is checked and potentially moved
from connected to disconnected or connecting.

The existence of two independent timers for very similar purposes is confusing to users,
QA and even developers.  Also, an intimately related configuration is request_timeout,
which can interact badly with auto_restart_interval if the latter is poorly configured:
requests may always expire if request_timeout < auto_restart_interval and if the resource
enters the disconnected state.  For health_check_interval, we attempt to derive a sane
default that gives requests a chance to retry (if request timeout is finite, then the
resource retries requests with a period of min(health_check_interval, request_timeout /
3).

Another problem with the separate auto_restart_interval is that its default value (60 s)
is too high when compared to the default request timeout and health check, leading to the
problems described above if not tuned.

Proposed solution:

We propose to drop auto_restart_interval in favor of health_check_interval, which will be
used for both disconnected -> connected and connected -> {disconnected, connecting}
transition checks.  With that, the resource will attempt to reconnect at the same interval
as the health check, which currently is 15 s.

Also, as two smaller changes to accompany this one:

- Increase the default request_timeout from 15 s to 45 s.
- Rename request_timeout to request_ttl.
2023-06-01 11:20:06 -03:00
Thales Macedo Garitezi 0d539e91d1 test(pulsar_producer): attempt to stabilize flaky test
https://github.com/emqx/emqx/actions/runs/5125166433/jobs/9218613872?pr=10886#step:7:679
```
  =CRITICAL REPORT==== 30-May-2023::19:38:58.003170 ===
  Run stage failed: error:{badmatch,
                              {timeout,
                                  [#{msg => pulsar_producer_bridge_started,
                                     '~meta' =>
                                         #{gl => <97891.472.0>,
                                           location =>
                                               #Fun<emqx_bridge_pulsar_impl_producer.11.109752493>,
                                           node => 'autocluster_node1@127.0.0.1',
                                           pid => <97891.787.0>,
                                           time => -576460611692219}}]}}
  Stacktrace: [{emqx_bridge_pulsar_impl_producer_SUITE,'-t_cluster/1-fun-10-',
                   6,
                   [{file,
                        "/emqx/apps/emqx_bridge_pulsar/test/emqx_bridge_pulsar_impl_producer_SUITE.erl"},
                    {line,1073}]},
               {emqx_bridge_pulsar_impl_producer_SUITE,t_cluster,1,
                   [{file,
                        "/emqx/apps/emqx_bridge_pulsar/test/emqx_bridge_pulsar_impl_producer_SUITE.erl"},
                    {line,1064}]}]
```
2023-05-31 10:19:55 -03:00
Thales Macedo Garitezi 9c3f838e14
Merge pull request #10841 from thalesmg/kafka-validate-key-v50
feat({kafka,pulsar}_producer): add validation for empty message key when strategy = key_dispatch
2023-05-30 09:37:15 -03:00
Zaiming (Stone) Shi 91cdc69976
Merge pull request #10867 from zmstone/0530-merge-release-50-to-master
0530 merge release 50 to master
2023-05-30 09:54:57 +02:00
Zaiming (Stone) Shi 9529919046 chore: bump app versions 2023-05-30 08:08:29 +02:00
Thales Macedo Garitezi 67e182e0c9
Merge pull request #10813 from thalesmg/refactor-kafka-on-stop-v50
feat(kafka): ensure allocated resources are removed on failures
2023-05-29 16:49:29 -03:00
Thales Macedo Garitezi 3edbad9f56 feat(pulsar_producer): add validation for empty message key when strategy = key_dispatch 2023-05-29 10:04:19 -03:00
Zaiming (Stone) Shi 36e268c933 chore: bump app versions 2023-05-26 16:05:37 +02:00
Thales Macedo Garitezi 32e6213ce3 fix(resource_manager_sup): use `one_for_one` instead of `simple_one_for_one`
Using `simple_one_for_one` has a potential race condition issue where
we read the PID of the resource manager before trying to remove a
resource, and then that PID changes because it was either dead at
first, or it crashed and changed, and later we use this stale PID to
try to remove it from the supervisor.  Under such circumstances, the
restarting child might linger in the supervisor, leaking resources.

By using the resource ID itself as a child ID (and using `one_for_one`
restart strategy), we ensure the child is truly removed.
2023-05-25 18:07:43 -03:00