Commit Graph

217 Commits

Author SHA1 Message Date
Thales Macedo Garitezi 4c24b08244 fix(rule_action): fix metrics for bridges returning `async_return`
Kafka Producer, when called asynchronously, will return
`{async_return, {ok, pid()}}`, which currently counts as an unknown failure.
2023-04-06 16:00:01 -03:00
Thales Macedo Garitezi 5d5b7ea215
Merge pull request #10306 from thalesmg/enable-async-buffer-workers-all-bridges-v50
feat(bridges): enable async query mode for all bridges with buffer workers
2023-04-04 17:10:46 -03:00
Thales Macedo Garitezi 0b6fd7fe14 fix(buffer_worker): check request timeout and health check interval
Port of https://github.com/emqx/emqx/pull/10154 for `release-50`

Fixes https://emqx.atlassian.net/browse/EMQX-9099

Originally, the `resume_interval`, which is what defines how often a
buffer worker will attempt to retry its inflight window, was set to
the same as the `health_check_interval`.  This had the problem that,
with default values, `health_check_interval = request_timeout`.  This
meant that, if a buffer worker with those configs were ever blocked,
all requests would have timed out by the time it retried them.

Here we change the default `resume_interval` to a reasonable value
dependent on `health_check_interval` and `request_timeout`, and also
expose that as a hidden parameter for fine tuning if necessary.
2023-04-04 08:58:36 -03:00
Thales Macedo Garitezi a8f8228a12
Merge pull request #10308 from thalesmg/test-increase-peer-timeout-v50
test(peer): increase init and startup timeout for peer nodes
2023-04-03 15:46:13 -03:00
Thales Macedo Garitezi f3ffc02bff feat(bridges): enable async query mode for all bridges with buffer workers
Fixes https://emqx.atlassian.net/browse/EMQX-9130

Since buffer workers always support async calls ("outer calls"), we
should decouple those two call modes (inner and outer), and avoid
exposing the inner call configuration to user to avoid complexity.

For bridges that currently only allow sync query modes, we should
allow them to be configured with async.  That means basically all
bridge types except Kafka Producer.
2023-04-03 14:49:51 -03:00
Thales Macedo Garitezi 8b5a717a1f test(peer): increase init and startup timeout for peer nodes
Attempt to stabilize tests that use cluster nodes.
2023-04-03 13:20:22 -03:00
Thales Macedo Garitezi ec1871ffde test(janitor): catch each callback invocation 2023-04-03 10:20:19 -03:00
zhongwencool d63680cf25
Merge pull request #10307 from emqx/release-50
Sync release-50 back to master
2023-04-02 11:36:41 +08:00
Kjell Winblad 58898ea11d
Merge pull request #10294 from kjellwinblad/kjell/feat/collection_var_syntax_mongodb/EMQX-9246
feat: (MongoDB bridge) use ${var} syntax in MongoDB collection field
2023-03-31 17:18:27 +02:00
Thales Macedo Garitezi 5011486b18 fix(kafka_consumer): return better error messages when probing kafka consumer bridge
Fixes https://emqx.atlassian.net/browse/EMQX-9422
2023-03-31 11:33:15 -03:00
Kjell Winblad e808fef1e4 feat: (MongoDB bridge) use ${var} syntax for MongoDB collection
This commit makes it possible to use the ${var} syntax to refer to
variables in the payload of the message in the collection field.
This makes it possible to select which collection to insert into
dynamically.

Fixes:
https://emqx.atlassian.net/browse/EMQX-9246
2023-03-30 17:49:56 +02:00
Thales Macedo Garitezi 632bffd451 fix: return friendly message when kafka producer fails to start (rv5.0)
Fixes https://emqx.atlassian.net/browse/EMQX-9392

The returned information does not allow to diagnose the issue (i.e.: a
connection issue due to the wrong host and port, the wrong password
failing authn).  However, such information is printed to the logs.

This changes the returned error to the API so that the user is hinted
at looking at the logs for further investigation of the error.
2023-03-30 11:51:36 -03:00
Zaiming (Stone) Shi 81a104690d test: fix flaky influxdb test 2023-03-30 16:19:22 +02:00
Zaiming (Stone) Shi 80eb9d7542
Merge pull request #10252 from emqx/release-50
0327 merge release-50 to master
2023-03-29 12:33:17 +02:00
Thales Macedo Garitezi 64faccf50b test: fix flaky kafka consumer test 2023-03-28 14:50:55 -03:00
Thales Macedo Garitezi 1a7ca7235e
Merge pull request #10249 from thalesmg/fix-kafka-offset-doc-rv50
feat(kafka_consumer): tie `offset_reset_policy` and `begin_offset` together (rv5.0)
2023-03-28 11:37:46 -03:00
Thales Macedo Garitezi 69fc1123ee refactor: change enum constructors and improve docs 2023-03-27 17:30:17 -03:00
Thales Macedo Garitezi 5cf09209cd feat: tie `offset_reset_policy` and `begin_offset` together
To make the configuration more intuitive and avoid exposing more
parameters to the user, we should:

1) Remove reset_by_subscriber as an enum constructor for
`offset_reset_policy`, as that might make the consumer hang
indefinitely without manual action.

2) Set the `begin_offset` `brod_consumer` parameter to `earliest` or
`latest` depending on the value of `offset_reset_policy`, as that’s
probably the user’s intention.
2023-03-27 14:20:31 -03:00
Zaiming (Stone) Shi da5e6f3d0a test: test with only one Kafka partition for bad config recover test 2023-03-27 17:38:34 +02:00
JianBo He bfa5922209
Merge pull request #10140 from HJianBo/cassa
feat: support cassandra data bridge
2023-03-27 10:23:02 +08:00
Thales Macedo Garitezi ff272a2071
Merge pull request #10206 from thalesmg/decouple-buffer-worker-query-call-mode-v50
feat(buffer_worker): decouple query mode from underlying connector call mode
2023-03-24 13:49:00 -03:00
Thales Macedo Garitezi f8d5d53908 feat(buffer_worker): decouple query mode from underlying connector call mode
Fixes https://emqx.atlassian.net/browse/EMQX-9129

Currently, if an user configures a bridge with query mode sync, then
all calls to the underlying driver/connector ("inner calls") will
always be synchronous, regardless of its support for async calls.

Since buffer workers always support async queries ("outer calls"), we
should decouple those two call modes (inner and outer), and avoid
exposing the inner call configuration to user to avoid complexity.

There are two situations when we want to force synchronous calls to
the underlying connector even if it supports async:

1) When using `simple_sync_query`, since we are bypassing the buffer
workers;
2) When retrying the inflight window, to avoid overwhelming the
driver.
2023-03-23 13:40:31 -03:00
JianBo He 9b63bdc1e0 chore: apply review suggestions
- Rename sql to cql
- Add tests for `bridges_probe` API
2023-03-23 15:27:34 +08:00
JianBo He 8cbbc9f271 Merge remote-tracking branch 'upstream/master' into cassa 2023-03-23 11:53:17 +08:00
Thales Macedo Garitezi ddffba0355
Merge pull request #10154 from thalesmg/fix-buffer-worker-default-req-timeout
fix(buffer_worker): calculate default `resume_interval` based on `request_timeout` and `health_check_interval`
2023-03-22 20:21:04 -03:00
Thales Macedo Garitezi 1ca6a51425
Merge pull request #10198 from thalesmg/fix-flaky-kconsumer-test-v50
test: attempt to fix flaky kafka consumer test
2023-03-22 17:19:50 -03:00
Thales Macedo Garitezi 127a075b66 test(dynamo): attempt to fix dynamo tests
Those tests in the `flaky` test are really flaky and require lots of
CI retries.

Apparently, the flakiness comes from race conditions from restarting
bridges with the same name too fast between test cases.  Previously,
all test cases were sharing the same bridge name (the module name).
2023-03-22 14:34:37 -03:00
Thales Macedo Garitezi 61cb03b45a fix(buffer_worker): change the default `resume_interval` value and expose it as hidden config
Also removes the previously added alarm for request timeout.

There are situations where having a short request timeout and a long
health check interval make sense, so we don't want to alarm the user
for those situations.  Instead, we automatically attempt to set a
reasonable `resume_interval` value.
2023-03-22 11:47:36 -03:00
lafirest 84def357a9
Merge pull request #10143 from lafirest/feat/rocketmq
feat(bridges): integrate RocketMQ into data bridges
2023-03-22 20:43:22 +08:00
firest 4ad3579966 test(bridges): add test suite for RocketMQ 2023-03-22 10:36:58 +08:00
JianBo He 65c2da7ef5 Merge remote-tracking branch 'ce/master' into cassa 2023-03-22 09:30:50 +08:00
Zaiming (Stone) Shi e6091db351 Merge remote-tracking branch 'origin/release-50' into 0321-merge-release-50-to-master 2023-03-21 22:03:31 +01:00
Thales Macedo Garitezi 7e6f52e8fe test: attempt to fix flaky kafka consumer test
It might need some time for the metrics to be set.
2023-03-21 17:45:58 -03:00
JianBo He 539ec2f774 chore(bridge): cover username/password auth for cassandra bridges 2023-03-21 13:55:53 +08:00
Serge Tupchii 3a46681dde feat: handle escaped characters in InfluxDB data bridge write_syntax
Closes: EMQX-7834
2023-03-20 16:42:23 +02:00
JianBo He d1689f6957 chore: correct api examples
follow https://github.com/emqx/emqx/pull/10114
2023-03-20 17:03:05 +08:00
Erik Timan 2d75c7d6d9 fix(emqx_bridge): remove metrics from non-dedicated bridge API endpoints
Metrics should only be exposed via the /bridges/:id/metrics endpoint,
and not in other operations such as getting the list of all bridges, or
in the response when a bridge has been created. This commit removes all
traces of metrics for the non-dedicated API endpoints.
2023-03-20 09:43:11 +01:00
JianBo He 12942b676d Merge remote-tracking branch 'upstream/master' into cassa 2023-03-20 09:50:27 +08:00
JianBo He 678cc937c0 test(bridge): cover ssl testing for cassandra bridge 2023-03-17 18:25:05 +08:00
JianBo He 5f0828a2ea ci: add certs for cassandra tls 2023-03-17 16:39:10 +08:00
JianBo He c0a216a740 feat(bridge): support cassandra bridge 2023-03-17 11:34:48 +08:00
Thales Macedo Garitezi 966276127e test: trying to make tests more stable 2023-03-16 13:43:01 -03:00
Thales Macedo Garitezi 41b8d47696 test(kafka_consumer): add more clusterized tests 2023-03-16 13:43:01 -03:00
Thales Macedo Garitezi fc5dfa108a fix(kafka_consumer): rename `force_utf8` to `none`
We actually enforce the key/value to be a valid UTF-8 string when
using `emqx_json:encode`, if we do encode using that, which is
template-dependent.
2023-03-16 13:43:01 -03:00
Thales Macedo Garitezi 53979b6261 feat(kafka_consumer): support multiple topic mappings, payload
templates and key/value encodings

Added after feedback from the product team.
2023-03-16 13:43:01 -03:00
Thales Macedo Garitezi f31f15e44e chore(kafka_producer): make schema changes more backwards compatible
This will still require fixes to the frontend.
2023-03-16 13:43:01 -03:00
Thales Macedo Garitezi c182a4053e fix(kafka_consumer): avoid leaking atoms in bridge probe API 2023-03-16 13:43:01 -03:00
Thales Macedo Garitezi e1fdd041b3 feat(kafka_consumer): add `offset_commit_interval_seconds` kafka parameter 2023-03-16 13:43:01 -03:00
Thales Macedo Garitezi 1d5fe14a30 test: remove sleeps 2023-03-16 13:43:01 -03:00
Thales Macedo Garitezi 969fa03ecc feat: implement kafka consumer 2023-03-16 13:43:01 -03:00
Thales Macedo Garitezi 5ab5236ad3 test: fix flaky test 2023-03-16 13:42:38 -03:00
Andrew Mayorov f7c0d29478
test(tde): add testcase for a nasty string in SQL query
Similar to what we have in mysql and pgqsl testsuites.
2023-03-10 18:43:30 +03:00
Andrew Mayorov 0a7f6c7d03
fix(mysql): ensure proper escaping in batch inserts
Also hexencode non-utf8 binaries. This is essentially an heuristic.
We don't know column types in runtime, and there's no simple way
to find them out. Since we're already doing full binary scan during
escaping it should be cheap to bail out on non-utf8 strings and
hexencode them instead.

Also introduce separate function to highlight that this escaping
is MySQL-specific.
2023-03-10 18:42:04 +03:00
Andrew Mayorov fc37d9b3cd
fix(mysql): be explicit that batch queries are parameterless
So that mysql client won't attempt to prepare them automatically, thus
trashing the server's prepared statements table and making interaction
overall heavier.
2023-03-10 18:42:04 +03:00
Zaiming (Stone) Shi fe27604010 Merge remote-tracking branch 'origin/release-50' into 0308-merge-release-50-back-to-master 2023-03-08 16:46:45 +01:00
Serge Tupchii 97e71c54d4 fix: use default template if timestamp is empty (undefined) in InfluxDB bridge
Closes EMQX-8926
2023-03-08 11:58:23 +01:00
firest 984dd3446d test(bridges): add test suite for DynamoDB 2023-03-08 11:13:51 +08:00
Thales Macedo Garitezi e9d3fc511f chore(buffer_worker): change default `batch_time` to 0 and improve docs 2023-03-06 15:31:28 -03:00
Thales Macedo Garitezi 9b087a21f5 fix(gcp_pubsub): remove conflicting `request_timeout` option
Since the buffer worker schema already contains that configuration,
having it two places can lead to quite confusing behavior.
2023-03-06 10:12:38 -03:00
Kjell Winblad 67acdf0888 feat: add clickhouse database bridge
This commit adds a Clickhouse bridge to EMQX 5. The bridge is similar to
the Clickhouse bridge in the 4.4, but adds the possibility to use
different formats (such as JSON) for values to be inserted.
2023-03-02 12:22:11 +01:00
Zaiming (Stone) Shi 083330ad80 Merge remote-tracking branch 'origin/master' into 0301-merge-release-50-to-master 2023-03-01 08:53:03 +01:00
Zaiming (Stone) Shi 24f476e35f test: add README to influxdb test script 2023-02-28 19:38:43 +01:00
Ivan Dyachkov 6ce5029d79
Merge pull request #9881 from olcai/log-influxdb-is-alive-reason
fix(emqx_ee_connector): log reason for failure when starting influxdb connector
2023-02-28 09:49:08 +00:00
Andrew Mayorov 9cbe64a132
fix(test): make strings json-friendly in kafka testsuite 2023-02-24 15:05:20 +03:00
Erik Timan da42c91fb2 test(emqx_ee_bridge): check influxdb:is_alive/2 return 2023-02-24 09:03:34 +01:00
Ivan Dyachkov c869eff6e8 test(kafka): disable overload protection in 2 more places 2023-02-20 21:38:14 +01:00
Ivan Dyachkov 9d30a52910 test: disable replayq memory overload protection 2023-02-20 19:29:26 +01:00
firest 0420e9acb5 test(bridges): add test cases for TDEngine 2023-02-14 22:04:29 +08:00
Stefan Strigler c7535fce56 test: fix test expecting 200 instead of 204 2023-02-08 09:14:35 +01:00
Zhongwen Deng 1c9035d24c test: remove async from redis ct 2023-02-02 17:37:18 +08:00
Ilya Averyanov 9d91ebe266
Merge pull request #9842 from savonarola/fix-redis-cluster-recover
fix: fix redis cluster resource recovery
2023-02-01 10:38:52 +02:00
Ilya Averyanov fce1e74c3d fix(connector): fix redis cluster resource recovery 2023-01-31 16:55:05 +02:00
Zaiming (Stone) Shi d47941601d refactor(buffer_worker): rename trace points 2023-01-28 11:52:11 +01:00
Zaiming (Stone) Shi 52b75ada04
Merge pull request #9832 from sstrigler/EMQX-8774-failure-to-handle-timeout-error-in-resource-worker
EMQX 8774 failure to handle timeout error in resource worker
2023-01-27 14:36:44 +01:00
Andrew Mayorov d35e46b2d5
Merge pull request #9838 from keynslug/fix/redis-cluster-batching
feat(redis): disable batching in redis_cluster bridges
2023-01-27 15:27:57 +04:00
Stefan Strigler 7d18128ba9 test: async write can return noproc 2023-01-27 11:43:51 +01:00
Stefan Strigler 2d62de5188 test: fix expected result from timeout error 2023-01-27 11:43:48 +01:00
Zaiming (Stone) Shi f6b3b930b0 chore: improve a error log 2023-01-26 14:21:27 +01:00
Andrew Mayorov 26fcaecad7
fix(redis): disable batching in `redis_cluster` bridges
Through configuration subsystem.
2023-01-25 17:28:11 +03:00
Andrew Mayorov 903a77b471
test(redis): ensure batch query hit different cluster shards
This will inevitably fail: it's not generally possible to update
different keys through the same cluster connection, one or more
update will fail with `MOVED` status. This testcase should serve
as a regression test later.
2023-01-25 15:33:05 +03:00
Zaiming (Stone) Shi 5fdf7fd24c fix(kafka): use async callback to bump success counters
some telemetry events from wolff are discarded:

* dropped:
    this is double counted in wolff,
    we now only subscribe to the dropped_queue_full event
* retried_failed:
    it has different meanings in wolff,
    in wolff, it means it's the 2nd (or onward) produce attempt
    in EMQX, it means it's eventually failed after some retries

* retried_success
    since we are going to handle the success counters in callbac
    this having this reported from wolff will only make things
    harder to understand

* failed
    wolff never fails (unelss drop which is a different counter)
2023-01-24 21:12:36 +01:00
Zaiming (Stone) Shi 8fde169abb
Merge pull request #9821 from thalesmg/buffer-worker-expiry-v50
feat(buffer_worker): add expiration time to requests
2023-01-24 13:54:04 +01:00
Thales Macedo Garitezi ca4a262b75 refactor: re-organize dealing with unrecoverable errors 2023-01-20 12:00:17 -03:00
Thales Macedo Garitezi 6fa6c679bb feat(buffer_worker): add expiration time to requests
With this, we avoid performing work or replying to callers that are no
longer waiting on a result.

Also introduces two new counters:

- `dropped.expired` :: happens when a request expires before being
  sent downstream
- `late_reply` :: when a response is receive from downstream, but the
  caller is no longer for a reply because the request has expired, and
  the caller might even have retried it.
2023-01-20 11:36:52 -03:00
Thales Macedo Garitezi 47f796dd12 refactor: rename `emqx_resource_worker` -> `emqx_resource_buffer_worker`
To make it more clear that it's purpose is serve as a buffering layer.
2023-01-18 16:15:34 -03:00
Ilya Averyanov f9843de7ae
Merge pull request #9628 from savonarola/fix-flaky-redis-bridge-test
chore(ee bridge): fix Redis bridge test flakyness
2023-01-18 20:56:13 +02:00
Zaiming (Stone) Shi 2d01e604a5
Merge pull request #9799 from zmstone/0118-fix-key_dispatch-kafka-produce-strategy
0118 fix key dispatch kafka produce strategy
2023-01-18 13:52:49 +01:00
Ilya Averyanov f6fbbf3ee3 chore(bridges): reduce Redis bridge flakyness 2023-01-18 14:34:11 +02:00
Zaiming (Stone) Shi 8f275a66d0 test: add coverage for key_dispatch partition strategy 2023-01-18 11:47:37 +01:00
Zaiming (Stone) Shi d4f3b4c8c2 Merge remote-tracking branch 'origin/master' into fix-buffer-clear-replayq-on-delete-v50 2023-01-18 11:39:47 +01:00
Thales Macedo Garitezi 087b667263 fix(buffer_worker): allow signalling unrecoverable errors 2023-01-17 19:50:30 -03:00
Thales Macedo Garitezi fa01deb3eb chore: retry as much as possible, don't reply to caller too soon 2023-01-17 16:49:15 -03:00
Thales Macedo Garitezi 32a9e60313 feat(buffer_worker): also use the inflight table for sync requests
Related: https://emqx.atlassian.net/browse/EMQX-8692

This should also correctly account for `retried.*` metrics for sync
requests.

Also fixes cases where race conditions for retrying async requests
could potentially lead to inconsistent metrics.

Fixes more cases where a stale reference to `replayq` was being held
accidentally after a `pop`.
2023-01-17 16:48:48 -03:00
Stefan Strigler f37b3e4bc4 test: test against `bridges_probe` API 2023-01-17 15:29:19 +01:00
Zaiming (Stone) Shi a7fc5e8fe1
Merge pull request #9761 from zmstone/0114-fix-kafka-value-template-and-docs
feat: introduce 'this' concept for placeholder, and use it in Kafka bridge
2023-01-16 13:37:29 +01:00
Zaiming (Stone) Shi 47414e0d53 docs: improve kafka key and value field description 2023-01-16 11:32:09 +01:00
Zaiming (Stone) Shi 91c5a89985 test: wait for redis connected state
the case is sometimes flaky because the health check sometimes
return connecting
2023-01-14 18:33:55 +01:00
Stefan Strigler e08c1d2229 Merge remote-tracking branch 'olcai/refactor-bridges-api' into dev/api-refactor 2023-01-13 15:49:52 +01:00
Erik Timan 7a17fb7308 test(emqx_ee_bridge): fix bridge enable/disable in kafka producer suite 2023-01-13 14:40:54 +01:00
Thales Macedo Garitezi f25bd288ad
Merge pull request #9742 from thalesmg/expose-resource-opts-mongo-v50
feat(mongo): expose buffer worker opts to the bridge frontend (5.0)
2023-01-13 10:23:49 -03:00