Commit Graph

90 Commits

Author SHA1 Message Date
zhongwencool 54c542c795 chore: rename _probe_ to t_probe_ 2024-02-27 17:57:15 +08:00
zhongwencool c67f2130f2 fix: check connector and bridge_v2 with the right schema 2024-02-27 17:57:10 +08:00
Thales Macedo Garitezi 15f919e60f
Merge pull request #12564 from thalesmg/bw-support-batch-list-resp-m-20240221
feat(resource): allow `on_batch_query{,_async}` to return a list of individual results
2024-02-23 09:37:42 -03:00
Zaiming (Stone) Shi 46877e979b chore: update copyright-year 2024-02-23 08:21:06 +01:00
Thales Macedo Garitezi d003f77021 feat(resource): allow `on_batch_query{,_async}` to return a list of individual results
Fixes https://emqx.atlassian.net/browse/EMQX-11892

This allows callers of batching resources to receive results specific to their requests,
rather than a broad success or failure for the whole batch.
2024-02-22 16:26:02 -03:00
Thales Macedo Garitezi 9a32895a1a feat: convert `gcp_pubsub_consumer` to connector/source
Fixes https://emqx.atlassian.net/browse/EMQX-11471
2024-02-19 14:42:35 -03:00
Thales Macedo Garitezi 29ae45c39d fix(connector): don't start buffer workers for the connector itself
Fixes https://emqx.atlassian.net/browse/EMQX-11448
2023-12-01 17:20:38 -03:00
Thales Macedo Garitezi 9e1796ec4f feat(gcp_pubsub_producer): migrate GCP PubSub producer to actions
Fixes https://emqx.atlassian.net/browse/EMQX-11157
2023-11-21 14:22:42 -03:00
Thales Macedo Garitezi b92821188b fix(kafka_producer): make status `connecting` while the client fails to connect
Fixes https://emqx.atlassian.net/browse/EMQX-11408

To make it consistent with the previous bridge behavior.

Also, introduces macros for resource status to avoid problems with typos.
2023-11-16 14:50:23 -03:00
Thales Macedo Garitezi 7dcdbc9e51 fix(resource): take error from action/connector before attempting query
Fixes https://emqx.atlassian.net/browse/EMQX-11284
Fixes https://emqx.atlassian.net/browse/EMQX-11298
2023-11-07 10:04:04 -03:00
Kjell Winblad 9dc3a169b3 feat: split bridges into a connector part and a bridge part
Co-authored-by: Thales Macedo Garitezi <thalesmg@gmail.com>
Co-authored-by: Stefan Strigler <stefan.strigler@emqx.io>
Co-authored-by: Zaiming (Stone) Shi <zmstone@gmail.com>

Several bridges should be able to share a connector pool defined by a
single connector. The connectors should be possible to enable and
disable similar to how one can disable and enable bridges. There should
also be an API for checking the status of a connector and for
add/edit/delete connectors similar to the current bridge API.

Issues:
https://emqx.atlassian.net/browse/EMQX-10805
2023-10-30 14:48:47 +01:00
Thales Macedo Garitezi cf2075d7d8 chore: remove mention of `is_buffer_supported` from typespec 2023-10-10 09:49:18 -03:00
Thales Macedo Garitezi 902b1d6ec5 fix(pulsar_producer): use `simple_async_internal_buffer` query mode for Pulsar
Since it has internal buffering, it necessitates the same fix as Kafka producer.
2023-10-09 15:02:25 -03:00
Thales Macedo Garitezi 79cf0a2ced fix(kafka_producer): correctly handle metrics for connector that have internal buffers
Fixes https://emqx.atlassian.net/browse/EMQX-11086

There’s currently a metric inconsistency due to the internal buffering nature of Kafka
Producer (wolff).

We use simple_sync_query to call the Kafka Producer bridge.  If that times out, the call
is accounted as failed, even though the message is buffered in wolff and later sent
successfully.
2023-10-09 15:02:25 -03:00
Thales Macedo Garitezi eb41b77de4 fix(rule_metrics): notify rule metrics of late replies and expired requests
Fixes https://emqx.atlassian.net/browse/EMQX-10600
2023-07-19 11:39:28 -03:00
zhongwencool b5cc8fb3c3 fix: start_after_created's default value 2023-07-07 16:39:26 +08:00
Stefan Strigler 321fd53132 fix: use ReplyTo in QUERY for async 2023-06-29 16:09:45 +02:00
Thales Macedo Garitezi 7f850f7499 fix(resource): fix `query_mode/0` type and usage 2023-06-19 15:59:00 -03:00
Kjell Winblad 2671e8ecf9 fix: dialyzer type problem 2023-06-09 11:00:05 +02:00
Thales Macedo Garitezi 99796224d8 refactor(resource): rename `request_timeout` -> `request_ttl`
See
https://emqx.atlassian.net/wiki/spaces/P/pages/612368639/open+e5.1+remove+auto+restart+interval+from+buffer+worker+resource+options
2023-06-01 13:01:53 -03:00
Thales Macedo Garitezi f42ccb6262 feat(resource): increase default request timeout to 45 s
See
https://emqx.atlassian.net/wiki/spaces/P/pages/612368639/open+e5.1+remove+auto+restart+interval+from+buffer+worker+resource+options
2023-06-01 11:20:06 -03:00
Thales Macedo Garitezi 10425eb925 feat(resource): deprecate `auto_restart_interval` in favor of `health_check_interval`
See:
https://emqx.atlassian.net/wiki/spaces/P/pages/612368639/open+e5.1+remove+auto+restart+interval+from+buffer+worker+resource+options

Current problem:

In 5.0.x, we have two timer options that control the state changing of buffer worker
resources: auto_restart_interval and health_check_interval.

- auto_restart_interval controls how often the resource attempts to transition from
disconnected to connected.

- health_check_interval controls how often the resource is checked and potentially moved
from connected to disconnected or connecting.

The existence of two independent timers for very similar purposes is confusing to users,
QA and even developers.  Also, an intimately related configuration is request_timeout,
which can interact badly with auto_restart_interval if the latter is poorly configured:
requests may always expire if request_timeout < auto_restart_interval and if the resource
enters the disconnected state.  For health_check_interval, we attempt to derive a sane
default that gives requests a chance to retry (if request timeout is finite, then the
resource retries requests with a period of min(health_check_interval, request_timeout /
3).

Another problem with the separate auto_restart_interval is that its default value (60 s)
is too high when compared to the default request timeout and health check, leading to the
problems described above if not tuned.

Proposed solution:

We propose to drop auto_restart_interval in favor of health_check_interval, which will be
used for both disconnected -> connected and connected -> {disconnected, connecting}
transition checks.  With that, the resource will attempt to reconnect at the same interval
as the health check, which currently is 15 s.

Also, as two smaller changes to accompany this one:

- Increase the default request_timeout from 15 s to 45 s.
- Rename request_timeout to request_ttl.
2023-06-01 11:20:06 -03:00
Zaiming (Stone) Shi cc5b4d3748 Merge remote-tracking branch 'origin/release-50' into 0526-ci-delete-otp-24-from-standalone-app-test 2023-05-26 15:58:16 +02:00
JianBo He 71b636e321 fix: fix auto_restart_interval checker 2023-05-25 12:04:23 +08:00
Thales Macedo Garitezi fd2940cd77 feat(pulsar): ensure allocated resources are removed on failures (v5.0)
Fixes https://emqx.atlassian.net/browse/EMQX-9937
2023-05-24 12:29:00 -03:00
Thales Macedo Garitezi 7d798c10e9 perf(buffer_worker): flush metrics periodically inside buffer worker process
Fixes https://emqx.atlassian.net/browse/EMQX-9905

Since calling `telemetry` is costly in a hot path, we instead collect
metrics inside the buffer workers state and periodically flush them,
rather than immediately as events happen.
2023-05-22 09:11:23 -03:00
Andrew Mayorov 4575167607
feat(resource): drop `manager_id()` type 2023-05-02 17:29:20 +03:00
Thales Macedo Garitezi e073bc90bc refactor(buffer_worker): rename `s/queue/buffer/g` 2023-04-14 11:37:19 -03:00
Thales Macedo Garitezi 14ed4a7ada feat(buffer_worker): set default queue mode to `memory_only`
Fixes https://emqx.atlassian.net/browse/EMQX-9367

For better user experience and performance for the average bridge, we
should change the default queue mode to `memory_only`, as was the
behavior of most bridges in e4.x.  This leads to better performance
when message rate is high enough and the remote resource is not
keeping up with EMQX.

Also, we set the default segment size to equal max queue bytes.
2023-04-14 11:37:19 -03:00
Thales Macedo Garitezi 4de13d2800 feat(buffer_worker): change default max queue bytes to 256 MB 2023-04-14 09:31:33 -03:00
Andrew Mayorov 9c9f39d0f7
feat(resman): also move out metrics collection for debugging
Now `emqx_resource:list_instances_verbose/0` will populate the metrics
for each instance, for the sake of simplicity.
2023-04-12 16:14:42 +03:00
Kjell Winblad 8e0d315b7b
Merge pull request #10197 from kjellwinblad/0321-fix-inflight-window-hand-over-to-kjell
fix: add inflight window setting to the clickhouse bridge
2023-03-29 09:38:24 +02:00
Kjell Winblad 35474578ca refactor: rename async_inflight_window to inflight_window everywhere 2023-03-23 14:21:57 +01:00
Stefan Strigler 53825b9aba fix(emqx_bridge): propagate connection error to resource status 2023-03-21 15:02:29 +01:00
Thales Macedo Garitezi d464e2aad5 refactor: rename test resource prefix
Co-authored-by: Zaiming (Stone) Shi <zmstone@gmail.com>
2023-03-16 13:43:01 -03:00
Thales Macedo Garitezi 03342923b9 fix(bridge): use the same dry run prefix
Kafka Producer and Consumer bridges rely on this prefix for detecting
a dry run and avoid leaking atoms.  At some point, this prefix was
changed, effectively disabling the check in Kafka Producer.
2023-03-16 13:43:01 -03:00
Thales Macedo Garitezi e9d3fc511f chore(buffer_worker): change default `batch_time` to 0 and improve docs 2023-03-06 15:31:28 -03:00
Zaiming (Stone) Shi fb61c2b266 perf: avoid getting metrics (gen_server:call) for each resource lookup 2023-02-10 19:40:37 +01:00
Zaiming (Stone) Shi c0d478bd41 fix(buffer_worker): type spec 2023-02-02 14:11:12 +01:00
Zaiming (Stone) Shi 5fdf7fd24c fix(kafka): use async callback to bump success counters
some telemetry events from wolff are discarded:

* dropped:
    this is double counted in wolff,
    we now only subscribe to the dropped_queue_full event
* retried_failed:
    it has different meanings in wolff,
    in wolff, it means it's the 2nd (or onward) produce attempt
    in EMQX, it means it's eventually failed after some retries

* retried_success
    since we are going to handle the success counters in callbac
    this having this reported from wolff will only make things
    harder to understand

* failed
    wolff never fails (unelss drop which is a different counter)
2023-01-24 21:12:36 +01:00
Zaiming (Stone) Shi 8fde169abb
Merge pull request #9821 from thalesmg/buffer-worker-expiry-v50
feat(buffer_worker): add expiration time to requests
2023-01-24 13:54:04 +01:00
Thales Macedo Garitezi 6fa6c679bb feat(buffer_worker): add expiration time to requests
With this, we avoid performing work or replying to callers that are no
longer waiting on a result.

Also introduces two new counters:

- `dropped.expired` :: happens when a request expires before being
  sent downstream
- `late_reply` :: when a response is receive from downstream, but the
  caller is no longer for a reply because the request has expired, and
  the caller might even have retried it.
2023-01-20 11:36:52 -03:00
Ilya Averyanov 44a6e5ed15 chore(resources): add missing parameters to emqx_resource schema 2023-01-18 14:33:45 +02:00
Thales Macedo Garitezi fd360ac6c0 feat(buffer_worker): refactor buffer/resource workers to always use queue
This makes the buffer/resource workers always use `replayq` for
queuing, along with collecting multiple requests in a single call.
This is done to avoid long message queues for the buffer workers and
rely on `replayq`'s capabilities of offloading to disk and detecting
overflow.

Also, this deprecates the `enable_batch` and `enable_queue` resource
creation options, as: i) queuing is now always enables; ii) batch_size
> 1 <=> batch_enabled.  The corresponding metric
`dropped.queue_not_enabled` is dropped, along with `batching`.  The
batching is too ephemeral, especially considering a default batch time
of 20 ms, and is not shown in the dashboard, so it was removed.
2023-01-05 10:15:09 -03:00
Zaiming (Stone) Shi dbc10c2eed chore: update copyright year 2023 2023-01-02 09:22:27 +01:00
Zaiming (Stone) Shi 479e191dcf refactor: refine worker pool config and doc
worker pool is a buffer pool
the description hinted connection pool which is wrong.
2022-12-20 09:02:51 +01:00
Thales Macedo Garitezi 1cd91a24e9 feat(gcp_pubsub): implement GCP PubSub bridge (ee5.0) 2022-12-12 17:18:19 -03:00
Shawn f41adb0997 refactor: change some default values of resource_opts 2022-09-14 15:18:07 +08:00
Shawn 0ef0b68de4 refactor: change '{recoverable_error,Reason}' to '{error,{recoverable_error,Reason}}' 2022-08-31 18:25:00 +08:00
Shawn 9e50866cd0 fix: rename queue_max_bytes -> max_queue_bytes 2022-08-30 17:18:54 +08:00