Commit Graph

72 Commits

Author SHA1 Message Date
Kjell Winblad 2671e8ecf9 fix: dialyzer type problem 2023-06-09 11:00:05 +02:00
Thales Macedo Garitezi 99796224d8 refactor(resource): rename `request_timeout` -> `request_ttl`
See
https://emqx.atlassian.net/wiki/spaces/P/pages/612368639/open+e5.1+remove+auto+restart+interval+from+buffer+worker+resource+options
2023-06-01 13:01:53 -03:00
Thales Macedo Garitezi f42ccb6262 feat(resource): increase default request timeout to 45 s
See
https://emqx.atlassian.net/wiki/spaces/P/pages/612368639/open+e5.1+remove+auto+restart+interval+from+buffer+worker+resource+options
2023-06-01 11:20:06 -03:00
Thales Macedo Garitezi 10425eb925 feat(resource): deprecate `auto_restart_interval` in favor of `health_check_interval`
See:
https://emqx.atlassian.net/wiki/spaces/P/pages/612368639/open+e5.1+remove+auto+restart+interval+from+buffer+worker+resource+options

Current problem:

In 5.0.x, we have two timer options that control the state changing of buffer worker
resources: auto_restart_interval and health_check_interval.

- auto_restart_interval controls how often the resource attempts to transition from
disconnected to connected.

- health_check_interval controls how often the resource is checked and potentially moved
from connected to disconnected or connecting.

The existence of two independent timers for very similar purposes is confusing to users,
QA and even developers.  Also, an intimately related configuration is request_timeout,
which can interact badly with auto_restart_interval if the latter is poorly configured:
requests may always expire if request_timeout < auto_restart_interval and if the resource
enters the disconnected state.  For health_check_interval, we attempt to derive a sane
default that gives requests a chance to retry (if request timeout is finite, then the
resource retries requests with a period of min(health_check_interval, request_timeout /
3).

Another problem with the separate auto_restart_interval is that its default value (60 s)
is too high when compared to the default request timeout and health check, leading to the
problems described above if not tuned.

Proposed solution:

We propose to drop auto_restart_interval in favor of health_check_interval, which will be
used for both disconnected -> connected and connected -> {disconnected, connecting}
transition checks.  With that, the resource will attempt to reconnect at the same interval
as the health check, which currently is 15 s.

Also, as two smaller changes to accompany this one:

- Increase the default request_timeout from 15 s to 45 s.
- Rename request_timeout to request_ttl.
2023-06-01 11:20:06 -03:00
Zaiming (Stone) Shi cc5b4d3748 Merge remote-tracking branch 'origin/release-50' into 0526-ci-delete-otp-24-from-standalone-app-test 2023-05-26 15:58:16 +02:00
JianBo He 71b636e321 fix: fix auto_restart_interval checker 2023-05-25 12:04:23 +08:00
Thales Macedo Garitezi fd2940cd77 feat(pulsar): ensure allocated resources are removed on failures (v5.0)
Fixes https://emqx.atlassian.net/browse/EMQX-9937
2023-05-24 12:29:00 -03:00
Thales Macedo Garitezi 7d798c10e9 perf(buffer_worker): flush metrics periodically inside buffer worker process
Fixes https://emqx.atlassian.net/browse/EMQX-9905

Since calling `telemetry` is costly in a hot path, we instead collect
metrics inside the buffer workers state and periodically flush them,
rather than immediately as events happen.
2023-05-22 09:11:23 -03:00
Andrew Mayorov 4575167607
feat(resource): drop `manager_id()` type 2023-05-02 17:29:20 +03:00
Thales Macedo Garitezi e073bc90bc refactor(buffer_worker): rename `s/queue/buffer/g` 2023-04-14 11:37:19 -03:00
Thales Macedo Garitezi 14ed4a7ada feat(buffer_worker): set default queue mode to `memory_only`
Fixes https://emqx.atlassian.net/browse/EMQX-9367

For better user experience and performance for the average bridge, we
should change the default queue mode to `memory_only`, as was the
behavior of most bridges in e4.x.  This leads to better performance
when message rate is high enough and the remote resource is not
keeping up with EMQX.

Also, we set the default segment size to equal max queue bytes.
2023-04-14 11:37:19 -03:00
Thales Macedo Garitezi 4de13d2800 feat(buffer_worker): change default max queue bytes to 256 MB 2023-04-14 09:31:33 -03:00
Andrew Mayorov 9c9f39d0f7
feat(resman): also move out metrics collection for debugging
Now `emqx_resource:list_instances_verbose/0` will populate the metrics
for each instance, for the sake of simplicity.
2023-04-12 16:14:42 +03:00
Kjell Winblad 8e0d315b7b
Merge pull request #10197 from kjellwinblad/0321-fix-inflight-window-hand-over-to-kjell
fix: add inflight window setting to the clickhouse bridge
2023-03-29 09:38:24 +02:00
Kjell Winblad 35474578ca refactor: rename async_inflight_window to inflight_window everywhere 2023-03-23 14:21:57 +01:00
Stefan Strigler 53825b9aba fix(emqx_bridge): propagate connection error to resource status 2023-03-21 15:02:29 +01:00
Thales Macedo Garitezi d464e2aad5 refactor: rename test resource prefix
Co-authored-by: Zaiming (Stone) Shi <zmstone@gmail.com>
2023-03-16 13:43:01 -03:00
Thales Macedo Garitezi 03342923b9 fix(bridge): use the same dry run prefix
Kafka Producer and Consumer bridges rely on this prefix for detecting
a dry run and avoid leaking atoms.  At some point, this prefix was
changed, effectively disabling the check in Kafka Producer.
2023-03-16 13:43:01 -03:00
Thales Macedo Garitezi e9d3fc511f chore(buffer_worker): change default `batch_time` to 0 and improve docs 2023-03-06 15:31:28 -03:00
Zaiming (Stone) Shi fb61c2b266 perf: avoid getting metrics (gen_server:call) for each resource lookup 2023-02-10 19:40:37 +01:00
Zaiming (Stone) Shi c0d478bd41 fix(buffer_worker): type spec 2023-02-02 14:11:12 +01:00
Zaiming (Stone) Shi 5fdf7fd24c fix(kafka): use async callback to bump success counters
some telemetry events from wolff are discarded:

* dropped:
    this is double counted in wolff,
    we now only subscribe to the dropped_queue_full event
* retried_failed:
    it has different meanings in wolff,
    in wolff, it means it's the 2nd (or onward) produce attempt
    in EMQX, it means it's eventually failed after some retries

* retried_success
    since we are going to handle the success counters in callbac
    this having this reported from wolff will only make things
    harder to understand

* failed
    wolff never fails (unelss drop which is a different counter)
2023-01-24 21:12:36 +01:00
Zaiming (Stone) Shi 8fde169abb
Merge pull request #9821 from thalesmg/buffer-worker-expiry-v50
feat(buffer_worker): add expiration time to requests
2023-01-24 13:54:04 +01:00
Thales Macedo Garitezi 6fa6c679bb feat(buffer_worker): add expiration time to requests
With this, we avoid performing work or replying to callers that are no
longer waiting on a result.

Also introduces two new counters:

- `dropped.expired` :: happens when a request expires before being
  sent downstream
- `late_reply` :: when a response is receive from downstream, but the
  caller is no longer for a reply because the request has expired, and
  the caller might even have retried it.
2023-01-20 11:36:52 -03:00
Ilya Averyanov 44a6e5ed15 chore(resources): add missing parameters to emqx_resource schema 2023-01-18 14:33:45 +02:00
Thales Macedo Garitezi fd360ac6c0 feat(buffer_worker): refactor buffer/resource workers to always use queue
This makes the buffer/resource workers always use `replayq` for
queuing, along with collecting multiple requests in a single call.
This is done to avoid long message queues for the buffer workers and
rely on `replayq`'s capabilities of offloading to disk and detecting
overflow.

Also, this deprecates the `enable_batch` and `enable_queue` resource
creation options, as: i) queuing is now always enables; ii) batch_size
> 1 <=> batch_enabled.  The corresponding metric
`dropped.queue_not_enabled` is dropped, along with `batching`.  The
batching is too ephemeral, especially considering a default batch time
of 20 ms, and is not shown in the dashboard, so it was removed.
2023-01-05 10:15:09 -03:00
Zaiming (Stone) Shi dbc10c2eed chore: update copyright year 2023 2023-01-02 09:22:27 +01:00
Zaiming (Stone) Shi 479e191dcf refactor: refine worker pool config and doc
worker pool is a buffer pool
the description hinted connection pool which is wrong.
2022-12-20 09:02:51 +01:00
Thales Macedo Garitezi 1cd91a24e9 feat(gcp_pubsub): implement GCP PubSub bridge (ee5.0) 2022-12-12 17:18:19 -03:00
Shawn f41adb0997 refactor: change some default values of resource_opts 2022-09-14 15:18:07 +08:00
Shawn 0ef0b68de4 refactor: change '{recoverable_error,Reason}' to '{error,{recoverable_error,Reason}}' 2022-08-31 18:25:00 +08:00
Shawn 9e50866cd0 fix: rename queue_max_bytes -> max_queue_bytes 2022-08-30 17:18:54 +08:00
Shawn 6b0ccfbc43 refactor: rename the error return resource_down -> recoverable_error 2022-08-26 17:11:12 +08:00
Shawn 86577365e4 fix: use gen_statem:cast/3 for async query 2022-08-23 22:41:45 +08:00
JimMoen f0c2b53868 fix(bpapi): make bpapi static_checks happy 2022-08-22 10:51:44 +08:00
JimMoen 7c4ea38c06 fix(resource): make some resource opts internal
Resource options `start_after_created` and `start_timeout` are internal opts.
Not provided to users anymore.
2022-08-22 02:22:57 +08:00
JimMoen 06363e63d9 fix(influxdb): connector use a fallbacke `pool_size` for influxdb client 2022-08-19 15:54:19 +08:00
Shawn 9e35032d78 fix: make resume_interval defaults to health_check_interval 2022-08-16 10:09:02 +08:00
Xinyu Liu 2898966439
Merge branch 'dev/ee5.0' into resource_opts 2022-08-15 21:43:22 +08:00
Shawn 19d85d485b refactor(resource): add resource_opts level into config structure 2022-08-15 21:40:10 +08:00
JimMoen 3678673124 fix: schema default value using raw type before convert 2022-08-12 16:38:46 +08:00
Shawn 0cdf4b47f1 feat: add more resource creation opts 2022-08-12 13:47:45 +08:00
JimMoen 3a76a50382 fix: syntax error and compile error 2022-08-11 20:58:43 +08:00
Shawn 2872f0b668 fix(bridges): support create resources with options 2022-08-11 19:11:44 +08:00
JimMoen 22a4ca311c feat(resource): resource batch/async/queue config schema 2022-08-11 16:59:18 +08:00
Shawn 6203a01320 feat: add inflight window to emqx_resource 2022-08-11 08:36:35 +08:00
Shawn 82550a585a fix: add test cases for query async 2022-08-10 00:45:34 +08:00
Shawn 145ff66a9a fix: issues found by dialyzer and elvis 2022-08-10 00:45:26 +08:00
Shawn 35fe70b887 feat: support aysnc callback to connector modules 2022-08-10 00:34:35 +08:00
Shawn 2fb42e4d37 refactor: create emqx_resource_worker_sup for resource workers 2022-08-10 00:34:35 +08:00