Commit Graph

31 Commits

Author SHA1 Message Date
Thales Macedo Garitezi 3013189cd7 fix(resource manager): force kill process if stuck when stopping/removing
Fixes https://emqx.atlassian.net/browse/EMQX-12357
2024-06-04 11:38:24 -03:00
Thales Macedo Garitezi 60cad74286 feat(resource): non-blocking channel health checks
Fixes https://emqx.atlassian.net/browse/EMQX-12015

Continuation of https://github.com/emqx/emqx/pull/12812
2024-04-04 16:13:30 -03:00
Thales Macedo Garitezi bade09b56e feat(resource manager): perform non-blocking resource health checks
Fixes https://emqx.atlassian.net/browse/EMQX-12015

This introduces only _resource_ non-blocking health checks.  _Channel_ non-blocking health
checks may be introduced later.
2024-04-01 14:46:15 -03:00
Thales Macedo Garitezi 15f919e60f
Merge pull request #12564 from thalesmg/bw-support-batch-list-resp-m-20240221
feat(resource): allow `on_batch_query{,_async}` to return a list of individual results
2024-02-23 09:37:42 -03:00
Zaiming (Stone) Shi 46877e979b chore: update copyright-year 2024-02-23 08:21:06 +01:00
Thales Macedo Garitezi d003f77021 feat(resource): allow `on_batch_query{,_async}` to return a list of individual results
Fixes https://emqx.atlassian.net/browse/EMQX-11892

This allows callers of batching resources to receive results specific to their requests,
rather than a broad success or failure for the whole batch.
2024-02-22 16:26:02 -03:00
Thales Macedo Garitezi 13746c2cdf fix(resource): check status when (re)starting a resource
Fixes https://emqx.atlassian.net/browse/EMQX-10290
2023-06-19 18:01:02 -03:00
Serge Tupchii b5eda9f0d1 perf(emqx_resource): don't reactivate alarms on reoccurring errors
Avoid unnecessary calls to activate an alarm if it has been already activated.

Fixes: EMQX-9529/#10357
2023-04-20 16:37:33 +03:00
Thales Macedo Garitezi cb995e2033 fix(buffer_worker): avoid sending late reply messages to callers
Fixes https://emqx.atlassian.net/browse/EMQX-9635

During a sync call from process `A` to a buffer worker `B`, its call
to the underlying resource `C` can be very slow.  In those cases, `A`
will receive a timeout response and expect no more messages from `B`
nor `C`.  However, prior to this fix, if `B` is stuck in a long sync
call to `C` and then gets its response after `A` timed out, `B` would
still send the late response to `A`, polluting its mailbox.
2023-04-19 18:27:10 -03:00
Thales Macedo Garitezi f8d5d53908 feat(buffer_worker): decouple query mode from underlying connector call mode
Fixes https://emqx.atlassian.net/browse/EMQX-9129

Currently, if an user configures a bridge with query mode sync, then
all calls to the underlying driver/connector ("inner calls") will
always be synchronous, regardless of its support for async calls.

Since buffer workers always support async queries ("outer calls"), we
should decouple those two call modes (inner and outer), and avoid
exposing the inner call configuration to user to avoid complexity.

There are two situations when we want to force synchronous calls to
the underlying connector even if it supports async:

1) When using `simple_sync_query`, since we are bypassing the buffer
workers;
2) When retrying the inflight window, to avoid overwhelming the
driver.
2023-03-23 13:40:31 -03:00
Andrew Mayorov e411c5d5f8
refactor(resman): work with state cache atomically
Also ensure that cache entries are always consistent with `Data`,
so that most of the code could rely on reading the cached entry
most of the time.
2023-03-15 19:17:30 +03:00
Zaiming (Stone) Shi c97d17cc91 test: refactor to loop wait for counters 2023-02-24 09:02:03 +01:00
Zaiming (Stone) Shi 3a6dbbdd05 refactor(buffer_worker): ensure flsh message is never missed 2023-02-23 20:11:00 +01:00
Zaiming (Stone) Shi dbfdeec5e9 fix(buffer_worker): log unknown async replies 2023-02-23 12:55:49 +01:00
Andrew Mayorov d8d06a260f
test(buffer): add test on inflight overflow w/ async queries
This testcase should verify that the buffer will retry all inflight
queries failed with recoverable errors + flush all outstanding queries.

Co-authored-by: ieQu1 <99872536+ieQu1@users.noreply.github.com>
2023-02-08 14:08:04 +03:00
Thales Macedo Garitezi 6fa6c679bb feat(buffer_worker): add expiration time to requests
With this, we avoid performing work or replying to callers that are no
longer waiting on a result.

Also introduces two new counters:

- `dropped.expired` :: happens when a request expires before being
  sent downstream
- `late_reply` :: when a response is receive from downstream, but the
  caller is no longer for a reply because the request has expired, and
  the caller might even have retried it.
2023-01-20 11:36:52 -03:00
Thales Macedo Garitezi fa01deb3eb chore: retry as much as possible, don't reply to caller too soon 2023-01-17 16:49:15 -03:00
Thales Macedo Garitezi 006b4bda97 feat(buffer_worker): monitor async workers and cancel their inflight requests upon death 2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi 731ac6567a fix(buffer_worker): don't retry all kinds of inflight requests
Some requests should not be retried during the blocked state.  For
example, if some async requests are just taking some time to process,
we should avoid retrying them periodically, lest risk overloading the
downstream further.
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi c383558467 fix(buffer): fix `replayq` usages in buffer workers (5.0)
https://emqx.atlassian.net/browse/EMQX-8700

Fixes a few errors in the usage of `replayq` queues.

- Close `replayq` when `emqx_resource_worker` terminates.
- Do not keep old references to `replayq` after any `pop`s.
- Clear `replayq`'s data directories when removing a resource.
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi fd360ac6c0 feat(buffer_worker): refactor buffer/resource workers to always use queue
This makes the buffer/resource workers always use `replayq` for
queuing, along with collecting multiple requests in a single call.
This is done to avoid long message queues for the buffer workers and
rely on `replayq`'s capabilities of offloading to disk and detecting
overflow.

Also, this deprecates the `enable_batch` and `enable_queue` resource
creation options, as: i) queuing is now always enables; ii) batch_size
> 1 <=> batch_enabled.  The corresponding metric
`dropped.queue_not_enabled` is dropped, along with `batching`.  The
batching is too ephemeral, especially considering a default batch time
of 20 ms, and is not shown in the dashboard, so it was removed.
2023-01-05 10:15:09 -03:00
Zaiming (Stone) Shi dbc10c2eed chore: update copyright year 2023 2023-01-02 09:22:27 +01:00
Thales Macedo Garitezi f0ff32c031 test: fix tests after counter changes 2022-10-11 17:45:48 -03:00
Shawn 9aa7e826cb refactor(resource): fast resume resource worker if inflight msgs are ACKed 2022-09-17 00:34:30 +08:00
Shawn 26234d38b9 fix: mark the async msg 'queuing' not 'sent.inflight' on recoverable_error 2022-09-02 18:41:43 +08:00
Shawn 6b0ccfbc43 refactor: rename the error return resource_down -> recoverable_error 2022-08-26 17:11:12 +08:00
Shawn 6203a01320 feat: add inflight window to emqx_resource 2022-08-11 08:36:35 +08:00
Shawn 82550a585a fix: add test cases for query async 2022-08-10 00:45:34 +08:00
Shawn efd6c56dd9 fix: test cases for batch query sync 2022-08-10 00:45:34 +08:00
Shawn 35fe70b887 feat: support aysnc callback to connector modules 2022-08-10 00:34:35 +08:00
Shawn a2afdeeb48 feat: add test cases for batching query 2022-08-10 00:34:35 +08:00