Fixes https://emqx.atlassian.net/browse/EMQX-9635
During a sync call from process `A` to a buffer worker `B`, its call
to the underlying resource `C` can be very slow. In those cases, `A`
will receive a timeout response and expect no more messages from `B`
nor `C`. However, prior to this fix, if `B` is stuck in a long sync
call to `C` and then gets its response after `A` timed out, `B` would
still send the late response to `A`, polluting its mailbox.
Fixes https://emqx.atlassian.net/browse/EMQX-9129
Currently, if an user configures a bridge with query mode sync, then
all calls to the underlying driver/connector ("inner calls") will
always be synchronous, regardless of its support for async calls.
Since buffer workers always support async queries ("outer calls"), we
should decouple those two call modes (inner and outer), and avoid
exposing the inner call configuration to user to avoid complexity.
There are two situations when we want to force synchronous calls to
the underlying connector even if it supports async:
1) When using `simple_sync_query`, since we are bypassing the buffer
workers;
2) When retrying the inflight window, to avoid overwhelming the
driver.
This testcase should verify that the buffer will retry all inflight
queries failed with recoverable errors + flush all outstanding queries.
Co-authored-by: ieQu1 <99872536+ieQu1@users.noreply.github.com>
With this, we avoid performing work or replying to callers that are no
longer waiting on a result.
Also introduces two new counters:
- `dropped.expired` :: happens when a request expires before being
sent downstream
- `late_reply` :: when a response is receive from downstream, but the
caller is no longer for a reply because the request has expired, and
the caller might even have retried it.
Some requests should not be retried during the blocked state. For
example, if some async requests are just taking some time to process,
we should avoid retrying them periodically, lest risk overloading the
downstream further.
https://emqx.atlassian.net/browse/EMQX-8700
Fixes a few errors in the usage of `replayq` queues.
- Close `replayq` when `emqx_resource_worker` terminates.
- Do not keep old references to `replayq` after any `pop`s.
- Clear `replayq`'s data directories when removing a resource.
This makes the buffer/resource workers always use `replayq` for
queuing, along with collecting multiple requests in a single call.
This is done to avoid long message queues for the buffer workers and
rely on `replayq`'s capabilities of offloading to disk and detecting
overflow.
Also, this deprecates the `enable_batch` and `enable_queue` resource
creation options, as: i) queuing is now always enables; ii) batch_size
> 1 <=> batch_enabled. The corresponding metric
`dropped.queue_not_enabled` is dropped, along with `batching`. The
batching is too ephemeral, especially considering a default batch time
of 20 ms, and is not shown in the dashboard, so it was removed.