some telemetry events from wolff are discarded:
* dropped:
this is double counted in wolff,
we now only subscribe to the dropped_queue_full event
* retried_failed:
it has different meanings in wolff,
in wolff, it means it's the 2nd (or onward) produce attempt
in EMQX, it means it's eventually failed after some retries
* retried_success
since we are going to handle the success counters in callbac
this having this reported from wolff will only make things
harder to understand
* failed
wolff never fails (unelss drop which is a different counter)
With this, we avoid performing work or replying to callers that are no
longer waiting on a result.
Also introduces two new counters:
- `dropped.expired` :: happens when a request expires before being
sent downstream
- `late_reply` :: when a response is receive from downstream, but the
caller is no longer for a reply because the request has expired, and
the caller might even have retried it.
Related: https://emqx.atlassian.net/browse/EMQX-8692
This should also correctly account for `retried.*` metrics for sync
requests.
Also fixes cases where race conditions for retrying async requests
could potentially lead to inconsistent metrics.
Fixes more cases where a stale reference to `replayq` was being held
accidentally after a `pop`.
In order to improve the consistency with other API endpoints, we move
the enable/disable operations to a separate endpoint
/bridges/{id}/enable/[true,false].
In order for the /bridges APIs to be consistent with other APIs, we move
out metrics from GET /bridges/{id} to its own endpoint,
/bridges/{id}/metrics. We also rename /bridges/reset_metrics to
/bridges/metrics/reset.
This fixes https://emqx.atlassian.net/browse/EMQX-8648. The issue
described in `EMQX-8648` is that when deleting a non-existing bridge the
server gives a success response. See below:
```
curl --head -u admin:public2 -X 'DELETE' 'http://localhost:18083/api/v5/bridges/webhook:i_do_not_exist'
HTTP/1.1 204 No Content
date: Tue, 03 Jan 2023 16:59:01 GMT
server: Cowboy
```
After the fix, deleting a non existing bridge will give the following
response:
```
HTTP/1.1 404 Not Found
content-length: 49
content-type: application/json
date: Thu, 05 Jan 2023 12:40:35 GMT
server: Cowboy
```
Closes: EMQX-8648
This makes the buffer/resource workers always use `replayq` for
queuing, along with collecting multiple requests in a single call.
This is done to avoid long message queues for the buffer workers and
rely on `replayq`'s capabilities of offloading to disk and detecting
overflow.
Also, this deprecates the `enable_batch` and `enable_queue` resource
creation options, as: i) queuing is now always enables; ii) batch_size
> 1 <=> batch_enabled. The corresponding metric
`dropped.queue_not_enabled` is dropped, along with `batching`. The
batching is too ephemeral, especially considering a default batch time
of 20 ms, and is not shown in the dashboard, so it was removed.