Commit Graph

130 Commits

Author SHA1 Message Date
Andrew Mayorov d8d06a260f
test(buffer): add test on inflight overflow w/ async queries
This testcase should verify that the buffer retries all inflight
queries that failed with recoverable errors and flushes all outstanding queries.

Co-authored-by: ieQu1 <99872536+ieQu1@users.noreply.github.com>
2023-02-08 14:08:04 +03:00
Zaiming (Stone) Shi 13ef30c46c
Merge pull request #9884 from savonarola/resource-fixes
fix(resources): fix resource lifecycle
2023-02-02 12:02:34 +01:00
Ilya Averyanov 14f528cc86 fix(resources): fix resource lifecycle
* do not resume all buffer workers on successful healthcheck
* do not pass undefined state to resource healthcheck callback
2023-02-01 18:26:13 +02:00
Andrew Mayorov 5fd7f65a1f
test(bufworker): make testcase simpler to follow
The confusion arose because the subsequent query was missing
`async_reply_fun` and thus was not accumulating in the results.
2023-02-01 16:52:47 +03:00
Andrew Mayorov ff473e0f1b
test(bufworker): fix testcase flapping due to data races 2023-02-01 12:57:46 +03:00
Zaiming (Stone) Shi b3ad9e97d2
Merge pull request #9870 from keynslug/fix/mqtt-connection-loss-feedback
feat(mqtt-bridge): avoid middleman process
2023-01-31 19:12:18 +01:00
Andrew Mayorov c76311c9c3
fix(buffer): count inflight batches properly 2023-01-31 18:30:42 +03:00
Zaiming (Stone) Shi d47941601d refactor(buffer_worker): rename trace points 2023-01-28 11:52:11 +01:00
Zaiming (Stone) Shi fc38ea9571 refactor(buffer_worker): do not keep request body in reply context
the request body can be potentially very large
the reply context is sent to the async call handler and kept
in its memory until the async reply is received from bridge
target service.

this commit tries to minimize the size of the reply context
by replacing the request body with `[]`.
2023-01-27 17:12:55 +01:00
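
As a rough illustration of the idea in the commit above (not the actual Erlang code), the sketch below builds a reply context that keeps only what is needed to route the async reply and stores an empty list in place of the request body. The function and field names are hypothetical.

```python
def make_reply_context(request_id, reply_to, request_body):
    """Build the context handed to the async call handler.

    Hypothetical sketch: the (potentially very large) request body is
    deliberately not stored; an empty list takes its place.
    """
    return {
        "request_id": request_id,
        "reply_to": reply_to,
        "request_body": [],  # body dropped to keep the context small
    }
```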
Stefan Strigler 2d62de5188 test: fix expected result from timeout error 2023-01-27 11:43:48 +01:00
Zaiming (Stone) Shi 1f799dfd59 fix: reply with {error, buffer_overflow} when discarded 2023-01-26 17:15:36 +01:00
Thales Macedo Garitezi 6fa6c679bb feat(buffer_worker): add expiration time to requests
With this, we avoid performing work or replying to callers that are no
longer waiting on a result.

Also introduces two new counters:

- `dropped.expired` :: happens when a request expires before being
  sent downstream
- `late_reply` :: when a response is received from downstream, but the
  caller is no longer waiting for a reply because the request has expired, and
  the caller might even have retried it.
2023-01-20 11:36:52 -03:00
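
A minimal sketch of how the two counters named in the commit above could be driven, assuming each request carries an absolute deadline. Only the counter names `dropped.expired` and `late_reply` come from the commit message; the `Request` type, the helper functions and the explicit `now` argument are assumptions made for illustration, and the real implementation is in Erlang.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

counters = Counter()

@dataclass
class Request:
    payload: str
    expire_at: Optional[float]  # absolute deadline; None means "never expires"

def is_expired(req, now):
    return req.expire_at is not None and now >= req.expire_at

def send_downstream(req, now):
    """Return True if the request should be sent, False if dropped as expired."""
    if is_expired(req, now):
        counters["dropped.expired"] += 1
        return False
    return True

def handle_reply(req, now):
    """Return True if the reply should still be forwarded to the caller."""
    if is_expired(req, now):
        # The caller has stopped waiting (and may have retried already).
        counters["late_reply"] += 1
        return False
    return True
```

Passing `now` explicitly keeps the sketch deterministic; a real worker would read a monotonic clock instead.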
Thales Macedo Garitezi 47f796dd12 refactor: rename `emqx_resource_worker` -> `emqx_resource_buffer_worker`
To make it clearer that its purpose is to serve as a buffering layer.
2023-01-18 16:15:34 -03:00
Thales Macedo Garitezi 5c2ac0ac81 chore: don't cancel inflight items upon worker death; retry them 2023-01-17 19:50:30 -03:00
Thales Macedo Garitezi fa01deb3eb chore: retry as much as possible, don't reply to caller too soon 2023-01-17 16:49:15 -03:00
Thales Macedo Garitezi b5aaef084c refactor: enter running state directly
now that we don't have the possibility of dirty disk queues (we always
use volatile replayq), we will never resume old work.
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi 006b4bda97 feat(buffer_worker): monitor async workers and cancel their inflight requests upon death 2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi 731ac6567a fix(buffer_worker): don't retry all kinds of inflight requests
Some requests should not be retried during the blocked state.  For
example, if some async requests are just taking some time to process,
we should avoid retrying them periodically, lest we risk overloading the
downstream further.
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi 5dd24a64c3 refactor(buffer_worker): check if inflight is full before flushing 2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi 81fc561ed5 fix(buffer_worker): check for overflow after enqueuing new requests 2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi 32a9e60313 feat(buffer_worker): also use the inflight table for sync requests
Related: https://emqx.atlassian.net/browse/EMQX-8692

This should also correctly account for `retried.*` metrics for sync
requests.

Also fixes cases where race conditions for retrying async requests
could potentially lead to inconsistent metrics.

Fixes more cases where a stale reference to `replayq` was being held
accidentally after a `pop`.
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi c383558467 fix(buffer): fix `replayq` usages in buffer workers (5.0)
https://emqx.atlassian.net/browse/EMQX-8700

Fixes a few errors in the usage of `replayq` queues.

- Close `replayq` when `emqx_resource_worker` terminates.
- Do not keep old references to `replayq` after any `pop`s.
- Clear `replayq`'s data directories when removing a resource.
2023-01-17 16:48:48 -03:00
Kjell Winblad 734e6b9c96 chore: fix flaky test cases, log labels and review comments
Co-authored-by: Thales Macedo Garitezi <thalesmg@gmail.com>
2023-01-13 11:05:02 +01:00
Thales Macedo Garitezi fd360ac6c0 feat(buffer_worker): refactor buffer/resource workers to always use queue
This makes the buffer/resource workers always use `replayq` for
queuing, along with collecting multiple requests in a single call.
This is done to avoid long message queues for the buffer workers and
rely on `replayq`'s capabilities of offloading to disk and detecting
overflow.

Also, this deprecates the `enable_batch` and `enable_queue` resource
creation options, as: i) queuing is now always enabled; ii) batch_size
> 1 <=> batch_enabled.  The corresponding metric
`dropped.queue_not_enabled` is dropped, along with `batching`.  The
batching is too ephemeral, especially considering a default batch time
of 20 ms, and is not shown in the dashboard, so it was removed.
2023-01-05 10:15:09 -03:00
Thales Macedo Garitezi 7e02eac3bc
Merge pull request #9619 from thalesmg/refactor-gauges-v50
refactor(metrics): use absolute gauge values rather than deltas (v5.0)
2023-01-02 10:56:47 -03:00
Zaiming (Stone) Shi dbc10c2eed chore: update copyright year 2023 2023-01-02 09:22:27 +01:00
Thales Macedo Garitezi 8b060a75f1 refactor(metrics): use absolute gauge values rather than deltas
https://emqx.atlassian.net/browse/EMQX-8548

Currently, we face several issues trying to keep resource metrics
reasonable.  For example, a resource may be re-created and have its
metrics reset, but then its durable queue resumes its previous work,
leading to strange (often negative) metrics.

Instead of using `counters` that are shared by more than one worker to
manage gauges, we introduce an ETS table whose key is not only scoped
by the Resource ID as before, but also by the worker ID.  This way,
when a worker starts/terminates, it should set its own gauges to
their values (often 0 or `replayq:count` when resuming off a queue).
With this scoping and initialization procedure, we'll hopefully avoid
hitting those strange metrics scenarios and have better control over
the gauges.
2022-12-30 16:51:24 -03:00
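
The scoping described in the commit above can be sketched as follows, with a plain dictionary standing in for the ETS table. The class, method and metric names are hypothetical. The point is only that each worker writes absolute values under its own key, and the resource-level gauge is the sum over that resource's workers.

```python
class GaugeTable:
    """Gauges keyed by (resource_id, worker_id, name), holding absolute values."""

    def __init__(self):
        self._values = {}

    def set(self, resource_id, worker_id, name, value):
        # Each worker overwrites only its own entry, so a restart can
        # reset its gauge without touching the other workers' values.
        self._values[(resource_id, worker_id, name)] = value

    def get(self, resource_id, name):
        # The resource-level gauge is the sum across all of its workers.
        return sum(
            value
            for (rid, _wid, gauge), value in self._values.items()
            if rid == resource_id and gauge == name
        )
```

For example, if two workers report 3 and 5 queued requests, the resource-level value is 8. When the first worker restarts and sets its gauge to 0, the total becomes 5, and no delta can be applied twice or pushed below zero.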
Thales Macedo Garitezi 62eeb4b8e8 feat(resource): reset metrics when stopping a resource 2022-10-18 09:32:35 -03:00
Thales Macedo Garitezi f0ff32c031 test: fix tests after counter changes 2022-10-11 17:45:48 -03:00
Shawn 9aa7e826cb refactor(resource): fast resume resource worker if inflight msgs are ACKed 2022-09-17 00:34:30 +08:00
Shawn 8307f04c2e refactor(resource): save inflight size into the ETS table 2022-09-16 16:52:08 +08:00
Shawn b9ae4ea276 refactor: rename some metrics for emqx_resource 2022-09-13 14:04:25 +08:00
Shawn 26234d38b9 fix: mark the async msg 'queuing' not 'sent.inflight' on recoverable_error 2022-09-02 18:41:43 +08:00
Shawn 73e19d84ee feat: use the new metrics to bridge APIs 2022-08-30 23:47:58 +08:00
Shawn 6b0ccfbc43 refactor: rename the error return resource_down -> recoverable_error 2022-08-26 17:11:12 +08:00
Shawn 86577365e4 fix: use gen_statem:cast/3 for async query 2022-08-23 22:41:45 +08:00
JimMoen 22a4ca311c feat(resource): resource batch/async/queue config schema 2022-08-11 16:59:18 +08:00
Shawn 6203a01320 feat: add inflight window to emqx_resource 2022-08-11 08:36:35 +08:00
Shawn 82550a585a fix: add test cases for query async 2022-08-10 00:45:34 +08:00
Shawn efd6c56dd9 fix: test cases for batch query sync 2022-08-10 00:45:34 +08:00
Shawn 35fe70b887 feat: support async callback to connector modules 2022-08-10 00:34:35 +08:00
Shawn f1419d52f1 fix(resource): remove resource at the end of each test 2022-08-10 00:34:35 +08:00
Shawn a2afdeeb48 feat: add test cases for batching query 2022-08-10 00:34:35 +08:00
Shawn d3950b9534 fix(resource): make option 'queue_enabled' disabled by default 2022-08-10 00:34:35 +08:00
Shawn 2fb42e4d37 refactor: create emqx_resource_worker_sup for resource workers 2022-08-10 00:34:35 +08:00
Shawn d6ef2f7502 refactor: graceful recreate resources 2022-06-17 05:29:18 +08:00
Shawn cc25f92273 feat: add start_after_created option to resource:create/4 2022-06-16 23:34:52 +08:00
Shawn 88ca25c60c fix(resource): fast return when starting an unavailable resource 2022-06-01 08:24:53 +08:00
Shawn d37a66e9b8 fix(test): update test cases for emqx_resource:health_check/1 2022-05-31 10:14:37 +08:00
Shawn 1054c364ad refactor(resource): improve health check and alarm it if resource down 2022-05-31 01:40:40 +08:00
EMQ-YangM 574a40b327 fix: wait for test_resource stop 2022-05-16 17:00:42 +08:00
Chris 6574c33797 feat: add auto_retry for disconnected state in resource manager 2022-05-13 11:19:39 +02:00
Chris 0b3e30e813 feat: isolate resource manager processes 2022-05-09 13:24:34 +02:00
DDDHuang 132b37813c refactor: code format emqx_connector emqx_resource 2022-04-28 15:32:47 +08:00
DDDHuang 2a2308bbf8 refactor: resource check & connector status 2022-04-28 15:32:35 +08:00
Zaiming (Stone) Shi 02c3f87b31 style: reformat all remaining apps 2022-04-27 15:51:18 +02:00
Zaiming (Stone) Shi f42a5b90df Revert "feat: isolate resource manager processes"
This reverts commit 40cca58d4f.
2022-04-26 16:13:38 +02:00
Chris 40cca58d4f feat: isolate resource manager processes 2022-04-26 13:28:29 +02:00
Ilya Averyanov e5f04f3bf7 chore(emqx_authn_jwt): wrap JWKS connector into emqx_resourse 2022-04-18 15:47:33 +03:00
EMQ-YangM 8f06a9ec62 feat: impl resource reset_metrics 2022-04-11 10:25:48 +08:00
EMQ-YangM db0e9e3358 fix(emqx_resource_instance): fix dialyzer warning 2022-03-08 14:09:39 +08:00
EMQ-YangM f29877bb6a fix(emqx_resource): remove create_opts async_create 2022-03-08 14:09:39 +08:00
Xinyu Liu 47a4fa5732
Merge pull request #7140 from EMQ-YangM/tmp_change_status
refactor(emqx_resource): change the status of emqx_resource to 'conne…
2022-02-28 11:13:47 +08:00
EMQ-YangM 376c9ee261 refactor(emqx_resource): change the status of emqx_resource to 'connected/connecting/disconnecting' 2022-02-25 15:02:41 +08:00
Zhongwen Deng db584f79d6 feat: upgrade hocon to 0.25.0 to replace nullable with required. 2022-02-24 22:39:03 +08:00
EMQ-YangM 48942f9c93 refactor(emqx_resource): move unused macro to test 2022-02-14 17:40:39 +08:00
EMQ-YangM df57daaabb refactor(emqx_resource): improve grouping strategy for emqx_resource_instance 2022-02-11 18:36:55 +08:00
EMQ-YangM 8cfbdc2730 test(emqx_resource): improve emqx_resource test coverage to 80% 2022-01-25 17:59:29 +08:00
EMQ-YangM d312f315ac test(emqx_resource_health_check): add more tests to
health_check_timeout_checker
2022-01-25 15:07:54 +08:00
EMQ-YangM cb9f14f658 feat(emqx_resource_health_check): add timeout params to health_check_timeout_checker 2022-01-25 14:54:40 +08:00
EMQ-YangM 127384a9ae test(emqx_resource_SUITE): add more test 2022-01-25 14:39:35 +08:00
Yang Miao b528862c67
Merge branch 'master' into health_check_timeout 2022-01-24 14:48:55 +08:00
EMQ-YangM c870a2c78c test(emqx_resource_health_check): add async_create to create_local 2022-01-24 14:24:31 +08:00
Ilya Averyanov acc4ad0542 fix(emqx_resource): fix resource leakage 2022-01-21 22:50:30 +03:00
Zaiming (Stone) Shi 63167cea70 chore: update copyright 2022-01-05 20:55:00 +01:00
Shawn efec4564f0 fix(resource): update test cases on resource not_found 2021-12-31 22:25:45 +08:00
Shawn 657ecef67b fix(resource): don't crash on resource stopped 2021-12-31 20:57:34 +08:00
Shawn a879ec0f3a feat(resource): add option 'force_create' to emqx_resource:create/4 2021-12-20 10:26:27 +08:00
Shawn 46838a08cc fix(resource): update testcases for after_query functions 2021-11-23 10:41:45 +08:00
Ilya Averyanov 071c2c99e8 refactor(authn resources): add `emqx_resource` and `emqx_authn` tests 2021-11-22 21:08:04 +03:00