Thales Macedo Garitezi
422597a441
test: fix flaky tests
2023-03-14 16:08:47 -03:00
Andrew Mayorov
c883e4b36a
test: drop custom `loop_wait` in favor of snabkaffe's `?retry`
2023-02-24 18:16:35 +03:00
Zaiming (Stone) Shi
c97d17cc91
test: refactor to loop wait for counters
2023-02-24 09:02:03 +01:00
Zaiming (Stone) Shi
7a6465e2cf
fix(buffer_worker): ensure flush timer reset in blocked state
2023-02-23 21:06:38 +01:00
Zaiming (Stone) Shi
3a6dbbdd05
refactor(buffer_worker): ensure flush message is never missed
2023-02-23 20:11:00 +01:00
Zaiming (Stone) Shi
dbfdeec5e9
fix(buffer_worker): log unknown async replies
2023-02-23 12:55:49 +01:00
Zaiming (Stone) Shi
036f69cd6e
test: ensure batch size > 1 is covered in expiration test
2023-02-22 23:26:04 +01:00
Zaiming (Stone) Shi
bf8becd521
test: make sure gauges return to 0 in test cases
2023-02-22 23:07:12 +01:00
Zaiming (Stone) Shi
fc614e16e5
fix(bridge): update inflight items after partial expiry
2023-02-22 22:05:56 +01:00
Erik Timan
2442a4dea7
test(emqx_resource): add regression test for recursive flushing
2023-02-16 14:17:16 +01:00
Andrew Mayorov
d8d06a260f
test(buffer): add test on inflight overflow w/ async queries
...
This testcase should verify that the buffer will retry all inflight
queries that failed with recoverable errors and flush all outstanding queries.
Co-authored-by: ieQu1 <99872536+ieQu1@users.noreply.github.com>
2023-02-08 14:08:04 +03:00
Zaiming (Stone) Shi
13ef30c46c
Merge pull request #9884 from savonarola/resource-fixes
...
fix(resources): fix resource lifecycle
2023-02-02 12:02:34 +01:00
Ilya Averyanov
14f528cc86
fix(resources): fix resource lifecycle
...
* do not resume all buffer workers on successful healthcheck
* do not pass undefined state to resource healthcheck callback
2023-02-01 18:26:13 +02:00
Andrew Mayorov
5fd7f65a1f
test(bufworker): make testcase simpler to follow
...
The confusion was due to the fact that subsequent query was missing
`async_reply_fun` and thus, was not accumulating in the results.
2023-02-01 16:52:47 +03:00
Andrew Mayorov
ff473e0f1b
test(bufworker): fix testcase flapping due to data races
2023-02-01 12:57:46 +03:00
Zaiming (Stone) Shi
b3ad9e97d2
Merge pull request #9870 from keynslug/fix/mqtt-connection-loss-feedback
...
feat(mqtt-bridge): avoid middleman process
2023-01-31 19:12:18 +01:00
Andrew Mayorov
c76311c9c3
fix(buffer): count inflight batches properly
2023-01-31 18:30:42 +03:00
Zaiming (Stone) Shi
d47941601d
refactor(buffer_worker): rename trace points
2023-01-28 11:52:11 +01:00
Zaiming (Stone) Shi
fc38ea9571
refactor(buffer_worker): do not keep request body in reply context
...
The request body can potentially be very large.
The reply context is sent to the async call handler and kept
in its memory until the async reply is received from the bridge
target service.
This commit tries to minimize the size of the reply context
by replacing the request body with `[]`.
2023-01-27 17:12:55 +01:00
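The idea in the commit above can be illustrated with a minimal sketch. It is written in Python for illustration only, not in the project's own language, and the `Query` type and `minimize` helper are hypothetical names rather than the buffer worker's actual data structures. It shows the request body being swapped for an empty list before the context is handed to an async handler, while the fields needed to route the reply are kept.

```python
from dataclasses import dataclass, replace
from typing import Any, List


@dataclass(frozen=True)
class Query:
    """Hypothetical stand-in for one entry of a reply context."""
    caller: str    # who should receive the reply
    request: Any   # request body, potentially very large


def minimize(queries: List[Query]) -> List[Query]:
    """Return copies of the queries with the request body replaced by [].

    The originals are left untouched, so the sender can still use the
    full bodies while the async handler only holds the small copies.
    """
    return [replace(q, request=[]) for q in queries]


if __name__ == "__main__":
    batch = [Query(caller="client-1", request="x" * 1_000_000)]
    context = minimize(batch)
    print(context[0].caller, context[0].request)
```

The copy keeps the caller information intact, so a reply can still be delivered once the downstream service answers.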
Stefan Strigler
2d62de5188
test: fix expected result from timeout error
2023-01-27 11:43:48 +01:00
Zaiming (Stone) Shi
1f799dfd59
fix: reply with {error, buffer_overflow} when discarded
2023-01-26 17:15:36 +01:00
Thales Macedo Garitezi
6fa6c679bb
feat(buffer_worker): add expiration time to requests
...
With this, we avoid performing work or replying to callers that are no
longer waiting on a result.
Also introduces two new counters:
- `dropped.expired` :: happens when a request expires before being
sent downstream
- `late_reply` :: when a response is received from downstream, but the
caller is no longer waiting for a reply because the request has expired, and
the caller might even have retried it.
2023-01-20 11:36:52 -03:00
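The expiration behaviour described in the commit above can be sketched roughly as follows. This is an illustrative Python model, not the buffer worker's code: the `Request` and `Counters` types and the `flush` and `handle_reply` functions are hypothetical, and only the two counter names (`dropped.expired`, `late_reply`) come from the commit message.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Request:
    payload: str
    expire_at: Optional[float]  # deadline in seconds; None means "never expires"


@dataclass
class Counters:
    dropped_expired: int = 0  # models `dropped.expired`
    late_reply: int = 0       # models `late_reply`
    success: int = 0          # hypothetical counter for replies delivered in time


def is_expired(req: Request, now: float) -> bool:
    return req.expire_at is not None and now >= req.expire_at


def flush(queue: List[Request], counters: Counters, now: float) -> List[Request]:
    """Drop requests that expired before being sent; return the rest."""
    live = []
    for req in queue:
        if is_expired(req, now):
            counters.dropped_expired += 1
        else:
            live.append(req)
    return live


def handle_reply(req: Request, counters: Counters, now: float,
                 reply_to: Callable[[str], None]) -> None:
    """Deliver a downstream reply unless the caller has stopped waiting."""
    if is_expired(req, now):
        counters.late_reply += 1
        return
    counters.success += 1
    reply_to(req.payload)
```

In this model a request is checked twice: once before it is sent, and once when its reply arrives, which is where the two counters differ.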
Thales Macedo Garitezi
47f796dd12
refactor: rename `emqx_resource_worker` -> `emqx_resource_buffer_worker`
...
To make it clearer that its purpose is to serve as a buffering layer.
2023-01-18 16:15:34 -03:00
Thales Macedo Garitezi
5c2ac0ac81
chore: don't cancel inflight items upon worker death; retry them
2023-01-17 19:50:30 -03:00
Thales Macedo Garitezi
fa01deb3eb
chore: retry as much as possible, don't reply to caller too soon
2023-01-17 16:49:15 -03:00
Thales Macedo Garitezi
b5aaef084c
refactor: enter running state directly
...
now that we don't have the possibility of dirty disk queues (we always
use volatile replayq), we will never resume old work.
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
006b4bda97
feat(buffer_worker): monitor async workers and cancel their inflight requests upon death
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
731ac6567a
fix(buffer_worker): don't retry all kinds of inflight requests
...
Some requests should not be retried during the blocked state. For
example, if some async requests are just taking some time to process,
we should avoid retrying them periodically, lest we risk overloading the
downstream further.
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
5dd24a64c3
refactor(buffer_worker): check if inflight is full before flushing
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
81fc561ed5
fix(buffer_worker): check for overflow after enqueuing new requests
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
32a9e60313
feat(buffer_worker): also use the inflight table for sync requests
...
Related: https://emqx.atlassian.net/browse/EMQX-8692
This should also correctly account for `retried.*` metrics for sync
requests.
Also fixes cases where race conditions for retrying async requests
could potentially lead to inconsistent metrics.
Fixes more cases where a stale reference to `replayq` was being held
accidentally after a `pop`.
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
c383558467
fix(buffer): fix `replayq` usages in buffer workers (5.0)
...
https://emqx.atlassian.net/browse/EMQX-8700
Fixes a few errors in the usage of `replayq` queues.
- Close `replayq` when `emqx_resource_worker` terminates.
- Do not keep old references to `replayq` after any `pop`s.
- Clear `replayq`'s data directories when removing a resource.
2023-01-17 16:48:48 -03:00
Kjell Winblad
734e6b9c96
chore: fix flaky test cases, log labels and review comments
...
Co-authored-by: Thales Macedo Garitezi <thalesmg@gmail.com>
2023-01-13 11:05:02 +01:00
Thales Macedo Garitezi
fd360ac6c0
feat(buffer_worker): refactor buffer/resource workers to always use queue
...
This makes the buffer/resource workers always use `replayq` for
queuing, along with collecting multiple requests in a single call.
This is done to avoid long message queues for the buffer workers and
rely on `replayq`'s capabilities of offloading to disk and detecting
overflow.
Also, this deprecates the `enable_batch` and `enable_queue` resource
creation options, as: i) queuing is now always enabled; ii) batch_size
> 1 <=> batch_enabled. The corresponding metric
`dropped.queue_not_enabled` is dropped, along with `batching`. The
batching is too ephemeral, especially considering a default batch time
of 20 ms, and is not shown in the dashboard, so it was removed.
2023-01-05 10:15:09 -03:00
Thales Macedo Garitezi
7e02eac3bc
Merge pull request #9619 from thalesmg/refactor-gauges-v50
...
refactor(metrics): use absolute gauge values rather than deltas (v5.0)
2023-01-02 10:56:47 -03:00
Zaiming (Stone) Shi
dbc10c2eed
chore: update copyright year 2023
2023-01-02 09:22:27 +01:00
Thales Macedo Garitezi
8b060a75f1
refactor(metrics): use absolute gauge values rather than deltas
...
https://emqx.atlassian.net/browse/EMQX-8548
Currently, we face several issues trying to keep resource metrics
reasonable. For example, a resource may be re-created and have its
metrics reset, but then its durable queue resumes its previous work,
leading to strange (often negative) metrics.
Instead of using `counters` that are shared by more than one worker to
manage gauges, we introduce an ETS table whose key is not only scoped
by the Resource ID as before, but also by the worker ID. This way,
when a worker starts/terminates, they should set their own gauges to
their values (often 0 or `replayq:count` when resuming off a queue).
With this scoping and initialization procedure, we'll hopefully avoid
hitting those strange metrics scenarios and have better control over
the gauges.
2022-12-30 16:51:24 -03:00
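The scoping scheme described in the commit above can be sketched as a table keyed by resource ID, worker ID, and gauge name, where each worker writes absolute values and readers sum across workers. The Python below is an illustrative model only: a plain dictionary stands in for the ETS table, and the `GaugeTable` class and its method names are hypothetical.

```python
from typing import Dict, Tuple

Key = Tuple[str, int, str]  # (resource_id, worker_id, gauge_name)


class GaugeTable:
    """Dictionary-backed stand-in for a table of per-worker gauges."""

    def __init__(self) -> None:
        self._values: Dict[Key, int] = {}

    def set(self, resource_id: str, worker_id: int, name: str, value: int) -> None:
        """Store an absolute value; only the owning worker writes its own key."""
        self._values[(resource_id, worker_id, name)] = value

    def get(self, resource_id: str, name: str) -> int:
        """Aggregate a gauge for a resource by summing over its workers."""
        return sum(
            value
            for (rid, _worker, gauge), value in self._values.items()
            if rid == resource_id and gauge == name
        )


if __name__ == "__main__":
    table = GaugeTable()
    table.set("bridge-1", 1, "queuing", 3)
    table.set("bridge-1", 2, "queuing", 2)
    print(table.get("bridge-1", "queuing"))
    # Worker 1 restarts and re-initializes its own gauge to an absolute value.
    table.set("bridge-1", 1, "queuing", 0)
    print(table.get("bridge-1", "queuing"))
```

Because each worker overwrites its own entry with an absolute value, a restart cannot push the aggregate negative the way unmatched increments and decrements on a shared counter can.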
Thales Macedo Garitezi
62eeb4b8e8
feat(resource): reset metrics when stopping a resource
2022-10-18 09:32:35 -03:00
Thales Macedo Garitezi
f0ff32c031
test: fix tests after counter changes
2022-10-11 17:45:48 -03:00
Shawn
9aa7e826cb
refactor(resource): fast resume resource worker if inflight msgs are ACKed
2022-09-17 00:34:30 +08:00
Shawn
8307f04c2e
refactor(resource): save inflight size into the ETS table
2022-09-16 16:52:08 +08:00
Shawn
b9ae4ea276
refactor: rename some metrics for emqx_resource
2022-09-13 14:04:25 +08:00
Shawn
26234d38b9
fix: mark the async msg 'queuing' not 'sent.inflight' on recoverable_error
2022-09-02 18:41:43 +08:00
Shawn
73e19d84ee
feat: use the new metrics to bridge APIs
2022-08-30 23:47:58 +08:00
Shawn
6b0ccfbc43
refactor: rename the error return resource_down -> recoverable_error
2022-08-26 17:11:12 +08:00
Shawn
86577365e4
fix: use gen_statem:cast/3 for async query
2022-08-23 22:41:45 +08:00
JimMoen
22a4ca311c
feat(resource): resource batch/async/queue config schema
2022-08-11 16:59:18 +08:00
Shawn
6203a01320
feat: add inflight window to emqx_resource
2022-08-11 08:36:35 +08:00
Shawn
82550a585a
fix: add test cases for query async
2022-08-10 00:45:34 +08:00
Shawn
efd6c56dd9
fix: test cases for batch query sync
2022-08-10 00:45:34 +08:00
Shawn
35fe70b887
feat: support async callback to connector modules
2022-08-10 00:34:35 +08:00
Shawn
f1419d52f1
fix(resource): remove resource at the end of each test
2022-08-10 00:34:35 +08:00
Shawn
a2afdeeb48
feat: add test cases for batching query
2022-08-10 00:34:35 +08:00
Shawn
d3950b9534
fix(resource): make option 'queue_enabled' disabled by default
2022-08-10 00:34:35 +08:00
Shawn
2fb42e4d37
refactor: create emqx_resource_worker_sup for resource workers
2022-08-10 00:34:35 +08:00
Shawn
d6ef2f7502
refactor: gracefully recreate resources
2022-06-17 05:29:18 +08:00
Shawn
cc25f92273
feat: add start_after_created option to resource:create/4
2022-06-16 23:34:52 +08:00
Shawn
88ca25c60c
fix(resource): fast return when starting an unavailable resource
2022-06-01 08:24:53 +08:00
Shawn
d37a66e9b8
fix(test): update test cases for emqx_resource:health_check/1
2022-05-31 10:14:37 +08:00
Shawn
1054c364ad
refactor(resource): improve health check and alarm it if resource down
2022-05-31 01:40:40 +08:00
EMQ-YangM
574a40b327
fix: wait for test_resource stop
2022-05-16 17:00:42 +08:00
Chris
6574c33797
feat: add auto_retry for disconnected state in resource manager
2022-05-13 11:19:39 +02:00
Chris
0b3e30e813
feat: isolate resource manager processes
2022-05-09 13:24:34 +02:00
DDDHuang
132b37813c
refactor: code format emqx_connector emqx_resource
2022-04-28 15:32:47 +08:00
DDDHuang
2a2308bbf8
refactor: resource check & connector status
2022-04-28 15:32:35 +08:00
Zaiming (Stone) Shi
02c3f87b31
style: reformat all remaining apps
2022-04-27 15:51:18 +02:00
Zaiming (Stone) Shi
f42a5b90df
Revert "feat: isolate resource manager processes"
...
This reverts commit 40cca58d4f.
2022-04-26 16:13:38 +02:00
Chris
40cca58d4f
feat: isolate resource manager processes
2022-04-26 13:28:29 +02:00
Ilya Averyanov
e5f04f3bf7
chore(emqx_authn_jwt): wrap JWKS connector into emqx_resource
2022-04-18 15:47:33 +03:00
EMQ-YangM
8f06a9ec62
feat: impl resource reset_metrics
2022-04-11 10:25:48 +08:00
EMQ-YangM
db0e9e3358
fix(emqx_resource_instance): fix dialyzer warning
2022-03-08 14:09:39 +08:00
EMQ-YangM
f29877bb6a
fix(emqx_resource): remove create_opts async_create
2022-03-08 14:09:39 +08:00
Xinyu Liu
47a4fa5732
Merge pull request #7140 from EMQ-YangM/tmp_change_status
...
refactor(emqx_resource): change the status of emqx_resource to 'conne…
2022-02-28 11:13:47 +08:00
EMQ-YangM
376c9ee261
refactor(emqx_resource): change the status of emqx_resource to 'connected/connecting/disconnecting'
2022-02-25 15:02:41 +08:00
Zhongwen Deng
db584f79d6
feat: upgrade hocon to 0.25.0 to replace nullable with required.
2022-02-24 22:39:03 +08:00
EMQ-YangM
48942f9c93
refactor(emqx_resource): move unused macro to test
2022-02-14 17:40:39 +08:00
EMQ-YangM
df57daaabb
refactor(emqx_resource): improve grouping strategy for emqx_resource_instance
2022-02-11 18:36:55 +08:00
EMQ-YangM
8cfbdc2730
test(emqx_resource): improve emqx_resource test coverage to 80%
2022-01-25 17:59:29 +08:00
EMQ-YangM
d312f315ac
test(emqx_resource_health_check): add more tests to
...
health_check_timeout_checker
2022-01-25 15:07:54 +08:00
EMQ-YangM
cb9f14f658
feat(emqx_resource_health_check): add timeout params to health_check_timeout_checker
2022-01-25 14:54:40 +08:00
EMQ-YangM
127384a9ae
test(emqx_resource_SUITE): add more tests
2022-01-25 14:39:35 +08:00
Yang Miao
b528862c67
Merge branch 'master' into health_check_timeout
2022-01-24 14:48:55 +08:00
EMQ-YangM
c870a2c78c
test(emqx_resource_health_check): add async_create to create_local
2022-01-24 14:24:31 +08:00
Ilya Averyanov
acc4ad0542
fix(emqx_resource): fix resource leakage
2022-01-21 22:50:30 +03:00
Zaiming (Stone) Shi
63167cea70
chore: update copyright
2022-01-05 20:55:00 +01:00
Shawn
efec4564f0
fix(resource): update test cases on resource not_found
2021-12-31 22:25:45 +08:00
Shawn
657ecef67b
fix(resource): don't crash on resource stopped
2021-12-31 20:57:34 +08:00
Shawn
a879ec0f3a
feat(resource): add option 'force_create' to emqx_resource:create/4
2021-12-20 10:26:27 +08:00
Shawn
46838a08cc
fix(resource): update testcases for after_query functions
2021-11-23 10:41:45 +08:00
Ilya Averyanov
071c2c99e8
refactor(authn resources): add `emqx_resource` and `emqx_authn` tests
2021-11-22 21:08:04 +03:00