Zaiming (Stone) Shi
fe27604010
Merge remote-tracking branch 'origin/release-50' into 0308-merge-release-50-back-to-master
2023-03-08 16:46:45 +01:00
Thales Macedo Garitezi
eef65fba60
fix(buffer_worker): handle `request_timeout = infinity` case
...
The current schema allows `infinity` for `request_timeout`, so we have
to take that into account. It's not currently possible to set
`batch_time = infinity`, so there's no need to treat that case.
2023-03-06 15:31:28 -03:00
Thales Macedo Garitezi
18ab7ed197
chore: bump app vsns
2023-03-06 15:31:28 -03:00
Thales Macedo Garitezi
0e707e837f
docs(buffer_worker): improve description of `request_timeout`
2023-03-06 15:31:28 -03:00
Thales Macedo Garitezi
e9d3fc511f
chore(buffer_worker): change default `batch_time` to 0 and improve docs
2023-03-06 15:31:28 -03:00
Thales Macedo Garitezi
f95a30ae89
fix(webhook): convert `request_timeout`s in root and resource_opts
2023-03-06 10:12:38 -03:00
Thales Macedo Garitezi
167b7a212f
refactor(buffer_worker): avoid starting 0-time timers
2023-03-06 10:12:38 -03:00
Thales Macedo Garitezi
e9ffabf936
fix(buffer_worker): add batch time automatic adjustment
...
To avoid message loss due to misconfigurations, we adjust `batch_time`
based on `request_timeout`. If `batch_time` > `request_timeout`, all
requests will time out before being sent if the message rate is low.
This is even worse if `pool_size` is high. We cap `batch_time` at
`request_timeout div 2` as a rule of thumb.
2023-03-06 10:12:38 -03:00
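The adjustment described above can be sketched together with the `request_timeout = infinity` case handled in eef65fba60. This is a minimal Python sketch, not the Erlang code in the buffer worker. The function name `adjust_batch_time` is hypothetical, and the string `"infinity"` stands in for the Erlang atom. Times are in milliseconds.

```python
from typing import Union

# Hypothetical stand-in for the Erlang atom `infinity` allowed by the schema.
INFINITY = "infinity"

def adjust_batch_time(request_timeout: Union[int, str], batch_time: int) -> int:
    """Return a batch_time (ms) that leaves room for the request to be sent.

    Rule of thumb from the commit message: cap batch_time at
    request_timeout div 2. An infinite request_timeout imposes no cap.
    batch_time itself cannot be infinity, so that case is not handled.
    """
    if request_timeout == INFINITY:
        return batch_time
    # `//` mirrors Erlang's integer `div` for non-negative integers.
    return min(batch_time, request_timeout // 2)
```

With these assumptions, `adjust_batch_time(100, 500)` returns 50, while `adjust_batch_time(INFINITY, 500)` leaves the value at 500.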
Kjell Winblad
67acdf0888
feat: add clickhouse database bridge
...
This commit adds a Clickhouse bridge to EMQX 5. The bridge is similar to
the Clickhouse bridge in EMQX 4.4, but adds the possibility to use
different formats (such as JSON) for values to be inserted.
2023-03-02 12:22:11 +01:00
Andrew Mayorov
c883e4b36a
test: drop custom `loop_wait` in favor of snabbkaffe's `?retry`
2023-02-24 18:16:35 +03:00
Andrew Mayorov
2b4e49e7df
fix(bufworker): handle replies of simple async queries
...
Before this change, simple queries were essentially treated as "retries",
thus skipping all of the reply processing.
2023-02-24 15:06:49 +03:00
Zaiming (Stone) Shi
c97d17cc91
test: refactor to loop wait for counters
2023-02-24 09:02:03 +01:00
Zaiming (Stone) Shi
a10dbba084
refactor(buffer_worker): less defensive on inflight counter decrement
2023-02-23 21:23:10 +01:00
Zaiming (Stone) Shi
7a6465e2cf
fix(buffer_worker): ensure flush timer reset in blocked state
2023-02-23 21:06:38 +01:00
Zaiming (Stone) Shi
3a6dbbdd05
refactor(buffer_worker): ensure flush message is never missed
2023-02-23 20:11:00 +01:00
Zaiming (Stone) Shi
dbfdeec5e9
fix(buffer_worker): log unknown async replies
2023-02-23 12:55:49 +01:00
Zaiming (Stone) Shi
356a94af30
fix(buffer_worker): ensure async flush message is sent
...
This is a new issue introduced by the previous fix commits:
after partial expiry is handled correctly, the
IsFullBefore check no longer reflects the state before the reply
is received, but the state after a partially-expired batch
has shrunk.
The fix is simple: move the check to the entry point
where the async reply callback enters, then send an async
'flush' notification regardless of the handling result.
2023-02-23 09:47:34 +01:00
Zaiming (Stone) Shi
713220f88b
refactor(buffer_worker): more generic process for all_expired
2023-02-23 00:04:20 +01:00
Zaiming (Stone) Shi
036f69cd6e
test: ensure batch size > 1 is covered in expiration test
2023-02-22 23:26:04 +01:00
Zaiming (Stone) Shi
bf8becd521
test: make sure gauges return to 0 in test cases
2023-02-22 23:07:12 +01:00
Zaiming (Stone) Shi
fc614e16e5
fix(bridge): update inflight items after partial expiry
2023-02-22 22:05:56 +01:00
Zaiming (Stone) Shi
bb13d0708f
fix(bridge): fix dropped counter and inflight gauge
...
Prior to this fix there were two metrics issues:
1. if all requests in a batch had expired when a reply was received,
'late_reply' was bumped by 1 instead of by the batch size
2. when a batch was partially delivered (or expired), the
dropped requests were not subtracted from the inflight size gauge
2023-02-22 13:20:58 +01:00
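The corrected accounting can be shown with a minimal Python sketch of what happens when an async reply arrives for a batch. It assumes each request carries an absolute deadline and that the reply was a success for the requests that have not expired. The `Request` and `Metrics` types and the `on_batch_reply` helper are hypothetical. The real buffer worker keeps this state in its inflight table and resource metrics.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Request:
    payload: str
    deadline_ms: int  # hypothetical absolute expiry time

@dataclass
class Metrics:
    counters: Dict[str, int] = field(
        default_factory=lambda: {"late_reply": 0, "success": 0})
    inflight: int = 0  # gauge: requests currently inflight

def on_batch_reply(m: Metrics, batch: List[Request], now_ms: int) -> List[Request]:
    """Account for a reply to `batch`; return the requests still worth replying to."""
    expired = [r for r in batch if r.deadline_ms <= now_ms]
    live = [r for r in batch if r.deadline_ms > now_ms]
    # Issue 1: bump late_reply once per expired request, not once per batch.
    m.counters["late_reply"] += len(expired)
    # Assumes the reply was a success for the live requests.
    m.counters["success"] += len(live)
    # Issue 2: expired requests leave the inflight gauge too, not only live ones.
    m.inflight -= len(expired) + len(live)
    return live
```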
Erik Timan
056bc71af2
chore: bump VSN version
2023-02-16 15:05:38 +01:00
Erik Timan
2442a4dea7
test(emqx_resource): add regression test for recursive flushing
2023-02-16 14:17:16 +01:00
Erik Timan
dcf70e0e68
refactor(emqx_resource): add more trace points for flushing
2023-02-16 14:17:16 +01:00
Zaiming (Stone) Shi
fb61c2b266
perf: avoid getting metrics (gen_server:call) for each resource lookup
2023-02-10 19:40:37 +01:00
Zaiming (Stone) Shi
42dfaf3ef2
Merge pull request #9910 from sstrigler/EMQX-8861-improve-bridge-restart-button-behaviour
...
EMQX 8861 improve bridge restart button behaviour
2023-02-09 18:00:48 +01:00
Andrew Mayorov
81b1bab11e
chore: bump `emqx_resource` version to 0.1.7
...
Also add the changelog entry.
2023-02-08 14:21:30 +03:00
Andrew Mayorov
c6fc0ec8cd
fix(bufworker): do not avoid retry if inflight table is full
...
Otherwise there's no other piece of code that would retry the inflight
queries in that case.
2023-02-08 14:08:04 +03:00
Andrew Mayorov
d8d06a260f
test(buffer): add test on inflight overflow w/ async queries
...
This testcase should verify that the buffer will retry all inflight
queries that failed with recoverable errors and flush all outstanding queries.
Co-authored-by: ieQu1 <99872536+ieQu1@users.noreply.github.com>
2023-02-08 14:08:04 +03:00
Stefan Strigler
86f3f5787f
feat: allow manually reconnecting a disconnected bridge
2023-02-07 11:58:30 +01:00
Zaiming (Stone) Shi
7ea140599a
Merge pull request #9894 from id/ci-always-run-static-checks
...
ci: always run static_checks
2023-02-02 16:33:19 +01:00
Zaiming (Stone) Shi
feca4cc0a5
Merge pull request #9892 from zmstone/0202-docs-cosmetic
...
0202 docs cosmetic
2023-02-02 15:43:58 +01:00
Zaiming (Stone) Shi
58627b7958
chore(emqx_resource_manager): ignore unused return value for dialyzer
2023-02-02 14:11:12 +01:00
Zaiming (Stone) Shi
c0d478bd41
fix(buffer_worker): type spec
2023-02-02 14:11:12 +01:00
zhongwencool
ee852d8204
Merge pull request #9886 from zhongwencool/mongo-connection-default-async
...
fix: remove async mode from mongodb/redis/mysql/pgsql bridge
2023-02-02 21:08:01 +08:00
Zaiming (Stone) Shi
d5c482b0b0
docs: remove timer unit from description
...
The user input already carries a time unit, e.g. "5s" for 5 seconds.
2023-02-02 13:49:20 +01:00
Zaiming (Stone) Shi
9864587389
fix: send to buffer-supported connector even when disconnected
2023-02-02 12:04:17 +01:00
Zaiming (Stone) Shi
13ef30c46c
Merge pull request #9884 from savonarola/resource-fixes
...
fix(resources): fix resource lifecycle
2023-02-02 12:02:34 +01:00
Zhongwen Deng
1c9035d24c
test: remove async from redis ct
2023-02-02 17:37:18 +08:00
Zhongwen Deng
22cc1cc745
fix: make spell_check happy
2023-02-02 17:37:18 +08:00
Zhongwen Deng
f8936013b7
chore: replace async with sync
2023-02-02 17:37:18 +08:00
Zhongwen Deng
22c3f50020
fix: add query_mode_sync_only for mysql pgsql redis mongodb bridge
2023-02-02 17:37:18 +08:00
Andrew Mayorov
ca5c192f4b
Merge pull request #9882 from fwup/fix/no-mqtt-bridge-middleman
...
refactor(mqtt-worker): avoid unnecessary abstraction
2023-02-02 13:11:31 +04:00
Zhongwen Deng
1a90c1654c
chore: bad typo
2023-02-02 11:43:04 +08:00
Ilya Averyanov
14f528cc86
fix(resources): fix resource lifecycle
...
* do not resume all buffer workers on successful healthcheck
* do not pass undefined state to resource healthcheck callback
2023-02-01 18:26:13 +02:00
Andrew Mayorov
5fd7f65a1f
test(bufworker): make testcase simpler to follow
...
The confusion was due to the fact that the subsequent query was missing
`async_reply_fun` and thus was not accumulating in the results.
2023-02-01 16:52:47 +03:00
Andrew Mayorov
ff473e0f1b
test(bufworker): fix testcase flapping due to data races
2023-02-01 12:57:46 +03:00
Zaiming (Stone) Shi
b3ad9e97d2
Merge pull request #9870 from keynslug/fix/mqtt-connection-loss-feedback
...
feat(mqtt-bridge): avoid middleman process
2023-01-31 19:12:18 +01:00
Andrew Mayorov
c76311c9c3
fix(buffer): count inflight batches properly
2023-01-31 18:30:42 +03:00
Zaiming (Stone) Shi
b3e486041b
Merge pull request #9853 from zmstone/0127-refactor-buffer-worker-no-need-to-keep-request-for-reply-callback
...
0127 refactor buffer worker no need to keep request for reply callback
2023-01-31 08:44:01 +01:00
Stefan Strigler
27881064dc
fix: increase dropped.queue_full by number of messages
2023-01-30 11:37:35 +01:00
Zaiming (Stone) Shi
d47941601d
refactor(buffer_worker): rename trace points
2023-01-28 11:52:11 +01:00
Zaiming (Stone) Shi
7f66c6a9e2
Merge pull request #9840 from olcai/redact-influxdb-tokens
...
fix: redact influxdb tokens in logs and reduce log level
2023-01-28 11:47:36 +01:00
Zaiming (Stone) Shi
fc38ea9571
refactor(buffer_worker): do not keep request body in reply context
...
The request body can potentially be very large.
The reply context is sent to the async call handler and kept
in its memory until the async reply is received from the bridge
target service.
This commit tries to minimize the size of the reply context
by replacing the request body with `[]`.
2023-01-27 17:12:55 +01:00
Zaiming (Stone) Shi
578271ea3d
refactor: use lists:map instead of lc for safety
2023-01-27 15:15:46 +01:00
Zaiming (Stone) Shi
f793807bc1
refactor(buffer_worker): rename function
...
from batch_reply_after_query to handle_async_batch_reply
2023-01-27 15:04:28 +01:00
Zaiming (Stone) Shi
262c3a2869
refactor(buffer_worker): rename function
...
from reply_after_query to handle_async_reply
2023-01-27 15:03:18 +01:00
Zaiming (Stone) Shi
52b75ada04
Merge pull request #9832 from sstrigler/EMQX-8774-failure-to-handle-timeout-error-in-resource-worker
...
EMQX 8774 failure to handle timeout error in resource worker
2023-01-27 14:36:44 +01:00
Zaiming (Stone) Shi
514609bcf7
Merge pull request #9850 from zmstone/0127-fix-influxdb-bridge-atom-leak
...
0127 fix influxdb bridge atom leak
2023-01-27 14:30:20 +01:00
Zaiming (Stone) Shi
d53106145f
fix: stop resource when resource manager terminates
2023-01-27 12:39:05 +01:00
Stefan Strigler
2d62de5188
test: fix expected result from timeout error
2023-01-27 11:43:48 +01:00
Stefan Strigler
a180bd9aa5
fix: catch error, not exit
2023-01-27 11:40:06 +01:00
Stefan Strigler
b7e3f9d5a6
fix: try-case-of rather than try-of
...
`try ... of` catches only exceptions raised in the protected expression, not those raised in the `of` clauses after it
2023-01-27 11:40:06 +01:00
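Python has no direct `try ... of`, but its `try/except/else` has the same pitfall. An exception raised in the `else` branch is not caught by the preceding `except`, just as an exception raised in an Erlang `of` clause escapes the `catch`. The sketch below contrasts the two shapes; the function names are hypothetical. Moving the result handling inside the protected body corresponds to the `try case ... of ... end catch ... end` form adopted here.

```python
def call_downstream() -> str:
    return "ok"

def handle_result(result: str) -> str:
    # Hypothetical: handling the result can itself fail.
    raise TimeoutError(f"timed out while handling {result!r}")

def try_of_style() -> str:
    # Like Erlang `try Expr of Pat -> Body catch ... end`:
    # only call_downstream() is protected; errors in the else branch escape.
    try:
        result = call_downstream()
    except TimeoutError:
        return "caught"
    else:
        return handle_result(result)

def try_case_of_style() -> str:
    # Like Erlang `try case Expr of Pat -> Body end catch ... end`:
    # the handling runs inside the protected body, so its errors are caught.
    try:
        return handle_result(call_downstream())
    except TimeoutError:
        return "caught"
```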
Zaiming (Stone) Shi
db2f631a8a
refactor(buffer_worker): simplify caller reply
2023-01-27 11:33:45 +01:00
Zaiming (Stone) Shi
d4fab92b72
refactor(buffer_worker): no need to keep request for REPLY macro
2023-01-27 10:41:30 +01:00
Zaiming (Stone) Shi
1f799dfd59
fix: reply with {error, buffer_overflow} when discarded
2023-01-26 17:15:36 +01:00
Zaiming (Stone) Shi
ed28789164
refactor(buffer_worker): no need to return after collect into buf queue
2023-01-26 14:50:40 +01:00
Zaiming (Stone) Shi
25b4821adc
refactor: move the per-message overflow log from error to info level
2023-01-26 14:48:43 +01:00
Zaiming (Stone) Shi
bb26632c8a
fix(buffer_worker): fix a wrong assertion
...
The assertion is meant to ensure that queue items are not binaries;
it should not assert on the queue itself.
2023-01-26 14:33:16 +01:00
Erik Timan
805d08e823
fix: reduce log level from error to warning in several places
...
This reduces the log level from error to warning in places that are
connected to the influxdb bridge. Transient errors for external
resources should not render an error log.
2023-01-25 14:49:50 +01:00
Zaiming (Stone) Shi
5fdf7fd24c
fix(kafka): use async callback to bump success counters
...
Some telemetry events from wolff are discarded:
* dropped:
this is double counted in wolff;
we now only subscribe to the dropped_queue_full event
* retried_failed:
it has different meanings:
in wolff, it means the 2nd (or later) produce attempt;
in EMQX, it means the request eventually failed after some retries
* retried_success:
since we now handle the success counters in the callback,
having this reported from wolff would only make things
harder to understand
* failed:
wolff never fails (unless it drops, which is a different counter)
2023-01-24 21:12:36 +01:00
Erik Timan
9d20431257
fix(emqx_resource): fix crash while flushing queue
...
We used next_event for flushing the queue in emqx_resource, but this
leads to a crash. We now call flush_worker/1 instead.
2023-01-24 14:13:35 +01:00
Erik Timan
28718edbfd
chore: bump application VSNs
2023-01-24 14:12:34 +01:00
Zaiming (Stone) Shi
8fde169abb
Merge pull request #9821 from thalesmg/buffer-worker-expiry-v50
...
feat(buffer_worker): add expiration time to requests
2023-01-24 13:54:04 +01:00
Thales Macedo Garitezi
ca4a262b75
refactor: re-organize dealing with unrecoverable errors
2023-01-20 12:00:17 -03:00
Thales Macedo Garitezi
6fa6c679bb
feat(buffer_worker): add expiration time to requests
...
With this, we avoid performing work or replying to callers that are no
longer waiting on a result.
Also introduces two new counters:
- `dropped.expired` :: happens when a request expires before being
sent downstream
- `late_reply` :: when a response is received from downstream, but the
caller is no longer waiting for a reply because the request has expired,
and the caller might even have retried it.
2023-01-20 11:36:52 -03:00
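The two new counters can be illustrated with a small Python sketch. It assumes each request stores a monotonic expiry time. The `Request` type, the `expire_at` field and the `Buffer` class are hypothetical names. Time is passed in explicitly so the example is deterministic.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

@dataclass
class Request:
    payload: str
    expire_at: float  # hypothetical monotonic deadline, seconds

@dataclass
class Buffer:
    counters: Dict[str, int] = field(
        default_factory=lambda: {"dropped.expired": 0, "late_reply": 0, "success": 0})

    def send(self, req: Request, now: float,
             downstream: Callable[[str], str]) -> Optional[str]:
        # Expired before being sent downstream: skip the work entirely.
        if now >= req.expire_at:
            self.counters["dropped.expired"] += 1
            return None
        return downstream(req.payload)

    def on_reply(self, req: Request, now: float, reply: str,
                 reply_to_caller: Callable[[str], None]) -> None:
        # The caller stopped waiting (and may have retried): do not reply.
        if now >= req.expire_at:
            self.counters["late_reply"] += 1
            return
        self.counters["success"] += 1
        reply_to_caller(reply)
```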
Zaiming (Stone) Shi
1c3e055b13
Merge pull request #9822 from JimMoen/fix-schema-typo
...
chore: i18n typo fix
2023-01-20 11:11:18 +01:00
JimMoen
16f45a60fd
chore: i18n typo fix
2023-01-20 11:50:01 +08:00
Thales Macedo Garitezi
47f796dd12
refactor: rename `emqx_resource_worker` -> `emqx_resource_buffer_worker`
...
To make it clearer that its purpose is to serve as a buffering layer.
2023-01-18 16:15:34 -03:00
Ilya Averyanov
44a6e5ed15
chore(resources): add missing parameters to emqx_resource schema
2023-01-18 14:33:45 +02:00
Zaiming (Stone) Shi
d4f3b4c8c2
Merge remote-tracking branch 'origin/master' into fix-buffer-clear-replayq-on-delete-v50
2023-01-18 11:39:47 +01:00
Ivan Dyachkov
430b0a03d4
Merge pull request #9780 from id/fix-ensure-no-colon-in-filenames
...
fix: ensure no colon in filenames
2023-01-18 09:36:16 +01:00
Zaiming (Stone) Shi
faf5916ed6
test: relax recoverable/unrecoverable error check
...
for now, treat all other errors as unrecoverable
2023-01-18 07:52:28 +01:00
Thales Macedo Garitezi
5c2ac0ac81
chore: don't cancel inflight items upon worker death; retry them
2023-01-17 19:50:30 -03:00
Thales Macedo Garitezi
087b667263
fix(buffer_worker): allow signalling unrecoverable errors
2023-01-17 19:50:30 -03:00
Thales Macedo Garitezi
4ed7bff33f
chore: fix dialyzer warnings
2023-01-17 16:49:16 -03:00
Thales Macedo Garitezi
fa01deb3eb
chore: retry as much as possible, don't reply to caller too soon
2023-01-17 16:49:15 -03:00
Thales Macedo Garitezi
b82009bc29
refactor: use monotonic times as refs and store initial times when creating ets
...
with this, we may measure latencies in the future.
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
3ba65c4377
feat: poke the buffer workers when inflight is no longer full
...
if max inflight = 1, then we only make progress based on the state
timer, since the callbacks were not poking the buffer workers.
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
b5aaef084c
refactor: enter running state directly
...
now that we don't have the possibility of dirty disk queues (we always
use volatile replayq), we will never resume old work.
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
bd0e2a74ba
refactor: rename inflight_name field to inflight_tid
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
006b4bda97
feat(buffer_worker): monitor async workers and cancel their inflight requests upon death
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
731ac6567a
fix(buffer_worker): don't retry all kinds of inflight requests
...
Some requests should not be retried during the blocked state. For
example, if some async requests are just taking some time to process,
we should avoid retrying them periodically, lest we risk overloading the
downstream further.
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
5425f3d88e
refactor: rm unused fn
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
5dd24a64c3
refactor(buffer_worker): check if inflight is full before flushing
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
344eeebe63
fix: always ack async replies
...
The caller should decide if it should retry in that case, to avoid
overwhelming the resource with retries.
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
bd95a95409
refactor: remove redundant `BlockWorker` arg, change boolean to ack/nack
...
`BlockWorker` was always false (ack). Also, changed the return to
something more semantic than a boolean to avoid [boolean
blindness](https://runtimeverification.com/blog/code-smell-boolean-blindness/).
2023-01-17 16:48:48 -03:00
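The boolean-blindness point can be illustrated with a short Python example. With a two-valued enum, a call site states what it means instead of relying on the reader to remember what `True` stood for. The `Ack` enum and the rule inside `handle_reply` are hypothetical and are not the buffer worker's actual logic.

```python
from enum import Enum

class Ack(Enum):
    ACK = "ack"    # done with the request; remove it from the inflight table
    NACK = "nack"  # keep the request so it can be retried later

def handle_reply(reply: str) -> Ack:
    # Hypothetical rule for illustration: only a "retry_later" reply is nacked.
    return Ack.NACK if reply == "retry_later" else Ack.ACK

def after_reply(reply: str, inflight: set, ref: int) -> None:
    # Compare with `if handle_reply(reply): ...`, where True is ambiguous.
    if handle_reply(reply) is Ack.ACK:
        inflight.discard(ref)
```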
Thales Macedo Garitezi
30a227bd38
refactor: rename `resume` state timeout to `unblock`
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
7401d6f0ce
refactor: rename ack fn
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
196bf1c5ba
feat: mass collect calls from mailbox also when blocked
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
d4724d6ce9
refactor: remove redundant function
...
`retry_queue` does basically what the running state does, now that we
refactored the buffer workers to always use the queue.
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
d6a9d0aa48
fix: set queuing to 0 after buffer worker termination
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
81fc561ed5
fix(buffer_worker): check for overflow after enqueuing new requests
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
4cb83d0c9a
fix: fix some expressions after refactoring
2023-01-17 16:48:48 -03:00
Zaiming (Stone) Shi
fecdbac9a8
refactor: rename a few functions
2023-01-17 16:48:48 -03:00
Zaiming (Stone) Shi
cdd8de11b0
chore: fix a typo in function name
2023-01-17 16:48:48 -03:00
Zaiming (Stone) Shi
618b97870b
refactor: call local function queue_count everywhere
2023-01-17 16:48:48 -03:00
Zaiming (Stone) Shi
249c4c1c79
refactor: use 'bufs' for resource worker replayq dir
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
af6807e863
refactor: cancel flush timer sooner
...
Avoids the cancellation being delayed.
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
477c55d8ef
fix: sanitize replayq dir filepath
...
Colons (`:`) are not allowed in Windows file names.
2023-01-17 16:48:48 -03:00
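The sanitization can be sketched in Python, assuming the replayq directory name is derived from a resource ID containing colons, for example `bridge:webhook:my_hook`. Replacing `:` with `_` is an assumption made for illustration. Only the colon is handled here, although Windows forbids a few other characters in file names as well. The `bufs` directory name comes from 249c4c1c79; the rest of the path layout is hypothetical.

```python
import os

def sanitize_component(name: str) -> str:
    # Windows does not allow ':' in file names; the '_' replacement is an assumption.
    return name.replace(":", "_")

def replayq_dir(data_dir: str, resource_id: str, worker_index: int) -> str:
    # Hypothetical layout: <data_dir>/bufs/<sanitized resource id>/<worker index>
    return os.path.join(data_dir, "bufs", sanitize_component(resource_id), str(worker_index))
```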
Thales Macedo Garitezi
4c04a01370
refactor(buffer_worker): remove `?Q_ITEM` wrapping and use lightweight size estimate
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
32a9e60313
feat(buffer_worker): also use the inflight table for sync requests
...
Related: https://emqx.atlassian.net/browse/EMQX-8692
This should also correctly account for `retried.*` metrics for sync
requests.
Also fixes cases where race conditions for retrying async requests
could potentially lead to inconsistent metrics.
Fixes more cases where a stale reference to `replayq` was being held
accidentally after a `pop`.
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
ff23d25e8b
chore(replayq): update replayq -> 0.3.6 and use `clean_start` for buffer workers
...
So we can truly avoid resuming work after a node restart.
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
c383558467
fix(buffer): fix `replayq` usages in buffer workers (5.0)
...
https://emqx.atlassian.net/browse/EMQX-8700
Fixes a few errors in the usage of `replayq` queues.
- Close `replayq` when `emqx_resource_worker` terminates.
- Do not keep old references to `replayq` after any `pop`s.
- Clear `replayq`'s data directories when removing a resource.
2023-01-17 16:48:48 -03:00
Stefan Strigler
e54f2f83b3
test: use same default timeout as elsewhere
2023-01-17 15:29:19 +01:00
Ivan Dyachkov
676f017ec0
fix: ensure no colon in filenames
2023-01-16 21:27:01 +01:00
Stefan Strigler
e08c1d2229
Merge remote-tracking branch 'olcai/refactor-bridges-api' into dev/api-refactor
2023-01-13 15:49:52 +01:00
Stefan Strigler
1690a6dcfc
Merge branch 'master' into dev/api-refactor
2023-01-13 15:34:13 +01:00
Erik Timan
61e98900be
chore: bump app vsn of emqx_resource
2023-01-13 15:13:35 +01:00
Kjell Winblad
1ac03ab208
Merge pull request #9730 from kjellwinblad/kjell/fix/resource_atom_leak/EMQX-8583
...
fix: remove atom leaks
2023-01-13 14:38:28 +01:00
Kjell Winblad
734e6b9c96
chore: fix flaky test cases, log labels and review comments
...
Co-authored-by: Thales Macedo Garitezi <thalesmg@gmail.com>
2023-01-13 11:05:02 +01:00
Ivan Dyachkov
b5d3e9d8b8
fix: remove time unit from duration fields description
2023-01-12 14:18:55 +01:00
Kjell Winblad
8c482e03d1
fix: remove atom leaks
...
Both emqx_resource_managers and emqx_resource_workers leaked atoms as they
created unique atoms to use as registered names. This is fixed by
removing the need to register the names.
Fixes: https://emqx.atlassian.net/browse/EMQX-8583
2023-01-11 17:03:28 +01:00
Stefan Strigler
8ad8288195
feat: report error in create_dry_run
2023-01-11 14:22:37 +01:00
Zaiming (Stone) Shi
85a8eff90b
fix(emqx_resource_manager): do not start when disabled
2023-01-11 08:33:48 +01:00
Thales Macedo Garitezi
70eb5ffb58
refactor: remove unused function
2023-01-05 10:16:01 -03:00
Thales Macedo Garitezi
56437228dc
docs: improve descriptions
...
Thanks to @qzhuyan for the corrections.
2023-01-05 10:16:01 -03:00
Thales Macedo Garitezi
fd360ac6c0
feat(buffer_worker): refactor buffer/resource workers to always use queue
...
This makes the buffer/resource workers always use `replayq` for
queuing, along with collecting multiple requests in a single call.
This is done to avoid long message queues for the buffer workers and
rely on `replayq`'s capabilities of offloading to disk and detecting
overflow.
Also, this deprecates the `enable_batch` and `enable_queue` resource
creation options, as: i) queuing is now always enabled; ii) batching is
enabled if and only if `batch_size` > 1. The corresponding metric
`dropped.queue_not_enabled` is removed, along with `batching`. The
`batching` metric is too ephemeral, especially considering a default batch time
of 20 ms, and is not shown in the dashboard, so it was removed.
2023-01-05 10:15:09 -03:00
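The collect-then-batch flow can be sketched in Python. A `queue.Queue` stands in for the process mailbox and a `collections.deque` stands in for `replayq`. Both stand-ins and the `limit` parameter of `drain_mailbox` are assumptions for illustration; the Erlang worker collects calls from its own mailbox.

```python
import queue
from collections import deque
from typing import Any, List

def drain_mailbox(mailbox: "queue.Queue[Any]", buffer: deque, limit: int) -> int:
    """Move up to `limit` already-waiting requests into the buffer without blocking."""
    moved = 0
    while moved < limit:
        try:
            buffer.append(mailbox.get_nowait())
        except queue.Empty:
            break
        moved += 1
    return moved

def pop_batch(buffer: deque, batch_size: int) -> List[Any]:
    """Take up to batch_size items; batch_size == 1 means batching is off."""
    return [buffer.popleft() for _ in range(min(batch_size, len(buffer)))]
```

Because everything passes through the buffer, overflow detection and offloading can be handled in one place rather than in a long process mailbox.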
Thales Macedo Garitezi
bf3983e7c4
feat(buffer_worker): use offload mode for `replayq`
...
To avoid confusion for the users as to what persistence guarantees we
offer when buffering bridges/resources, we will always enable offload
mode for `replayq`. With this, when the buffer size is above the max
segment size, it'll flush the queue to disk, but on recovery after a
restart it'll clean the existing segments rather than resuming from
them.
2023-01-05 10:11:59 -03:00
Erik Timan
b9d012e072
refactor(emqx_resource): ingress bridge counter
...
Unify code paths for resource metrics by removing
emqx_resource:inc_received/1 and adding
emqx_resource_metrics:received_inc/1 & friends.
2023-01-02 15:11:52 +01:00
Thales Macedo Garitezi
7e02eac3bc
Merge pull request #9619 from thalesmg/refactor-gauges-v50
...
refactor(metrics): use absolute gauge values rather than deltas (v5.0)
2023-01-02 10:56:47 -03:00
Zaiming (Stone) Shi
dbc10c2eed
chore: update copyright year 2023
2023-01-02 09:22:27 +01:00
Thales Macedo Garitezi
305ed68916
chore: bump app vsns
2022-12-30 16:51:24 -03:00
Thales Macedo Garitezi
8b060a75f1
refactor(metrics): use absolute gauge values rather than deltas
...
https://emqx.atlassian.net/browse/EMQX-8548
Currently, we face several issues trying to keep resource metrics
reasonable. For example, when a resource is re-created and has its
metrics reset, but then its durable queue resumes its previous work
and leads to strange (often negative) metrics.
Instead of using `counters` that are shared by more than one worker to
manage gauges, we introduce an ETS table whose key is scoped not only
by the Resource ID as before, but also by the worker ID. This way,
when a worker starts/terminates, it should set its own gauges to
their values (often 0 or `replayq:count` when resuming off a queue).
With this scoping and initialization procedure, we'll hopefully avoid
hitting those strange metrics scenarios and have better control over
the gauges.
2022-12-30 16:51:24 -03:00
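The scoping idea can be sketched in Python, with a dict standing in for the ETS table and keyed by (resource ID, worker ID, gauge name). Each worker writes absolute values under its own key, and the resource-level gauge is the sum over workers at read time. The `Gauges` class and its method names are hypothetical.

```python
from typing import Dict, Tuple

class Gauges:
    def __init__(self) -> None:
        # Stand-in for the ETS table: (resource_id, worker_id, name) -> value
        self._table: Dict[Tuple[str, int, str], int] = {}

    def set(self, resource_id: str, worker_id: int, name: str, value: int) -> None:
        # Absolute value, not a delta: a restarting worker simply overwrites its entry.
        self._table[(resource_id, worker_id, name)] = value

    def get(self, resource_id: str, name: str) -> int:
        # Resource-level gauge = sum of the per-worker values.
        return sum(v for (rid, _wid, n), v in self._table.items()
                   if rid == resource_id and n == name)
```

Because a worker overwrites its own entry instead of decrementing a shared value, a metrics reset followed by a resumed queue cannot drive the total negative.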
Zaiming (Stone) Shi
f93c22045d
fix: non-empty field should not be undefined
2022-12-24 11:41:45 +01:00
Zaiming (Stone) Shi
479e191dcf
refactor: refine worker pool config and doc
...
The worker pool is a buffer pool;
the description implied a connection pool, which is wrong.
2022-12-20 09:02:51 +01:00
Zaiming (Stone) Shi
f611cbab45
chore: cap replayq seg size under total size
2022-12-19 23:16:05 +01:00
Andrew Mayorov
8a0ca38a77
fix: drop no longer supported dialyzer option
2022-12-16 13:45:05 +03:00
Zaiming (Stone) Shi
9e3da5b661
chore: bump app versions
2022-12-14 20:07:41 +01:00
Thales Macedo Garitezi
1cd91a24e9
feat(gcp_pubsub): implement GCP PubSub bridge (ee5.0)
2022-12-12 17:18:19 -03:00
Thales Macedo Garitezi
34e9056779
refactor: fix typo in variable name
...
Might confuse people to think it's related to `replayq`.
2022-12-12 17:17:51 -03:00
Thales Macedo Garitezi
62eeb4b8e8
feat(resource): reset metrics when stopping a resource
2022-10-18 09:32:35 -03:00
Thales Macedo Garitezi
2d01726b22
fix: account calls when resource is not connected as matched
2022-10-13 15:32:04 -03:00
Thales Macedo Garitezi
1b2b629cdd
feat: emit telemetry events for all resource worker metrics
2022-10-13 15:32:04 -03:00
Thales Macedo Garitezi
f0ff32c031
test: fix tests after counter changes
2022-10-11 17:45:48 -03:00
Thales Macedo Garitezi
357e5919ce
chore: add copyright disclaimer
2022-10-11 09:51:16 -03:00
Kjell Winblad
57270fb8fc
feat: add support for counters and gauges to the Kafka Bridge
...
This commit adds support for counters and gauges to the Kafka Bridge.
The Kafka bridge uses [Wolff](https://github.com/kafka4beam/wolff) for
the Kafka connection. Wolff does its own batching and does not use the
batching functionality in `emqx_resource_worker` that is used by other
bridge types. Therefore, the counter events have to be generated by
Wolff. We have added
[telemetry](https://github.com/beam-telemetry/telemetry) events to Wolff
that we hook into to change counters and gauges for the Kafka bridge. The
counter called `matched` does not depend on specific functionality of
any bridge type, so updates of this counter are moved higher up in the
call chain than previously, so that it also gets updated for Kafka
bridges.
2022-10-10 14:40:57 -03:00
Zaiming (Stone) Shi
f6ac4c3a76
Merge pull request #8798 from zmstone/0815-feat-add-kafka-connector
...
feat: Add Kafka connector
2022-09-24 22:57:50 +02:00
Shawn
b325633390
refactor(resource): resume from queue/inflight-window with async-sending and batching
2022-09-21 22:58:47 +08:00
Shawn
9aa7e826cb
refactor(resource): fast resume resource worker if inflight msgs are ACKed
2022-09-17 00:34:30 +08:00
Shawn
8307f04c2e
refactor(resource): save inflight size into the ETS table
2022-09-16 16:52:08 +08:00
Shawn
d5d3972ff5
chore: add test cases for MQTT Bridge reconnecting
2022-09-15 10:19:33 +08:00
Shawn
4e211c12d3
fix(mqtt_bridge): return value of sending messages was discarded
2022-09-15 08:57:01 +08:00
Shawn
1c03c236f5
fix(mqtt_bridge): handle send_to_remote in idle state
2022-09-14 15:19:30 +08:00
Shawn
f41adb0997
refactor: change some default values of resource_opts
2022-09-14 15:18:07 +08:00
Zaiming (Stone) Shi
0c1595be02
feat: Add Kafka connector
2022-09-13 19:46:56 +02:00
Shawn
b9ae4ea276
refactor: rename some metrics for emqx_resource
2022-09-13 14:04:25 +08:00
Shawn
2b33ca6d49
fix: no error log print if insert bool values into mysql
2022-09-07 16:00:09 +08:00
Shawn
26234d38b9
fix: mark the async msg 'queuing' not 'sent.inflight' on recoverable_error
2022-09-02 18:41:43 +08:00
Shawn
83f21b4c65
refactor(resource): remove metrics 'sent.exception'
2022-09-02 12:46:53 +08:00
Shawn
b45f3de8db
refactor(resource): rename metrics batched,queued -> batching,queuing
2022-09-02 12:41:14 +08:00
Shawn
33c9c7d497
fix: incorrect message order when batch is enabled
2022-09-01 14:51:13 +08:00
Shawn
0ef0b68de4
refactor: change '{recoverable_error,Reason}' to '{error,{recoverable_error,Reason}}'
2022-08-31 18:25:00 +08:00
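The new return shape can be illustrated with a small Python classifier. It models Erlang tuples as Python tuples and atoms as strings, which is a modelling assumption. The helper name `is_recoverable` is hypothetical. Per faf5916ed6, other errors are currently treated as unrecoverable.

```python
from typing import Any

def is_recoverable(result: Any) -> bool:
    """True for results shaped like {error, {recoverable_error, Reason}}."""
    return (
        isinstance(result, tuple) and len(result) == 2
        and result[0] == "error"
        and isinstance(result[1], tuple) and len(result[1]) == 2
        and result[1][0] == "recoverable_error"
    )
```

The old bare `("recoverable_error", Reason)` shape no longer matches, so callers must return the wrapped form.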
Shawn
73e19d84ee
feat: use the new metrics to bridge APIs
2022-08-30 23:47:58 +08:00
Shawn
9e50866cd0
fix: rename queue_max_bytes -> max_queue_bytes
2022-08-30 17:18:54 +08:00
Shawn
c4106c0d77
fix: resume the resource worker on health check success
2022-08-30 12:28:43 +08:00
Shawn
6fde37791c
refactor: new metrics for resources
2022-08-30 10:14:10 +08:00
Shawn
1625b8eaeb
fix(mysql_bridge): export the query_mode option to the APIs
2022-08-26 17:11:24 +08:00
Shawn
6b0ccfbc43
refactor: rename the error return resource_down -> recoverable_error
2022-08-26 17:11:12 +08:00
Shawn
a896aa8b27
fix: incorrect replayq dir for the emqx_resource
2022-08-25 16:06:18 +08:00
Shawn
86577365e4
fix: use gen_statem:cast/3 for async query
2022-08-23 22:41:45 +08:00
JimMoen
f0c2b53868
fix(bpapi): make bpapi static_checks happy
2022-08-22 10:51:44 +08:00
JimMoen
62ecf6f545
fix(resource): keep `auto_retry` in `disconnected` state
...
Automatic retries should be maintained even in `disconnected` state without any state transition.
2022-08-22 02:52:06 +08:00
JimMoen
7c4ea38c06
fix(resource): make some resource opts internal
...
Resource options `start_after_created` and `start_timeout` are internal opts.
Not provided to users anymore.
2022-08-22 02:22:57 +08:00
JimMoen
06363e63d9
fix(influxdb): connector uses a fallback `pool_size` for influxdb client
2022-08-19 15:54:19 +08:00
Shawn
9e35032d78
fix: make resume_interval default to health_check_interval
2022-08-16 10:09:02 +08:00
Shawn
de3a325953
fix: revert the changes in connector mysql
2022-08-16 09:06:13 +08:00
Xinyu Liu
2898966439
Merge branch 'dev/ee5.0' into resource_opts
2022-08-15 21:43:22 +08:00
Shawn
19d85d485b
refactor(resource): add resource_opts level into config structure
2022-08-15 21:40:10 +08:00
Shawn
d1de262f31
fix: inc 'actions.failed' if bridge query failed
2022-08-15 17:21:14 +08:00
Shawn
665ef4142d
fix: unify the health check interval
2022-08-15 17:21:14 +08:00
JimMoen
68946f1f6c
feat: influxdb support `async`/`batch_async` query
2022-08-15 14:02:17 +08:00
JimMoen
b01ae8ece6
chore: refine influxdb bridge/connector i18n
2022-08-15 14:00:14 +08:00
JimMoen
594d071c05
feat(influxdb): add async callback
2022-08-12 18:26:47 +08:00
JimMoen
fa5e8f1422
chore: refine i18n label
2022-08-12 16:39:03 +08:00
JimMoen
3678673124
fix: schema default value using raw type before convert
2022-08-12 16:38:46 +08:00
Shawn
0cdf4b47f1
feat: add more resource creation opts
2022-08-12 13:47:45 +08:00
Shawn
c3c4ed02b4
fix: bump emqx_dashboard to 5.0.4
2022-08-12 00:24:58 +08:00
JimMoen
3a76a50382
fix: syntax error and compile error
2022-08-11 20:58:43 +08:00
Shawn
2872f0b668
fix(bridges): support create resources with options
2022-08-11 19:11:44 +08:00
JimMoen
0f6c371760
feat(influxdb): influxdb connector add `on_batch_query/3` callback
2022-08-11 18:12:41 +08:00
JimMoen
22a4ca311c
feat(resource): resource batch/async/queue config schema
2022-08-11 16:59:18 +08:00
Shawn
6203a01320
feat: add inflight window to emqx_resource
2022-08-11 08:36:35 +08:00
Shawn
82550a585a
fix: add test cases for query async
2022-08-10 00:45:34 +08:00
Shawn
efd6c56dd9
fix: test cases for batch query sync
2022-08-10 00:45:34 +08:00
Shawn
145ff66a9a
fix: issues found by dialyzer and elvis
2022-08-10 00:45:26 +08:00
Shawn
35fe70b887
feat: support async callback to connector modules
2022-08-10 00:34:35 +08:00
Shawn
f1419d52f1
fix(resource): remove resource at the end of each test
2022-08-10 00:34:35 +08:00
Shawn
a2afdeeb48
feat: add test cases for batching query
2022-08-10 00:34:35 +08:00
Shawn
75adba0781
fix: increase resource metrics using the resource id
2022-08-10 00:34:35 +08:00
Shawn
d3950b9534
fix(resource): make option 'queue_enabled' disabled by default
2022-08-10 00:34:35 +08:00
Shawn
0377d3cf61
fix: update existing testcases for new emqx_resource
2022-08-10 00:34:35 +08:00
Shawn
2fb42e4d37
refactor: create emqx_resource_worker_sup for resource workers
2022-08-10 00:34:35 +08:00
Shawn
0087b7c960
fix: remove the extra file replay.erl
2022-08-10 00:34:35 +08:00
Shawn
d8d8d674e4
feat(resource): start emqx_resource_worker in pools
2022-08-10 00:34:35 +08:00
Shawn
12904d797f
feat(resource): first commit for batching/async/caching mechanism
2022-08-10 00:34:35 +08:00
DDDHuang
98b36c4681
fix: hstream db connector, TODO: start apps
2022-07-27 11:38:45 +08:00
JianBo He
a78a389206
chore: using standard log format
2022-07-01 12:06:35 +08:00
Shawn
d6ef2f7502
refactor: graceful recreate resources
2022-06-17 05:29:18 +08:00
Shawn
cc25f92273
feat: add start_after_created option to resource:create/4
2022-06-16 23:34:52 +08:00
Zaiming (Stone) Shi
2065be569e
fix(emqx_cluster_rpc): fail fast on stale state
Due to:
* Cluster RPC MFA is not idempotent!
* There is a lack of rollback for callback's side-effects
For instance, when two nodes try to add a cluster-singleton
concurrently, one of them will have to wait for the table lock
then try to catch-up, then try to apply MFA.
The catch-up will have created the singleton, but the initiated
multicall apply will fail, causing the commit to roll back
without 'undoing' the singleton creation.
Later, the retries will fail indefinitely.
2022-06-12 20:18:48 +02:00
Shawn
b7f27157e5
fix: also alarm resource down when start resource failed
2022-06-01 15:41:55 +08:00
Shawn
88ca25c60c
fix(resource): fast return when starting an unavailable resource
2022-06-01 08:24:53 +08:00
Shawn
9f69e3cad6
fix(resource): discard dry_run resource down alarm
2022-06-01 08:24:53 +08:00
Shawn
d37a66e9b8
fix(test): update test cases for emqx_resource:health_check/1
2022-05-31 10:14:37 +08:00
Shawn
1054c364ad
refactor(resource): improve health check and alarm it if resource down
2022-05-31 01:40:40 +08:00
EMQ-YangM
574a40b327
fix: wait for test_resource stop
2022-05-16 17:00:42 +08:00
EMQ-YangM
b5addf7e05
fix: log all ignore events
2022-05-16 15:08:03 +08:00
EMQ-YangM
bbbfea1b5b
fix: ignore all other events
2022-05-16 15:08:03 +08:00
EMQ-YangM
1a1c82932a
fix: when connecting health check failed, update status.
2022-05-16 10:47:20 +08:00
Chris
93799e3ac6
refactor: delete now unused emqx_resource modules
2022-05-16 09:54:26 +08:00
Xinyu Liu
c4fd31ae25
Merge pull request #7916 from emqx/EMQX-4204-auto-timer-based-retry-when-in-disconnected-state
feat: add auto_retry for disconnected state in resource manager
2022-05-16 09:34:08 +08:00
JianBo He
3f59650e4b
Merge pull request #7944 from EMQ-YangM/fix_bridge_status
fix: restart resource should not clear metrics
2022-05-16 09:16:12 +08:00
Zaiming (Stone) Shi
c355c40ea8
refactor: call emqx_alarm:ensure_deactivated everywhere
2022-05-13 16:02:55 +02:00
Chris
6574c33797
feat: add auto_retry for disconnected state in resource manager
2022-05-13 11:19:39 +02:00
EMQ-YangM
d5c416736b
fix: restart resource should not clear metrics
2022-05-13 16:05:41 +08:00
JimMoen
a5ddc5390f
refactor(resource): add resource recreate fun with empty opts
2022-05-12 14:19:56 +08:00
Chris
0b3e30e813
feat: isolate resource manager processes
2022-05-09 13:24:34 +02:00
EMQ-YangM
c52b464b3c
fix: check process alive before health check
2022-04-29 17:34:26 +08:00
EMQ-YangM
1bf33f75cc
fix: set resource status disconnected
2022-04-29 17:05:12 +08:00
Zaiming (Stone) Shi
4e65322667
refactor: move emqx_plugin_libs_metrics to emqx app
because it cannot depend on other apps
2022-04-29 12:41:36 +08:00
DDDHuang
132b37813c
refactor: code format emqx_connector emqx_resource
2022-04-28 15:32:47 +08:00
DDDHuang
667da90e52
refactor: resource instance do_create_dry_run
2022-04-28 15:32:41 +08:00
DDDHuang
2a2308bbf8
refactor: resource check & connector status
2022-04-28 15:32:35 +08:00
DDDHuang
a50980c496
fix: disconnected status in auto_reconnect = false
2022-04-28 09:47:36 +08:00
Zaiming (Stone) Shi
02c3f87b31
style: reformat all remaining apps
2022-04-27 15:51:18 +02:00
Shawn
94e24c2621
refactor: move ssl file handling from resources to bridges
2022-04-27 11:59:15 +08:00
Zaiming (Stone) Shi
f42a5b90df
Revert "feat: isolate resource manager processes"
This reverts commit 40cca58d4f.
2022-04-26 16:13:38 +02:00
Chris
40cca58d4f
feat: isolate resource manager processes
2022-04-26 13:28:29 +02:00
Shawn
19630e9a99
feat: save ssl cert files for data bridges
2022-04-21 09:00:06 +08:00
Shawn
3ce969fd79
refactor: always recreate resources regardless of whether they are connected
2022-04-20 11:43:05 +08:00
Shawn
278e9145b0
fix: go to different resource instance when health check
2022-04-19 23:00:34 +08:00
Ilya Averyanov
e5f04f3bf7
chore(emqx_authn_jwt): wrap JWKS connector into emqx_resource
2022-04-18 15:47:33 +03:00
EMQ-YangM
8f06a9ec62
feat: impl resource reset_metrics
2022-04-11 10:25:48 +08:00
EMQ-YangM
9a2d70f98e
fix(emqx_resource): remove extra space
2022-03-25 18:26:18 +08:00
EMQ-YangM
6b662d87ba
fix(emqx_resource): fix dialyzer warning
2022-03-25 18:15:23 +08:00
EMQ-YangM
bb12378806
fix(emqx_resource_instance): improve the pattern match of the function call_health_check
2022-03-25 17:35:55 +08:00
zhongwencool
3414e0b601
feat(plugin): http api
2022-03-11 15:55:02 +08:00
Shawn
1d023b541f
refactor(connector): rename waiting_connect_complete -> wait_for_resource_ready
Rename the option to wait_for_resource_ready and set its default to 5s.
2022-03-10 10:46:57 +08:00