Zhongwen Deng
6cbad047cd
fix: don't log CONNECT twice when debug
2023-01-30 12:16:17 +08:00
Zaiming (Stone) Shi
d47941601d
refactor(buffer_worker): rename trace points
2023-01-28 11:52:11 +01:00
Zaiming (Stone) Shi
7f66c6a9e2
Merge pull request #9840 from olcai/redact-influxdb-tokens
...
fix: redact influxdb tokens in logs and reduce log level
2023-01-28 11:47:36 +01:00
Zaiming (Stone) Shi
fc38ea9571
refactor(buffer_worker): do not keep request body in reply context
...
the request body can be potentially very large
the reply context is sent to the async call handler and kept
in its memory until the async reply is received from bridge
target service.
this commit tries to minimize the size of the reply context
by replacing the request body with `[]`.
2023-01-27 17:12:55 +01:00
Ilya Averyanov
72f39b9b72
fix(docs): correct Redis conf field description
2023-01-27 17:39:16 +02:00
Zaiming (Stone) Shi
578271ea3d
refactor: use lists:map instead of lc for safty
2023-01-27 15:15:46 +01:00
Zaiming (Stone) Shi
f793807bc1
refactor(buffer_worker): rename function
...
batch_reply_after_query to handle_async_batch_reply
2023-01-27 15:04:28 +01:00
Zaiming (Stone) Shi
262c3a2869
refactor(buffer_worker): rename function
...
from reply_after_query to handle_async_reply
2023-01-27 15:03:18 +01:00
Zaiming (Stone) Shi
6a58bafcb0
chore: bump release version to e5.0.0-rc.2
2023-01-27 14:38:21 +01:00
Zaiming (Stone) Shi
52b75ada04
Merge pull request #9832 from sstrigler/EMQX-8774-failure-to-handle-timeout-error-in-resource-worker
...
EMQX 8774 failure to handle timeout error in resource worker
2023-01-27 14:36:44 +01:00
Zaiming (Stone) Shi
514609bcf7
Merge pull request #9850 from zmstone/0127-fix-influxdb-bridge-atom-leak
...
0127 fix influxdb bridge atom leak
2023-01-27 14:30:20 +01:00
Zaiming (Stone) Shi
96ed725a55
Merge pull request #9849 from zmstone/0127-refactor-buffer-worker-simplify-caller-reply
...
0127 refactor buffer worker simplify caller reply
2023-01-27 14:06:56 +01:00
Zaiming (Stone) Shi
c47be57c59
fix(bridge): ensure all bridge resources are stopped before app stop
2023-01-27 12:39:05 +01:00
Zaiming (Stone) Shi
d53106145f
fix: stop resource when resource manager terminates
2023-01-27 12:39:05 +01:00
Andrew Mayorov
d35e46b2d5
Merge pull request #9838 from keynslug/fix/redis-cluster-batching
...
feat(redis): disable batching in redis_cluster bridges
2023-01-27 15:27:57 +04:00
Stefan Strigler
7005b71ddf
style: fix typo in comment
2023-01-27 11:43:51 +01:00
Stefan Strigler
2d62de5188
test: fix expected result from timeout error
2023-01-27 11:43:48 +01:00
Stefan Strigler
a180bd9aa5
fix: catch error, not exit
2023-01-27 11:40:06 +01:00
Stefan Strigler
b7e3f9d5a6
fix: try-case-of rather than try-of
...
try-of catches only what happens within but not after
2023-01-27 11:40:06 +01:00
Zaiming (Stone) Shi
db2f631a8a
refactor(buffer_worker): simplify caller reply
2023-01-27 11:33:45 +01:00
Zaiming (Stone) Shi
965236c888
Merge pull request #9845 from zmstone/0126-reply-caller-for-buffer-overflow-queue-items
...
0126 reply caller for buffer overflow queue items
2023-01-27 11:32:59 +01:00
Zaiming (Stone) Shi
d4fab92b72
refactor(buffer_worker): no need to keep request for REPLY macro
2023-01-27 10:41:30 +01:00
Andrew Mayorov
71f996b9d5
refactor(mqtt-bridge): unwrap single statem actions
...
So that the code would be easier to follow and harder to break.
Also drop a couple of unused macrodefs.
2023-01-26 21:13:18 +03:00
Andrew Mayorov
c95f979413
fix(mqtt_bridge): use correct gen_statem reply action
2023-01-26 21:12:05 +03:00
Zaiming (Stone) Shi
1f799dfd59
fix: reply with {error, buffer_overflow} when discarded
2023-01-26 17:15:36 +01:00
Zaiming (Stone) Shi
ed28789164
refactor(buffer_worker): no need to return after collect into buf queue
2023-01-26 14:50:40 +01:00
Zaiming (Stone) Shi
25b4821adc
refactor: move the the per-message overflow log from error to info level
2023-01-26 14:48:43 +01:00
Zaiming (Stone) Shi
bb26632c8a
fix(buffer_worker): fix a wrong assertion
...
the assertion is to ensure queue items are not binary
but should not assert the queue itself
2023-01-26 14:33:16 +01:00
Zaiming (Stone) Shi
f6b3b930b0
chore: improve a error log
2023-01-26 14:21:27 +01:00
Andrew Mayorov
2ee00b75a7
fix(redis): unwrap pipeline queries against redis cluster
...
This is an additional safety measure in addition to the disabled
batching on the bridge level.
2023-01-25 17:28:11 +03:00
Erik Timan
805d08e823
fix: reduce log level from error to warning in several places
...
This reduces the log level from error to warning in places that are
connected to the influxdb bridge. Transient errors for external
resources should not render an error log.
2023-01-25 14:49:50 +01:00
Erik Timan
8836494542
fix: redact influxdb tokens in a few logs
2023-01-25 14:48:32 +01:00
Zaiming (Stone) Shi
5fdf7fd24c
fix(kafka): use async callback to bump success counters
...
some telemetry events from wolff are discarded:
* dropped:
this is double counted in wolff,
we now only subscribe to the dropped_queue_full event
* retried_failed:
it has different meanings in wolff,
in wolff, it means it's the 2nd (or onward) produce attempt
in EMQX, it means it's eventually failed after some retries
* retried_success
since we are going to handle the success counters in callbac
this having this reported from wolff will only make things
harder to understand
* failed
wolff never fails (unelss drop which is a different counter)
2023-01-24 21:12:36 +01:00
Zaiming (Stone) Shi
6175076f6f
Merge pull request #9835 from olcai/add-influxdb-test-files
...
fix: add influxdb test files and fixes
2023-01-24 17:02:23 +01:00
Zaiming (Stone) Shi
e5b65087af
Merge pull request #9834 from zmstone/0123-fix-idle_timeout-infinity
...
fix(emqx_connection): crash when idle_timeout is set to infinity
2023-01-24 16:05:09 +01:00
Erik Timan
9d20431257
fix(emqx_resource): fix crash while flushing queue
...
We used next_event for flushing the queue in emqx_resource, but this
leads to a crash. We now call flush_worker/1 instead.
2023-01-24 14:13:35 +01:00
Erik Timan
28718edbfd
chore: bump application VSNs
2023-01-24 14:12:34 +01:00
Zaiming (Stone) Shi
8fde169abb
Merge pull request #9821 from thalesmg/buffer-worker-expiry-v50
...
feat(buffer_worker): add expiration time to requests
2023-01-24 13:54:04 +01:00
Zaiming (Stone) Shi
7575120ea6
test: use snabbkaffe retry macro
2023-01-24 10:54:20 +01:00
Zaiming (Stone) Shi
140cda2f13
fix(emqx_connection): crash when idle_timeout is set to infinity
2023-01-24 10:14:35 +01:00
Zaiming (Stone) Shi
727100e094
chore: prepare for v5.0.15 release
2023-01-20 16:42:01 +01:00
Thales Macedo Garitezi
ca4a262b75
refactor: re-organize dealing with unrecoverable errors
2023-01-20 12:00:17 -03:00
Thales Macedo Garitezi
6fa6c679bb
feat(buffer_worker): add expiration time to requests
...
With this, we avoid performing work or replying to callers that are no
longer waiting on a result.
Also introduces two new counters:
- `dropped.expired` :: happens when a request expires before being
sent downstream
- `late_reply` :: when a response is receive from downstream, but the
caller is no longer for a reply because the request has expired, and
the caller might even have retried it.
2023-01-20 11:36:52 -03:00
Zaiming (Stone) Shi
57607ca0ce
chore: prepare for v5.0.15 release
2023-01-20 11:20:34 +01:00
Zaiming (Stone) Shi
1c3e055b13
Merge pull request #9822 from JimMoen/fix-schema-typo
...
chore: i18n typo fix
2023-01-20 11:11:18 +01:00
JimMoen
16f45a60fd
chore: i18n typo fix
2023-01-20 11:50:01 +08:00
Zaiming (Stone) Shi
abe7a69696
Merge remote-tracking branch 'origin/master' into release-50
2023-01-19 17:51:38 +01:00
Thales Macedo Garitezi
d755b43c77
fix(jwt_worker): handle exceptions when decoding jwk from pem
...
Returns a more controlled error if users attempt to use the Service
Account JSON from the GCP PubSub example from swagger, which is
redacted.
2023-01-19 09:24:45 -03:00
Zaiming (Stone) Shi
63748aba3c
Merge pull request #9804 from emqx/release-50
...
Merge release-50 (candidate of e5.0.0-rc.1) back to master.
2023-01-19 08:48:41 +01:00
Thales Macedo Garitezi
47f796dd12
refactor: rename `emqx_resource_worker` -> `emqx_resource_buffer_worker`
...
To make it more clear that it's purpose is serve as a buffering layer.
2023-01-18 16:15:34 -03:00
Ilya Averyanov
f9843de7ae
Merge pull request #9628 from savonarola/fix-flaky-redis-bridge-test
...
chore(ee bridge): fix Redis bridge test flakyness
2023-01-18 20:56:13 +02:00
Zaiming (Stone) Shi
1716a5da99
chore: bump version to e5.0.0-rc.1
2023-01-18 17:22:05 +01:00
Zaiming (Stone) Shi
7abba17b25
Merge pull request #9765 from zmstone/0115-add-password-converter
...
fix(schema): add password converter to ensure its binary() type
2023-01-18 15:09:05 +01:00
Andrew Mayorov
33b3c4fa9a
Merge pull request #9753 from feat/EMQX-8738/convert-ordered-sets
...
feat: turn tables queried with search APIs into ordered sets
2023-01-18 18:05:57 +04:00
Ilya Averyanov
44a6e5ed15
chore(resources): add missing parameters to emqx_resource schema
2023-01-18 14:33:45 +02:00
Zaiming (Stone) Shi
7e8381f4c7
Merge pull request #9785 from savonarola/fix-authn-handling
...
fix(authn): stop authn handling when emqx_authentication provides a result
2023-01-18 13:24:22 +01:00
ieQu1
c46d7f3404
Merge pull request #9801 from ieQu1/ekka-0.13.9
...
chore(ekka): Bump version to 0.13.9
2023-01-18 13:23:59 +01:00
Thales Macedo Garitezi
167b623497
Merge pull request #9699 from thalesmg/fix-buffer-clear-replayq-on-delete-v50
...
fix(buffer): fix `replayq` usages in buffer workers (5.0)
2023-01-18 09:08:50 -03:00
Erik Timan
46fc69cd48
Merge pull request #9781 from olcai/delete-zip-file-from-trace-log-download
...
fix(emqx_management): delete files after trace log download
2023-01-18 13:05:39 +01:00
ieQu1
d7242739e0
chore(ekka): Bump version to 0.13.9
2023-01-18 12:01:03 +01:00
Zaiming (Stone) Shi
d4f3b4c8c2
Merge remote-tracking branch 'origin/master' into fix-buffer-clear-replayq-on-delete-v50
2023-01-18 11:39:47 +01:00
zhongwencool
8e1475addb
Merge pull request #9798 from zhongwencool/dashboard-document
...
chore: improve the dashboard's configuration
2023-01-18 18:20:00 +08:00
Erik Timan
42182279b7
fix(emqx_management): ensure trace file dir is deleted on zip exception
2023-01-18 10:20:41 +01:00
Zhongwen Deng
fb84d5b817
chore: make spellcheck happy
2023-01-18 17:06:46 +08:00
Ivan Dyachkov
430b0a03d4
Merge pull request #9780 from id/fix-ensure-no-colon-in-filenames
...
fix: ensure no colon in filenames
2023-01-18 09:36:16 +01:00
Zhongwen Deng
0d852d9122
docs: improve the dashboard's document
2023-01-18 16:28:35 +08:00
Zaiming (Stone) Shi
faf5916ed6
test: relax recoverable/unrecoverable error check
...
for now, treat all other errors unrecoverable
2023-01-18 07:52:28 +01:00
zhongwencool
bc9d97ea53
Merge pull request #9791 from zhongwencool/crash-dump-doc
...
chore: more detail about crash dump config
2023-01-18 09:56:21 +08:00
Thales Macedo Garitezi
5c2ac0ac81
chore: don't cancel inflight items upon worker death; retry them
2023-01-17 19:50:30 -03:00
Thales Macedo Garitezi
087b667263
fix(buffer_worker): allow signalling unrecoverable errors
2023-01-17 19:50:30 -03:00
Stefan Strigler
f899284e3a
Merge pull request #9789 from sstrigler/EMQX-8754-test-function-return-500-of-data-integration-google-pubsub
...
EMQX 8754 test function return 500 of data integration google pubsub
2023-01-17 22:49:28 +01:00
lafirest
dea0c8230e
Merge pull request #9787 from lafirest/fix/webhook_bridge_cfg_upgrade
...
fix(bridges): fix a compatible problem for old webhook bridge config which created before the v5.0.12
2023-01-18 04:47:08 +08:00
Thales Macedo Garitezi
4ed7bff33f
chore: fix dialyzer warnings
2023-01-17 16:49:16 -03:00
Thales Macedo Garitezi
fa01deb3eb
chore: retry as much as possible, don't reply to caller too soon
2023-01-17 16:49:15 -03:00
Thales Macedo Garitezi
b82009bc29
refactor: use monotonic times as refs and store initial times when creating ets
...
with this, we may measure latencies in the future.
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
3ba65c4377
feat: poke the buffer workers when inflight is no longer full
...
if max inflight = 1, then we only make progress based on the state
timer, since the callbacks were not poking the buffer workers.
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
b5aaef084c
refactor: enter running state directly
...
now that we don't have the possibility of dirty disk queues (we always
use volatile replayq), we will never resume old work.
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
bd0e2a74ba
refactor: rename inflight_name field to inflight_tid
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
006b4bda97
feat(buffer_worker): monitor async workers and cancel their inflight requests upon death
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
731ac6567a
fix(buffer_worker): don't retry all kinds of inflight requests
...
Some requests should not be retried during the blocked state. For
example, if some async requests are just taking some time to process,
we should avoid retrying them periodically, lest risk overloading the
downstream further.
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
5425f3d88e
refactor: rm unused fn
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
5dd24a64c3
refactor(buffer_worker): check if inflight is full before flushing
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
344eeebe63
fix: always ack async replies
...
The caller should decide if it should retry in that case, to avoid
overwhelming the resource with retries.
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
bd95a95409
refactor: remove redundant `BlockWorker` arg, change boolean to ack/nack
...
`BlockWorker` was always false (ack). Also, changed the return to
something more semantic than a boolean to avoid [boolean
blindness](https://runtimeverification.com/blog/code-smell-boolean-blindness/ )
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
478fcc6ffd
test: fix flaky test
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
30a227bd38
refactor: rename `resume` state timeout to `unblock`
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
7401d6f0ce
refactor: rename ack fn
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
196bf1c5ba
feat: mass collect calls from mailbox also when blocked
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
d4724d6ce9
refactor: remove redundant function
...
`retry_queue` does basically what the running state does, now that we
refactored the buffer workers to always use the queue.
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
d6a9d0aa48
fix: set queuing to 0 after buffer worker termination
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
81fc561ed5
fix(buffer_worker): check for overflow after enqueuing new requests
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
4cb83d0c9a
fix: fix some expressions after refactoring
2023-01-17 16:48:48 -03:00
Zaiming (Stone) Shi
fecdbac9a8
refactor: rename a few functions
2023-01-17 16:48:48 -03:00
Zaiming (Stone) Shi
cdd8de11b0
chore: fix a typo in function name
2023-01-17 16:48:48 -03:00
Zaiming (Stone) Shi
618b97870b
refactor: call local function queue_count everywhere
2023-01-17 16:48:48 -03:00
Zaiming (Stone) Shi
249c4c1c79
refactor: use 'bufs' for resource worker replayq dir
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
af6807e863
refactor: cancel flush timer sooner
...
Avoids the cancellation being delayed.
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
477c55d8ef
fix: sanitizy replayq dir filepath
...
Colons (`:`) are not allowed in Windows.
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
4c04a01370
refactor(buffer_worker): remove `?Q_ITEM` wrapping and use lightweight size estimate
2023-01-17 16:48:48 -03:00
Thales Macedo Garitezi
32a9e60313
feat(buffer_worker): also use the inflight table for sync requests
...
Related: https://emqx.atlassian.net/browse/EMQX-8692
This should also correctly account for `retried.*` metrics for sync
requests.
Also fixes cases where race conditions for retrying async requests
could potentially lead to inconsistent metrics.
Fixes more cases where a stale reference to `replayq` was being held
accidentally after a `pop`.
2023-01-17 16:48:48 -03:00