Commit Graph

11596 Commits

Author SHA1 Message Date
zhongwencool a1495689c0 fix: clean self node's cluster commit when leave cluster 2024-04-09 14:13:25 +08:00
JimMoen 6cb00cc8c7
Merge pull request #12844 from JimMoen/EMQX-12012/cpu-use-idle-two-decimal-places
fix: cpu usage and idle use two decimal places
2024-04-09 10:11:15 +08:00
Andrew Mayorov d12e907209
fix(dsrepl): correctly handle ra membership change command results
Before this change, results similar to `{error, {no_more_servers_to_try,
[{error, nodedown}, {error, not_member}]}}` were considered retryable
failures, which is incorrect.
2024-04-08 22:44:34 +02:00
Andrew Mayorov 3223797ae5
fix(dsrepl): attempt leadership transfer before server removal
This should make it much less likely to hit weird edge cases that lead
to duplicate Raft log entries because of client retries upon receiving
`shutdown` from the leader being removed.
2024-04-08 22:43:58 +02:00
Andrew Mayorov 1e95bd4da6
test(dsrepl): test unresponsive nodes removal / node restarts 2024-04-08 21:27:56 +02:00
zmstone 41677eb785 refactor: make elvis happy 2024-04-08 21:25:58 +02:00
zmstone bf12efac6d fix(variform): add basic tests 2024-04-08 21:08:43 +02:00
Andrew Mayorov 7a836317ac
fix(dsrepl): trigger unfinished shard transition upon startup
Also provide a trivial API to trigger them by hand.
2024-04-08 16:12:42 +02:00
Andrew Mayorov 75bb7f5cdc
fix(dsrepl): retry only `{add, Site}` crashed membership transitions
To minimize the potential negative impact of removal transitions that
crash for some unknown and unusual reasons.
2024-04-08 16:04:33 +02:00
Kjell Winblad 9628a00a82 docs(emqx_rule_api apply rule): fix doc strings 2024-04-08 15:34:29 +02:00
Thales Macedo Garitezi ba96edb061 fix(clients api): use alternative base64 function for OTP 25
Fixes https://github.com/emqx/emqx/pull/12798#discussion_r1555524603
2024-04-08 10:23:10 -03:00
Kjell Winblad 600526a0e4 test(emqx_bridge_http_SUITE): test case after name change 2024-04-08 14:54:33 +02:00
Kjell Winblad 02ee873094 docs(emqx_rule_api_schema): fix type spec 2024-04-08 13:42:15 +02:00
Andrew Mayorov 4c0cc079c2
fix(dsrepl): apply unnecessary rebalancing transitions cleanly 2024-04-08 13:25:45 +02:00
Andrew Mayorov dcde30c38a
test(dsrepl): add two more testcases for rebalancing 2024-04-08 13:22:31 +02:00
Kjell Winblad b57725f996 test(resource_SUITE): do test case fixes needed due to rule tracing work 2024-04-08 12:41:35 +02:00
Kjell Winblad 79440064fe style: fix problems reported by elvis 2024-04-08 11:03:55 +02:00
JianBo He e11c4a9c83
Merge pull request #12826 from emqx/import-source-bridges
fix: source bridges missing after restore the backup files
2024-04-08 16:02:15 +08:00
Zaiming (Stone) Shi 3e87d4bf9f
Merge pull request #12772 from killme2008/feature/upgrade-greptimedb-ingester-ci
feat: update greptimedb client lib and ci version
2024-04-08 09:02:03 +02:00
JimMoen 282cbb18be
fix: cpu usage and idle use two decimal places
- prometheus
- opentelemetry
2024-04-08 14:14:09 +08:00
Andrew Mayorov 2ace9bb893
chore(dsrepl): sprinkle few comments and typespecs for exports 2024-04-07 22:51:56 +02:00
Andrew Mayorov ecaad348a7
chore(dsrepl): update few outdated comments / TODOs 2024-04-07 22:51:56 +02:00
Andrew Mayorov 6293efb995
fix(dsrepl): retry crashed membership transitions 2024-04-07 22:51:56 +02:00
Andrew Mayorov 826ce5806d
fix(dsrepl): ensure that new member UID matches server's UID
Before that change, UIDs supplied in the `ra:add_member/3` were not
the same as those servers were using. This haven't caused any issues
for some reason, but it's better to ensure that UIDs are the same.
2024-04-07 22:31:24 +02:00
Shawn e89dc32c90 ci: run emqx_management both with ee and ce profile 2024-04-07 18:33:52 +08:00
firest f40e47365c fix(iotdb): correctly handle undefined value of bool type 2024-04-07 17:26:15 +08:00
Shawn 1c81c79a2c chore: add testcase for importing retained msgs and sources 2024-04-07 17:24:26 +08:00
Shawn 9d1a69aaa9 fix: cannot import retained messages 2024-04-07 17:03:14 +08:00
Kjell Winblad 5479932190 feat(apply rule test): make option to stop action after render work
This commit makes the apply rule HTTP API option to stop an action work
for the HTTP action, and adds infrastructure that makes it easy to add
this functionality to other actions.
2024-04-06 17:21:12 +02:00
Kjell Winblad ef705c2285 feat: add apply rule API, clientid/ruleid tracing for rule and connector
This commit adds:

* Support for forwarding the rule id and client id to the connector so
  that events such as template rendered successfully can be traced.
* HTTP API for for applying/activating a rule with the given context
2024-04-06 17:20:47 +02:00
Thales Macedo Garitezi 04ba2aaf8a
Merge pull request #12830 from thalesmg/async-channel-hc-m-20240404
feat(resource): non-blocking channel health checks
2024-04-05 14:24:09 -03:00
Andrew Mayorov 556ffc78c9
feat(dsrepl): implement membership changes and rebalancing 2024-04-05 18:57:28 +02:00
Andrew Mayorov d6058b7f51
feat(dsrepl): allow to subscribe to DB metadata changes
Currently, only shard metadata changes are announced to the
subscribers.
2024-04-05 17:40:55 +02:00
Andrew Mayorov a07295d3bc
fix(ds): address shards in the supervisor properly 2024-04-05 17:40:38 +02:00
ieQu1 3f7b14c861
Merge pull request #12833 from ieQu1/dev/ds-cluster-api
feat(ds): Add REST API for durable storage
2024-04-05 17:33:20 +02:00
ieQu1 2504b8126b
feat(ds): Pass mgmt_ds REST API calls to the application 2024-04-05 15:22:06 +02:00
ieQu1 46261440cb
feat(ds): Add a CLI for managing DB replicas 2024-04-05 15:22:06 +02:00
ieQu1 a62db08676
feat(ds): Add REST API for durable storage 2024-04-05 15:22:06 +02:00
ieQu1 d09787d1a6
fix(ds): Fix return types in replication_layer_meta 2024-04-05 15:22:06 +02:00
Ivan Dyachkov be47fe49ad chore: bump ecql version to 0.7.0
PR: https://github.com/emqx/ecql/pull/13
No functional changes, just switch gen_fsm to gen_statem.
2024-04-05 13:31:33 +02:00
Andrew Mayorov 70396e9766
Merge pull request #12825 from keynslug/feat/EMQX-12110/repl-meta-api
feat(dsrepl): add APIs to manage DB replication sites
2024-04-04 22:32:03 +02:00
Andrew Mayorov df6c5b35fe
feat(dsrepl): add more primitive operations to modify DB sites 2024-04-04 21:22:49 +02:00
Andrew Mayorov bb8ffee18c
feat(dsrepl): add API to get current DB replication sites 2024-04-04 21:22:02 +02:00
Andrew Mayorov ad52f7838e
feat(dsrepl): add APIs to manage DB replication sites 2024-04-04 21:22:01 +02:00
Thales Macedo Garitezi 60cad74286 feat(resource): non-blocking channel health checks
Fixes https://emqx.atlassian.net/browse/EMQX-12015

Continuation of https://github.com/emqx/emqx/pull/12812
2024-04-04 16:13:30 -03:00
Thales Macedo Garitezi 8d58b40f33
Merge pull request #12831 from thalesmg/ds-checkpoint-clean-m-20240404
feat(ds): clear all checkpoints when (re)starting storage layer
2024-04-04 15:41:50 -03:00
Thales Macedo Garitezi 217b35bce5
Merge pull request #12798 from thalesmg/ds-client-api-v2-m-20240327
feat(client mgmt api): add cursor-based list API
2024-04-04 15:10:49 -03:00
Thales Macedo Garitezi c57c36adb2 feat(ds): clear all checkpoints when (re)starting storage layer
Fixes https://emqx.atlassian.net/browse/EMQX-12143
2024-04-04 14:05:52 -03:00
Kjell Winblad 59a442cdb5 feat(rule trace): add support for ruleid as a trace type 2024-04-04 14:55:32 +02:00
Thales Macedo Garitezi 069cd4fbb4
Merge pull request #12812 from thalesmg/async-res-manager-health-check-m-20240328
feat(resource manager): perform non-blocking health checks
2024-04-04 09:07:02 -03:00
zmstone 0e79b543cf refactor: move variform to emqx_utils 2024-04-04 11:10:56 +02:00
ieQu1 f37ed3a40a fix(ds): Limit the number of retries in egress to 0 2024-04-03 16:38:49 +02:00
Shawn 319ec50c0d fix: source bridges missing after restore the backup files 2024-04-03 18:26:51 +08:00
ieQu1 2bbfada7af
fix(ds): Make async batches truly async 2024-04-03 11:57:47 +02:00
ieQu1 92ca90c0ca
fix(ds): Improve egress logging 2024-04-03 11:57:47 +02:00
ieQu1 ae5935e7f7
test(ds): Attempt to stabilize metrics_worker tests in CI 2024-04-02 19:14:10 +02:00
ieQu1 4382971443
fix(ds): Preserve errors in the egress 2024-04-02 16:47:43 +02:00
ieQu1 94ca7ad0f8
feat(ds): Report counters for LTS storage layout 2024-04-02 16:47:43 +02:00
ieQu1 f14c253dea
fix(prometheus): Don't add DS metrics when feature is disabled 2024-04-02 16:47:43 +02:00
ieQu1 b379f331de
fix(sessds): Handle errors when storing messages 2024-04-02 16:47:41 +02:00
ieQu1 f41e538526
feat(sessds): Observe next time 2024-04-02 16:45:52 +02:00
ieQu1 b9ad241658
feat(sessds): Add metrics for the number of persisted messages 2024-04-02 16:45:52 +02:00
ieQu1 75b092bf0e
fix(ds): Actually retry sending batch 2024-04-02 16:45:49 +02:00
ieQu1 0de255cac8
feat(ds): Report egress flush time 2024-04-02 16:25:04 +02:00
ieQu1 044f3d4ef5
fix(ds): Don't reverse entries in the atomic batch 2024-04-02 16:25:04 +02:00
ieQu1 606f2a88cd
feat(ds): Add egress metrics 2024-04-02 16:25:04 +02:00
ieQu1 c9de336234
feat(ds): Add metrics worker to the builtin db supervision tree 2024-04-02 16:25:04 +02:00
ieQu1 d8204021dc
refactor(metrics): Move metrics worker to emqx_utils application 2024-04-02 16:25:04 +02:00
Thales Macedo Garitezi 2097e854fc feat(client mgmt api): add cursor-based list API
Fixes https://emqx.atlassian.net/browse/EMQX-12028
2024-04-02 10:55:28 -03:00
Andrew Mayorov 778e897f1f
chore(dsrepl): describe snapshot ownership and few shortcomings 2024-04-02 13:48:51 +02:00
Andrew Mayorov c666c65c6a
test(ds): factor out storage iteration into helper module 2024-04-02 13:48:51 +02:00
Andrew Mayorov 7cebf598a8
chore(dsrepl): simplify snapshot transfer code a bit
Co-Authored-By: Thales Macedo Garitezi <thalesmg@gmail.com>
2024-04-02 13:48:51 +02:00
Andrew Mayorov e029b8f996
test(dsrepl): wait for whole cluster readiness
To minimize the chance of flaky tests due to the shards not being
completely online.

Co-Authored-By: Thales Macedo Garitezi <thalesmg@gmail.com>
2024-04-02 13:48:50 +02:00
Andrew Mayorov e8b06a6a9f
chore(dsrepl): mark few more BPAPI targets as obsolete 2024-04-02 13:48:50 +02:00
Andrew Mayorov d31cd0c728
feat(ds): ensure LTS state ids are deterministic 2024-04-02 13:48:50 +02:00
Andrew Mayorov 2cd357a5bd
fix(ds): ensure store batch is idempotent wrt generations 2024-04-02 13:48:50 +02:00
Andrew Mayorov 77a022bd93
feat(dsrepl): transfer storage snapshot during ra snapshot recovery 2024-04-02 13:48:49 +02:00
Andrew Mayorov b8b9b7739b
chore(ds): slightly simplify working with storage generations 2024-04-02 13:48:08 +02:00
Andrew Mayorov 2d074df209
Merge pull request #12797 from keynslug/fix/dsrepl-error-handling
fix(dsrepl): handle RPC errors gracefully when storage is down
2024-04-02 13:40:31 +02:00
JimMoen 5759ba5162
chore: bump app version 2024-04-02 17:09:22 +08:00
JimMoen 50bceee9ab
fix(stats): `'subscribers.count'` contains shared-subscriber 2024-04-02 16:56:40 +08:00
JimMoen 0f4b148294
refactor: uniform shared_sub table macros 2024-04-02 16:56:39 +08:00
JimMoen 1a4cfc2a2d
fix(api_schema): removed metrics schema in api spec
- Followup [PR#6622](https://github.com/emqx/emqx/pull/6622).
2024-04-02 16:56:36 +08:00
Thales Macedo Garitezi bade09b56e feat(resource manager): perform non-blocking resource health checks
Fixes https://emqx.atlassian.net/browse/EMQX-12015

This introduces only _resource_ non-blocking health checks.  _Channel_ non-blocking health
checks may be introduced later.
2024-04-01 14:46:15 -03:00
Serge Tupchii c62410ff75 refactor: remove already bound variable 2024-04-01 17:03:50 +03:00
Serge Tupchii ceb04ba06d fix(emqx_mgmt): do not attempt to get a stacktrace of a remote client connection process 2024-04-01 16:42:12 +03:00
Serge Tupchii 42af1f9d63 fix: handle internal timeout errors in client Mqueue/Inflight APIs 2024-03-29 23:03:35 +02:00
Serge Tupchii f5a820cb10 fix(emqx_mgmt): catch OOM shutdown exits properly when calling a client conn process
The exit reason is expected to include gen_server `Location`:
  `{{shutdown, OOMInfo}, Location}`.
2024-03-29 13:09:08 +02:00
zmstone bfca3ebc71 feat(variform): support array syntax '[' and ']' 2024-03-28 19:34:57 +01:00
zmstone 5f26e4ed5e feat(variform): implement variform engine 2024-03-28 19:34:57 +01:00
SergeTupchiy 2e528d1dd8
Merge pull request #12802 from SergeTupchiy/EMQX-11826-prevent-left-node-from-rejoining-5.6.1
prevent a left node from rejoining the same cluster
2024-03-28 19:49:18 +02:00
zmstone ad95473aae refactor: move string functions to emqx_variform 2024-03-28 18:03:37 +01:00
Serge Tupchii 3eda182e9a fix: prevent a node from discovering and re-joining the same cluster after it has (manually) left it. 2024-03-28 18:09:27 +02:00
zmstone 9bf65a415b feat(variform): add a variable transformer 2024-03-28 16:11:26 +01:00
Andrew Mayorov 35c43eb8a0
feat(sessds): handle recoverable errors in stream scheduler 2024-03-28 15:17:01 +01:00
Andrew Mayorov fa66a640c3
fix(dsrepl): handle RPC errors gracefully when storage is down 2024-03-28 15:17:01 +01:00
SergeTupchiy 63c017a72f
Merge pull request #12804 from SergeTupchiy/EMQX-12058-improve-force-shutdown-error-reason-5.6.1
chore: rename `message_queue_too_long` error reason to `mailbox_overflow` (5.6.1 port)
2024-03-28 16:14:51 +02:00
Thales Macedo Garitezi 8fb4ef9fe3 test: fix flaky test 2024-03-28 10:53:44 -03:00
SergeTupchiy 93c87fcb25
Merge pull request #12803 from SergeTupchiy/EMQX-11808-remove-uploaded-invalid-backups-5.6.1
fix(emqx_mgmt_data_backup): remove an uploaded backup file if it's not valid (5.6.1 port)
2024-03-28 15:37:10 +02:00
Thales Macedo Garitezi 04bf763890 fix(kafka-based bridges): avoid trying to get raw config for replayq dir
Fixes https://emqx.atlassian.net/browse/EMQX-12049
2024-03-28 09:13:34 -03:00