20630 Commits

Author SHA1 Message Date
dependabot[bot] a4bd79593c
chore(deps): bump the actions group with 1 update
Bumps the actions group with 1 update: [actions/cache](https://github.com/actions/cache).


Updates `actions/cache` from 4.0.1 to 4.0.2
- [Release notes](https://github.com/actions/cache/releases)
- [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md)
- [Commits](ab5e6d0c87...0c45773b62)

---
updated-dependencies:
- dependency-name: actions/cache
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-03-25 03:40:38 +00:00
SergeTupchiy 8e979d511c
Merge pull request #12766 from SergeTupchiy/EMQX-12058-improve-force-shutdown-error-reason
chore: rename `message_queue_too_long` to `mailbox_overflow`
2024-03-22 14:32:06 +02:00
Serge Tupchii d2a1a7f7cf chore: rename `message_queue_too_long` error reason to `mailbox_overflow`
`mailbox_overflow` is consistent with the corresponding config parameter:
 'force_shutdown.max_mailbox_size'
2024-03-22 12:20:20 +02:00
Thales Macedo Garitezi 23ad37f566
Merge pull request #12762 from thalesmg/ds-fix-sync-egress-reply-m-20240321
fix(ds): add caller to pending replies before flushing and reply failures to sync callers
2024-03-21 16:42:42 -03:00
Thales Macedo Garitezi cfd4a1297f
Merge pull request #12761 from thalesmg/test-fix-flaky-dash-mon-m-20240321
test: fix flaky test
2024-03-21 16:04:08 -03:00
Thales Macedo Garitezi e17f663fa5
Merge pull request #12749 from thalesmg/mv-followup1-m-20240320
Follow up features and fixes for message validation - part 1
2024-03-21 16:03:49 -03:00
Thales Macedo Garitezi 796c04e7a8 test: fix flaky test
We should emit the trace event before replying to callers.

Example failure:

https://github.com/emqx/emqx/actions/runs/8378977952/job/22946318696#step:6:182

```
 =CRITICAL REPORT==== 21-Mar-2024::17:45:37.676024 ===
"check stage" failed: error
{assertMatch,[{module,emqx_ds_storage_bitfield_lts_SUITE},
              {line,270},
              {expression,"? of_kind ( emqx_ds_replication_layer_egress_flush , Trace )"},
              {pattern,"[ # { batch := [ _ , _ , _ ] } ]"},
              {value,[]}]}
Stacktrace: [{emqx_ds_storage_bitfield_lts_SUITE,
                 '-t_atomic_store_batch/1-fun-1-',1,
                 [{file,
                      "/__w/emqx/emqx/apps/emqx_durable_storage/test/emqx_ds_storage_bitfield_lts_SUITE.erl"},
                  {line,270}]},
             {emqx_ds_storage_bitfield_lts_SUITE,t_atomic_store_batch,1,
                 [{file,
                      "/__w/emqx/emqx/apps/emqx_durable_storage/test/emqx_ds_storage_bitfield_lts_SUITE.erl"},
                  {line,249}]}]
```
2024-03-21 15:47:29 -03:00
Thales Macedo Garitezi 68af211130 fix(ds): reply sync callers after raft store failure 2024-03-21 15:40:21 -03:00
Thales Macedo Garitezi 70737a437a fix(ds): add caller to pending replies before flushing 2024-03-21 14:39:21 -03:00
Thales Macedo Garitezi fa7ec231e3 test: fix flaky test 2024-03-21 14:07:09 -03:00
SergeTupchiy e2a2295c99
Merge pull request #12759 from SergeTupchiy/EMQX-11808-remove-uploaded-invalid-backups
fix(emqx_mgmt_data_backup): remove invalid uploaded backup files
2024-03-21 18:58:48 +02:00
Thales Macedo Garitezi 00aa7b5621 feat: create new `message.validation_failed` hookpoint and rule engine event 2024-03-21 13:46:27 -03:00
Thales Macedo Garitezi 7d7c6685d4
Merge pull request #12753 from thalesmg/test-flaky-retry-m-20240321
test: attempt to stabilize flaky test
2024-03-21 13:43:51 -03:00
Thales Macedo Garitezi 8ffd1304a5
Merge pull request #12755 from thalesmg/fix-kafka-probe-buffer-disk-m-20240321
fix(kafka-based bridges): avoid trying to get raw config for replayq dir
2024-03-21 13:39:33 -03:00
Serge Tupchii 40eccb10d6 fix(emqx_mgmt_data_backup): remove an uploaded backup file if it's not valid 2024-03-21 17:36:39 +02:00
Thales Macedo Garitezi e837791f94 fix(kafka-based bridges): avoid trying to get raw config for replayq dir
Fixes https://emqx.atlassian.net/browse/EMQX-12049
2024-03-21 11:49:00 -03:00
ieQu1 34c3c4e892
Merge pull request #12474 from ieQu1/dev/ds-rest-api
feat(sessds): Expose relevant durable session info in the REST API
2024-03-21 15:28:07 +01:00
Thales Macedo Garitezi 64399b6861 test: attempt to stabilize flaky test 2024-03-21 11:04:14 -03:00
ieQu1 cada944350
feat(sessds): Expose relevant durable session info in the REST API 2024-03-21 10:37:04 +01:00
Andrew Mayorov e10d43cdce
Merge pull request #12361 from keynslug/ft/EMQX-11756/emqx-ds-replication
feat(ds): implement raft-based replication
2024-03-20 21:32:42 +01:00
Thales Macedo Garitezi 62030e8942 feat(message validation): forbid repeated schema checks
Fixes https://emqx.atlassian.net/browse/EMQX-12054
2024-03-20 15:26:16 -03:00
Thales Macedo Garitezi b8cd1c9020 feat(message validation api): add enable/disable HTTP API 2024-03-20 14:59:32 -03:00
Thales Macedo Garitezi 4944cc080e feat(message_validation): add `ignore` failure action 2024-03-20 14:31:43 -03:00
Thales Macedo Garitezi 8753e3583f feat(message_validation): add `none` log level 2024-03-20 14:31:26 -03:00
Thales Macedo Garitezi e767f01e0a fix(message_validation): take `enable` into account 2024-03-20 13:57:12 -03:00
Thales Macedo Garitezi 7aa287c6c1 fix: add message validation schema to `emqx_enterprise_schema` 2024-03-20 13:37:01 -03:00
Andrew Mayorov 7257fe526b
fix(ci): add `ra` to emqx app dependencies as well 2024-03-20 14:46:53 +01:00
Andrew Mayorov e55e1dd1b2
chore: whitelist `ra` to make RPCs w/o BPAPIs 2024-03-20 13:20:25 +01:00
Andrew Mayorov a8baff61ec
docs(dsrepl): describe briefly what `n_sites` is for 2024-03-20 13:20:25 +01:00
Andrew Mayorov efac5c6197
test(ds): stabilize `t_message_gc` testcase 2024-03-20 13:20:25 +01:00
Andrew Mayorov fe50a1711b
fix(ds-egress): drop pending batch on failures
Before this commit, messages in the current batch were retried as
part of the next batch. This could have led to message duplication, which is
probably not what the user wants by default.
2024-03-20 13:20:25 +01:00
Andrew Mayorov a1f5de3f5b
fix(dsrepl): turn memoize into a safer function 2024-03-20 13:20:24 +01:00
Andrew Mayorov d39ca41070
chore(dsrepl): mark per-node `add_generation` RPC target obsolete
Also annotate internal exports with comments according to their
intended use.
2024-03-20 13:20:24 +01:00
Andrew Mayorov 35b18f9125
fix(dsrepl): properly handle error conditions in generation mgmt
Also update a few outdated typespecs and make error reasons easier
to comprehend.
2024-03-20 13:20:24 +01:00
Andrew Mayorov f2268aa69a
fix(dsrepl): use correct base dir for ra system stuff
Co-Authored-By: Thales Macedo Garitezi <thalesmg@gmail.com>
2024-03-20 13:20:24 +01:00
Andrew Mayorov 404e919494
refactor(dsrepl): make shard allocator more robust and consistent
Co-Authored-By: Thales Macedo Garitezi <thalesmg@gmail.com>
2024-03-20 13:20:24 +01:00
Andrew Mayorov 0e18bd6e80
fix(dsrepl): increase replication site id bitsize back
This reduces the chance of a site id collision to practically zero.
2024-03-20 13:20:24 +01:00
Andrew Mayorov 46d9adb926
fix(build): sync mix dependencies 2024-03-20 13:20:24 +01:00
Andrew Mayorov ac9700dd28
fix(dsrepl): split shard allocator into a separate module 2024-03-20 13:20:23 +01:00
Andrew Mayorov 1b647035d0
chore(dsrepl): make dialyzer a bit happier 2024-03-20 13:20:23 +01:00
Andrew Mayorov 611b3f0e07
feat(dsrepl): use more straightforward way to drop ra shards 2024-03-20 13:20:23 +01:00
Andrew Mayorov 74881e8706
feat(dsrepl): make storage layer unaware of granularity of time
Storage also becomes a bit more pure, depending on the upper layer to
provide the timestamps, which also makes it possible to handle more
operations idempotently.
2024-03-20 13:20:23 +01:00
Andrew Mayorov 3cb36a5619
feat(ds-lts): extract timestamp from storage key itself
1. This avoids the need to deserialize the message to get the timestamp.
2. It also makes it possible to decouple the storage key timestamp from the
   message timestamp, which might be useful for replication purposes.
2024-03-19 20:21:56 +01:00
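The idea in the commit above (reading the timestamp from the storage key rather than from the deserialized message) can be illustrated with a minimal sketch. The key layout here is a hypothetical one chosen for clarity, not the actual bitfield LTS encoding used by EMQX, and the names `make_key` and `timestamp_from_key` are illustrative only.

```python
import struct

# Hypothetical key layout (an assumption, not EMQX's real encoding):
# 8-byte big-endian stream id followed by an 8-byte big-endian timestamp.
KEY_FORMAT = ">QQ"


def make_key(stream_id: int, timestamp_us: int) -> bytes:
    """Build a storage key that embeds the timestamp."""
    return struct.pack(KEY_FORMAT, stream_id, timestamp_us)


def timestamp_from_key(key: bytes) -> int:
    """Recover the timestamp from the key alone; the stored value is never decoded."""
    _stream_id, timestamp_us = struct.unpack(KEY_FORMAT, key)
    return timestamp_us
```

Because the timestamp lives in the key, the value stored under that key can carry a different message timestamp, which is the decoupling the commit message mentions.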
Andrew Mayorov 5cc0246351
feat(dsrepl): allow tuning select ra options 2024-03-19 20:21:55 +01:00
Andrew Mayorov 54b5adf868
feat(dsrepl): allocate shards predictably
To ensure strictly optimal and fair shard allocation across the
cluster. Before this commit it was quite easy to end up with
an allocation significantly skewed towards some node, because
of the nature of randomness and the relatively small number of
shards.
2024-03-19 20:21:55 +01:00
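To show why a deterministic scheme avoids the skew described in the commit above, here is a minimal round-robin sketch. It is not the allocator implemented in that commit; the function name `allocate_shards` and its parameters are assumptions made for illustration.

```python
def allocate_shards(shards, sites, replication_factor):
    """Assign each shard to `replication_factor` distinct sites, round-robin.

    Inputs are sorted first, so every node computes the same allocation
    regardless of the order in which it learned about shards and sites.
    """
    ordered_sites = sorted(sites)
    ordered_shards = sorted(shards)
    n = len(ordered_sites)
    rf = min(replication_factor, n)  # cannot place more replicas than sites
    allocation = {}
    for i, shard in enumerate(ordered_shards):
        # Replica slots are handed out consecutively around the ring of sites,
        # so per-site load differs by at most one replica.
        start = (i * rf) % n
        allocation[shard] = [ordered_sites[(start + k) % n] for k in range(rf)]
    return allocation
```

With 12 shards, 3 sites and a replication factor of 2, each site ends up holding exactly 8 replicas, whereas independent random choices over so few shards can easily leave one site noticeably more loaded than the others.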
Andrew Mayorov d30c99512a
feat(utils-stream): add a few more stream combinators 2024-03-19 20:21:54 +01:00
Andrew Mayorov 887e151be5
fix(dsrepl): handle errors gracefully in shard egress process
Also add a cooldown on timeout / unavailability.
2024-03-19 20:21:53 +01:00
Andrew Mayorov e16aee99b4
chore(dsrepl): clarify how to perform leadership transfer at runtime 2024-03-19 20:21:18 +01:00
Ivan Dyachkov 749ad73819
Merge pull request #12721 from emqx/dependabot/github_actions/actions-d8a5788e1a
chore(deps): bump the actions group with 3 updates
2024-03-19 20:16:03 +01:00
Andrew Mayorov 00d509f27b
feat(dsrepl): prefer local replica in read path
To optimize out any unnecessary RPCs. Given that the load should be
spread evenly across the cluster, choosing a non-leader node is
not a priority.
2024-03-19 20:11:42 +01:00