Andrew Mayorov
3223797ae5
fix(dsrepl): attempt leadership transfer before server removal
...
This should make it much less likely to hit weird edge cases that lead
to duplicate Raft log entries because of client retries upon receiving
`shutdown` from the leader being removed.
2024-04-08 22:43:58 +02:00
Andrew Mayorov
1e95bd4da6
test(dsrepl): test unresponsive nodes removal / node restarts
2024-04-08 21:27:56 +02:00
Andrew Mayorov
7a836317ac
fix(dsrepl): trigger unfinished shard transition upon startup
...
Also provide a trivial API to trigger them by hand.
2024-04-08 16:12:42 +02:00
Andrew Mayorov
75bb7f5cdc
fix(dsrepl): retry only `{add, Site}` crashed membership transitions
...
To minimize the potential negative impact of removal transitions that
crash for some unknown and unusual reasons.
2024-04-08 16:04:33 +02:00
Andrew Mayorov
4c0cc079c2
fix(dsrepl): apply unnecessary rebalancing transitions cleanly
2024-04-08 13:25:45 +02:00
Andrew Mayorov
dcde30c38a
test(dsrepl): add two more testcases for rebalancing
2024-04-08 13:22:31 +02:00
Andrew Mayorov
2ace9bb893
chore(dsrepl): sprinkle few comments and typespecs for exports
2024-04-07 22:51:56 +02:00
Andrew Mayorov
ecaad348a7
chore(dsrepl): update few outdated comments / TODOs
2024-04-07 22:51:56 +02:00
Andrew Mayorov
6293efb995
fix(dsrepl): retry crashed membership transitions
2024-04-07 22:51:56 +02:00
Andrew Mayorov
826ce5806d
fix(dsrepl): ensure that new member UID matches server's UID
...
Before that change, UIDs supplied to `ra:add_member/3` were not
the same as those the servers were using. This hasn't caused any issues
for some reason, but it's better to ensure that UIDs are the same.
2024-04-07 22:31:24 +02:00
Andrew Mayorov
556ffc78c9
feat(dsrepl): implement membership changes and rebalancing
2024-04-05 18:57:28 +02:00
Andrew Mayorov
d6058b7f51
feat(dsrepl): allow to subscribe to DB metadata changes
...
Currently, only shard metadata changes are announced to the
subscribers.
2024-04-05 17:40:55 +02:00
Andrew Mayorov
a07295d3bc
fix(ds): address shards in the supervisor properly
2024-04-05 17:40:38 +02:00
ieQu1
a62db08676
feat(ds): Add REST API for durable storage
2024-04-05 15:22:06 +02:00
ieQu1
d09787d1a6
fix(ds): Fix return types in replication_layer_meta
2024-04-05 15:22:06 +02:00
Andrew Mayorov
70396e9766
Merge pull request #12825 from keynslug/feat/EMQX-12110/repl-meta-api
...
feat(dsrepl): add APIs to manage DB replication sites
2024-04-04 22:32:03 +02:00
Andrew Mayorov
df6c5b35fe
feat(dsrepl): add more primitive operations to modify DB sites
2024-04-04 21:22:49 +02:00
Andrew Mayorov
bb8ffee18c
feat(dsrepl): add API to get current DB replication sites
2024-04-04 21:22:02 +02:00
Andrew Mayorov
ad52f7838e
feat(dsrepl): add APIs to manage DB replication sites
2024-04-04 21:22:01 +02:00
Thales Macedo Garitezi
c57c36adb2
feat(ds): clear all checkpoints when (re)starting storage layer
...
Fixes https://emqx.atlassian.net/browse/EMQX-12143
2024-04-04 14:05:52 -03:00
ieQu1
f37ed3a40a
fix(ds): Limit the number of retries in egress to 0
2024-04-03 16:38:49 +02:00
ieQu1
2bbfada7af
fix(ds): Make async batches truly async
2024-04-03 11:57:47 +02:00
ieQu1
92ca90c0ca
fix(ds): Improve egress logging
2024-04-03 11:57:47 +02:00
ieQu1
ae5935e7f7
test(ds): Attempt to stabilize metrics_worker tests in CI
2024-04-02 19:14:10 +02:00
ieQu1
4382971443
fix(ds): Preserve errors in the egress
2024-04-02 16:47:43 +02:00
ieQu1
94ca7ad0f8
feat(ds): Report counters for LTS storage layout
2024-04-02 16:47:43 +02:00
ieQu1
b379f331de
fix(sessds): Handle errors when storing messages
2024-04-02 16:47:41 +02:00
ieQu1
f41e538526
feat(sessds): Observe next time
2024-04-02 16:45:52 +02:00
ieQu1
75b092bf0e
fix(ds): Actually retry sending batch
2024-04-02 16:45:49 +02:00
ieQu1
0de255cac8
feat(ds): Report egress flush time
2024-04-02 16:25:04 +02:00
ieQu1
044f3d4ef5
fix(ds): Don't reverse entries in the atomic batch
2024-04-02 16:25:04 +02:00
ieQu1
606f2a88cd
feat(ds): Add egress metrics
2024-04-02 16:25:04 +02:00
ieQu1
c9de336234
feat(ds): Add metrics worker to the builtin db supervision tree
2024-04-02 16:25:04 +02:00
Andrew Mayorov
778e897f1f
chore(dsrepl): describe snapshot ownership and few shortcomings
2024-04-02 13:48:51 +02:00
Andrew Mayorov
7cebf598a8
chore(dsrepl): simplify snapshot transfer code a bit
...
Co-Authored-By: Thales Macedo Garitezi <thalesmg@gmail.com>
2024-04-02 13:48:51 +02:00
Andrew Mayorov
e8b06a6a9f
chore(dsrepl): mark few more BPAPI targets as obsolete
2024-04-02 13:48:50 +02:00
Andrew Mayorov
d31cd0c728
feat(ds): ensure LTS state ids are deterministic
2024-04-02 13:48:50 +02:00
Andrew Mayorov
2cd357a5bd
fix(ds): ensure store batch is idempotent wrt generations
2024-04-02 13:48:50 +02:00
Andrew Mayorov
77a022bd93
feat(dsrepl): transfer storage snapshot during ra snapshot recovery
2024-04-02 13:48:49 +02:00
Andrew Mayorov
b8b9b7739b
chore(ds): slightly simplify working with storage generations
2024-04-02 13:48:08 +02:00
Andrew Mayorov
fa66a640c3
fix(dsrepl): handle RPC errors gracefully when storage is down
2024-03-28 15:17:01 +01:00
Ivan Dyachkov
db9efb9317
chore: bump apps versions
2024-03-28 10:19:09 +01:00
Thales Macedo Garitezi
796c04e7a8
test: fix flaky test
...
We should emit the trace event before replying to callers.
Example failure:
https://github.com/emqx/emqx/actions/runs/8378977952/job/22946318696#step:6:182
```
=CRITICAL REPORT==== 21-Mar-2024::17:45:37.676024 ===
"check stage" failed: error
{assertMatch,[{module,emqx_ds_storage_bitfield_lts_SUITE},
{line,270},
{expression,"? of_kind ( emqx_ds_replication_layer_egress_flush , Trace )"},
{pattern,"[ # { batch := [ _ , _ , _ ] } ]"},
{value,[]}]}
Stacktrace: [{emqx_ds_storage_bitfield_lts_SUITE,
'-t_atomic_store_batch/1-fun-1-',1,
[{file,
"/__w/emqx/emqx/apps/emqx_durable_storage/test/emqx_ds_storage_bitfield_lts_SUITE.erl"},
{line,270}]},
{emqx_ds_storage_bitfield_lts_SUITE,t_atomic_store_batch,1,
[{file,
"/__w/emqx/emqx/apps/emqx_durable_storage/test/emqx_ds_storage_bitfield_lts_SUITE.erl"},
{line,249}]}]
```
2024-03-21 15:47:29 -03:00
Thales Macedo Garitezi
68af211130
fix(ds): reply sync callers after raft store failure
2024-03-21 15:40:21 -03:00
Thales Macedo Garitezi
70737a437a
fix(ds): add caller to pending replies before flushing
2024-03-21 14:39:21 -03:00
Andrew Mayorov
fe50a1711b
fix(ds-egress): drop pending batch on failures
...
Before this commit, messages in the current batch were retried as
part of the next batch. This could have led to message duplication, which is
probably not what the user wants by default.
2024-03-20 13:20:25 +01:00
Andrew Mayorov
a1f5de3f5b
fix(dsrepl): turn memoize into a safer function
2024-03-20 13:20:24 +01:00
Andrew Mayorov
d39ca41070
chore(dsrepl): mark per-node `add_generation` RPC target obsolete
...
Also annotate internal exports with comments according to their
intended use.
2024-03-20 13:20:24 +01:00
Andrew Mayorov
35b18f9125
fix(dsrepl): properly handle error conditions in generation mgmt
...
Also update a few outdated typespecs. Also make error reasons easier
to comprehend.
2024-03-20 13:20:24 +01:00
Andrew Mayorov
f2268aa69a
fix(dsrepl): use correct base dir for ra system stuff
...
Co-Authored-By: Thales Macedo Garitezi <thalesmg@gmail.com>
2024-03-20 13:20:24 +01:00
Andrew Mayorov
404e919494
refactor(dsrepl): make shard allocator more robust and consistent
...
Co-Authored-By: Thales Macedo Garitezi <thalesmg@gmail.com>
2024-03-20 13:20:24 +01:00
Andrew Mayorov
0e18bd6e80
fix(dsrepl): increase replication site id bitsize back
...
In order to minimize chances of site id collision to practically zero.
2024-03-20 13:20:24 +01:00
Andrew Mayorov
ac9700dd28
fix(dsrepl): split shard allocator into a separate module
2024-03-20 13:20:23 +01:00
Andrew Mayorov
1b647035d0
chore(dsrepl): make dialyzer a bit happier
2024-03-20 13:20:23 +01:00
Andrew Mayorov
611b3f0e07
feat(dsrepl): use more straightforward way to drop ra shards
2024-03-20 13:20:23 +01:00
Andrew Mayorov
74881e8706
feat(dsrepl): make storage layer unaware of granularity of time
...
Storage also becomes a bit more pure, depending on the upper layer to
provide the timestamps, which also makes it possible to handle more
operations idempotently.
2024-03-20 13:20:23 +01:00
Andrew Mayorov
3cb36a5619
feat(ds-lts): extract timestamp from storage key itself
...
1. This avoids the need to deserialize the message to get the timestamp.
2. It also makes it possible to decouple the storage key timestamp from the
message timestamp, which might be useful for replication purposes.
2024-03-19 20:21:56 +01:00
Andrew Mayorov
5cc0246351
feat(dsrepl): allow to tune select ra options
2024-03-19 20:21:55 +01:00
Andrew Mayorov
54b5adf868
feat(dsrepl): allocate shards predictably
...
To ensure strictly optimal and fair shard allocation across the
cluster. Before this commit it was quite easy to end up with
an allocation significantly skewed towards some node, because
of the nature of randomness and the relatively small number of
shards.
2024-03-19 20:21:55 +01:00
Andrew Mayorov
887e151be5
fix(dsrepl): handle errors gracefully in shard egress process
...
Also add cooldown on timeout / unavailability.
2024-03-19 20:21:53 +01:00
Andrew Mayorov
e16aee99b4
chore(dsrepl): clarify how to perform leadership transfer in runtime
2024-03-19 20:21:18 +01:00
Andrew Mayorov
00d509f27b
feat(dsrepl): prefer local replica in read path
...
To optimize out any unnecessary RPCs. Given that the load should be
spread evenly across the cluster, choosing a non-leader node is
not a priority.
2024-03-19 20:11:42 +01:00
Andrew Mayorov
19305c223c
fix(dsrepl): require majority for replication-related tables
2024-03-19 20:11:42 +01:00
Andrew Mayorov
f89909f60c
fix(dsrepl): tolerate trigger election timeouts for existing servers
2024-03-19 20:11:42 +01:00
Andrew Mayorov
3b59cf2ebf
feat(dsrepl): move shard allocation to separate process
...
That starts shard and egress processes only when shards are fully
allocated.
2024-03-19 20:11:41 +01:00
Andrew Mayorov
4dafbf21f6
fix(dsrepl): make db + shard part of machine state
...
It doesn't feel right, but right now it is the easiest way to have it
in the scope of `apply/3`, because `init/1` isn't actually invoked
for ra machines recovered from the existing log / snapshot.
2024-03-19 20:11:41 +01:00
Andrew Mayorov
d19128ed65
feat(dsrepl): cache shard metadata in persistent terms
2024-03-19 20:11:41 +01:00
Andrew Mayorov
e6c2c2fb07
feat(dsrepl): manage generations / db config through ra machine
2024-03-19 20:11:39 +01:00
Andrew Mayorov
5e94bdb932
feat(dsrepl): allocate shards once predefined number of sites online
...
Before this commit the most likely shard allocation outcome was
that all shards are allocated to just one node.
2024-03-19 20:11:03 +01:00
Andrew Mayorov
be793e4735
fix(dsrepl): reassign timestamp at the time of submission
...
This is needed to ensure total message order for a shard, and
guarantee that no messages will be written "in the past", which
may break replay consistency.
2024-03-19 20:11:01 +01:00
Andrew Mayorov
146f082fdc
feat(dsrepl): implement raft-based replication
...
Still very rough but mostly working.
2024-03-19 20:09:44 +01:00
Ivan Dyachkov
f2dc940436
Merge remote-tracking branch 'upstream/release-56' into 0319-sync-release56
2024-03-19 15:20:08 +01:00
Thales Macedo Garitezi
2ebc8dcc55
fix(ds): use `infinity` timeout when storing batches
2024-03-14 10:17:18 -03:00
Thales Macedo Garitezi
6af01b916e
feat(ds): implement `get_delete_streams`, `make_delete_iterator` and `delete_next` callbacks for builtin storage
...
Part of https://emqx.atlassian.net/browse/EMQX-11841
2024-03-08 09:56:46 -03:00
Andrew Mayorov
09905d78cd
chore(ds): make error handling slightly simpler
...
Co-Authored-By: Thales Macedo Garitezi <thalesmg@gmail.com>
2024-03-07 12:59:57 +01:00
Andrew Mayorov
b39c710ec2
fix(ds): tidy up few typespecs
2024-03-07 12:59:57 +01:00
Andrew Mayorov
2146d9e1fe
feat(ds): introduce error classes in critical API functions
...
For now, only recoverable / unrecoverable errors are introduced.
2024-03-07 12:59:57 +01:00
Thales Macedo Garitezi
5d87d400f4
feat(ds): add atomic store API
...
Part of https://emqx.atlassian.net/browse/EMQX-11841
2024-03-06 15:24:14 -03:00
Thales Macedo Garitezi
06334798a5
fix(ds): fix `drop_generation` typespec
...
This typespec fix will be used downstream by other backends.
2024-03-04 14:15:59 -03:00
Ilya Averyanov
b706caf294
feat(ds): export types
2024-02-29 14:27:18 +03:00
Ilya Averyanov
d5ae0e5c53
feat(ds): update delete/count interface
2024-02-28 22:51:24 +03:00
Ilya Averyanov
b010d34640
chore(ds): add delete callbacks
2024-02-26 17:35:13 +03:00
Zaiming (Stone) Shi
46877e979b
chore: update copyright-year
2024-02-23 08:21:06 +01:00
Thales Macedo Garitezi
d469f4158e
chore: bump app vsns
2024-02-20 16:53:57 -03:00
ieQu1
8cfb22f0b8
fix(ds): Retry getting the shard leader
2024-02-16 12:42:48 +01:00
ieQu1
280fcd8c52
Merge pull request #12437 from ieQu1/dev/optimize_make_filter
...
Optimize emqx_ds_bitmask_keymapper:make_filter function.
2024-02-05 17:32:28 +01:00
ieQu1
4665837cf0
fix(ds): Apply review remarks
2024-02-05 16:52:06 +01:00
ieQu1
c7888ad1f1
Merge pull request #12475 from ieQu1/dev/lean-stream
...
Use a more compact data structure to represent streams
2024-02-05 13:55:24 +01:00
ieQu1
698ba3f271
fix(ds): Optimize emqx_ds_bitmask_keymapper:make_filter
...
This optimization makes idle polling faster
2024-02-05 10:54:19 +01:00
ieQu1
8edbec5929
refactor(ds): Clarify the language used in ds_bitmapper
2024-02-05 10:54:18 +01:00
Zaiming (Stone) Shi
75023f2ca3
Merge pull request #12442 from ieQu1/dev/ds-license-apache
...
Relicense apps/emqx_durable_storage under Apache 2.0
2024-02-05 10:16:52 +01:00
ieQu1
2e56810ea2
refactor(ds): Use a simple improper list to represent the streams
2024-02-03 21:15:54 +01:00
ieQu1
3d2ac97c61
feat(ds): Add a CLI interface to inspect status of DS databases
2024-02-02 10:11:01 +01:00
ieQu1
b50d6bf1fd
chore(ds): Change the license to Apache 2.0
...
Due to technicalities, parts of the original code were licensed under
BSL.
In preparation for the public release of the feature, the license has
been changed to Apache 2.0.
2024-02-01 00:10:48 +01:00
Thales Macedo Garitezi
d51deac222
fix(ds): use configured data dir for site storage
2024-01-31 15:42:26 -03:00
ieQu1
2479e1189a
fix(ds): Remove unused module
2024-01-29 00:36:13 +01:00
ieQu1
eec56b0d6b
fix(sessds): Improve comments
2024-01-26 17:49:33 +01:00
ieQu1
8e8d3af096
fix(sessds): Refactor emqx_persistent_session_ds to use CRUD module
2024-01-26 17:49:33 +01:00
Thales Macedo Garitezi
8e31afe6c2
fix(ds): don't make data dir part of the schema
...
The data directory was ending up being persisted in the database schema. This led to
issues when opening the DB on different nodes.
2024-01-25 14:44:06 -03:00
ieQu1
305a54f646
chore(ds): Update BPAPI version
2024-01-24 19:33:30 +01:00