Commit Graph

293 Commits

Author SHA1 Message Date
ieQu1 66ec2e6ad0
fix(dsrepl): Retry sending ra commands to the leader 2024-05-24 19:08:05 +02:00
ieQu1 53620e8439
fix(dsrepl): Treat all exceptions from storage layer as recoverable 2024-05-24 19:08:05 +02:00
ieQu1 a7259bc35d
fix(dsstor): Delete generation metadata before dropping it 2024-05-24 19:08:05 +02:00
ieQu1 0d234d6f37
test(ds): Add emqx_ds_replication_SUITE:t_drop_generation 2024-05-24 19:08:05 +02:00
Andrew Mayorov ba6382adae
fix(dsrepl): trigger "last-resort" pending transitions handler when idle
This is a hack to work around the unintended issues causing shard
allocator to become idle even when there are pending transitions.
2024-05-23 14:54:01 +02:00
Andrew Mayorov bf326acd7b
fix(dsrepl): handle stopping non-yet-running shard supervisor 2024-05-23 14:34:27 +02:00
Andrew Mayorov 398dc97ed6
Merge pull request #13092 from keynslug/fix/dsrepl/site-autoleave
fix(dsrepl): properly handle transaction abort during forget site
2024-05-23 10:56:23 +03:00
ieQu1 6eb04f90a3
fix(ds): Allow to write batches to older generations 2024-05-22 20:28:16 +02:00
ieQu1 59a09fb86f
fix(ds): Apply review remarks 2024-05-22 18:01:33 +02:00
ieQu1 29345aaa30
fix(ds): Fix idle event generation in bitfield_lts layout 2024-05-22 18:01:33 +02:00
ieQu1 b3ded7edce
fix(ds): Fix code review remark 2024-05-22 18:01:33 +02:00
ieQu1 60edf5e9b8
fix(ds): Move responsibility of returning end_of_stream to the CBM 2024-05-22 18:01:33 +02:00
ieQu1 0ff307e789
fix(ds): Include generation ID in the storage events
Make sure storage events originating from generation X are handled in
the context of the same generation.
2024-05-22 18:01:33 +02:00
ieQu1 acdae4fba3
fix(ds): Workaround for the idempotency error when dropping gens 2024-05-22 18:01:33 +02:00
ieQu1 eb7c43ee9d
fix(ds): Always store messages in the current generation 2024-05-22 18:01:33 +02:00
ieQu1 074d98a14a
test(ds): Refactor ds_SUITE 2024-05-22 18:01:33 +02:00
ieQu1 e4a73f003a
feat(ds): Implement format_status callback
Reduce volume of logs and crash reports from DS
2024-05-22 18:01:32 +02:00
ieQu1 1526c527d0
fix(ds): Log generation operations 2024-05-22 18:01:32 +02:00
ieQu1 aca2d9586c
fix(ds): Fix return type of drop_generation 2024-05-22 18:01:32 +02:00
ieQu1 c6fc76e335
fix(ds): Perform read operations on the leader. 2024-05-22 18:01:32 +02:00
ieQu1 4580906405
fix(ds): Use erpc instead of gen_rpc for `delete_next' 2024-05-22 18:01:32 +02:00
Andrew Mayorov e6c5c1b598
chore(dsrepl): provide more information in rebalancing log messages 2024-05-22 17:24:08 +02:00
Andrew Mayorov c355c9ad50
fix(dsrepl): properly handle transaction abort during forget site 2024-05-22 17:22:55 +02:00
Andrew Mayorov 86f99959b0
Merge pull request #13054 from keynslug/fix/EMQX-12365/node-leave
fix(dsrepl): anticipate and handle nodes leaving the cluster
2024-05-17 09:43:15 +02:00
ieQu1 ee6e7174cf
fix(sessds): Rename the durable messages DB to `messages` 2024-05-16 21:31:32 +02:00
Andrew Mayorov 5157e61418
fix(dsrepl): verify if shards already allocated first 2024-05-16 18:56:54 +02:00
Andrew Mayorov 0119728d45
feat(dsrepl): also reflect pending transitions in ds status 2024-05-16 18:56:21 +02:00
Andrew Mayorov 26c4a4f597
feat(dsrepl): reflect conflicts and inconsistencies in ds status 2024-05-16 18:32:08 +02:00
Andrew Mayorov 7e86e3e61c
fix(dsrepl): anticipate and handle nodes leaving the cluster
Also make `claim_site/2` safer by refusing to claim a site for a node
that is already there.
2024-05-16 18:32:07 +02:00
Andrew Mayorov 35e360fcbe
feat(api-ds): provide more information on nonexistent site leave 2024-05-14 15:57:41 +02:00
ieQu1 525e4dac95
Merge pull request #13036 from ieQu1/dev/reduce-log-spam
tests: Reduce log spam in the failed test suites
2024-05-14 10:53:30 +02:00
ieQu1 ac3f5a083d
test: Reduce log spam in the failed test suites 2024-05-13 22:00:33 +02:00
ieQu1 8506ca7919
Merge pull request #12998 from ieQu1/dev/improve-latency
Use leader's clock when calculating LTS cutoff timestamp
2024-05-13 21:54:06 +02:00
ieQu1 3da3a36863
test(ds): Add generation in the replication suite 2024-05-13 19:51:04 +02:00
ieQu1 9f7ef9f34f
fix(ds): Apply review remarks 2024-05-13 19:35:24 +02:00
ieQu1 07aa708894
test(ds): Refactor replication suite 2024-05-09 03:56:56 +02:00
ieQu1 63e51fca66
test(ds): Use streams to fill the storage 2024-05-09 02:46:57 +02:00
ieQu1 a0a3977043
feat(ds): Assign latest timestamp deterministically 2024-05-08 23:17:57 +02:00
ieQu1 2236af84ba
feat(ds): two-stage storage commit on the storage level 2024-05-08 23:17:57 +02:00
ieQu1 1ddbbca90e
feat(ds): Allow incremental update of the LTS trie 2024-05-08 23:17:57 +02:00
ieQu1 68ca891f41
test(ds): Use streams to create traffic 2024-05-08 23:17:57 +02:00
Andrew Mayorov d84c180ccb
feat(dsrepl): avoid contacting unreachable ra servers
Assuming estabilished Erlang distribution channel is a reliable way to
tell whether a remote node is reachable.
2024-05-08 18:12:13 +02:00
ieQu1 3642bcd1b6
docs(ds): Fix comment for the builtin DS metrics 2024-05-06 11:21:32 +02:00
ieQu1 b2a633aca1
fix(ds): Use leader's clock for computing LTS safe cutoff time 2024-05-06 11:21:32 +02:00
ieQu1 1ff2e02fd9
feat(ds): Pass current time to the storage layer via argument 2024-05-06 11:21:32 +02:00
ieQu1 8ac9700aab
feat(ds): Add an API for DB-global variables 2024-05-06 11:21:32 +02:00
ieQu1 86d45522e3
fix(dsrepl): Don't reverse elements of batches 2024-05-06 11:21:32 +02:00
ieQu1 bcfa7b2209
fix(ds): Destroy LTS tries when the generation is dropped 2024-05-06 11:21:32 +02:00
ieQu1 9999ccd36c
feat(ds): Ignore safe cutoff time for streams without varying levels 2024-05-06 11:21:32 +02:00
ieQu1 e4c3283c9c docs(ds): Update README with CLI and REST API endpoints 2024-04-23 16:28:35 +02:00