Commit Graph

244 Commits

Author SHA1 Message Date
Andrew Mayorov fa66a640c3
fix(dsrepl): handle RPC errors gracefully when storage is down 2024-03-28 15:17:01 +01:00
Ivan Dyachkov db9efb9317 chore: bump apps versions 2024-03-28 10:19:09 +01:00
Thales Macedo Garitezi 796c04e7a8 test: fix flaky test
We should emit the trace event before replying to callers.

Example failure:

https://github.com/emqx/emqx/actions/runs/8378977952/job/22946318696#step:6:182

```
 =CRITICAL REPORT==== 21-Mar-2024::17:45:37.676024 ===
"check stage" failed: error
{assertMatch,[{module,emqx_ds_storage_bitfield_lts_SUITE},
              {line,270},
              {expression,"? of_kind ( emqx_ds_replication_layer_egress_flush , Trace )"},
              {pattern,"[ # { batch := [ _ , _ , _ ] } ]"},
              {value,[]}]}
Stacktrace: [{emqx_ds_storage_bitfield_lts_SUITE,
                 '-t_atomic_store_batch/1-fun-1-',1,
                 [{file,
                      "/__w/emqx/emqx/apps/emqx_durable_storage/test/emqx_ds_storage_bitfield_lts_SUITE.erl"},
                  {line,270}]},
             {emqx_ds_storage_bitfield_lts_SUITE,t_atomic_store_batch,1,
                 [{file,
                      "/__w/emqx/emqx/apps/emqx_durable_storage/test/emqx_ds_storage_bitfield_lts_SUITE.erl"},
                  {line,249}]}]
```
2024-03-21 15:47:29 -03:00
Thales Macedo Garitezi 68af211130 fix(ds): reply sync callers after raft store failure 2024-03-21 15:40:21 -03:00
Thales Macedo Garitezi 70737a437a fix(ds): add caller to pending replies before flushing 2024-03-21 14:39:21 -03:00
Andrew Mayorov fe50a1711b
fix(ds-egress): drop pending batch on failures
Before this commit, messages in the current batch will be retried as
part of next batch. This could have led to message duplication which is
probably not what the user wants by default.
2024-03-20 13:20:25 +01:00
Andrew Mayorov a1f5de3f5b
fix(dsrepl): turn memoize into a safer function 2024-03-20 13:20:24 +01:00
Andrew Mayorov d39ca41070
chore(dsrepl): mark per-node `add_generation` RPC target obsolete
Also annotate internal exports with comments according with their
intended use.
2024-03-20 13:20:24 +01:00
Andrew Mayorov 35b18f9125
fix(dsrepl): properly handle error conditions in generation mgmt
Also update few outdated typespecs. Also make error reasons easier
to comprehend.
2024-03-20 13:20:24 +01:00
Andrew Mayorov f2268aa69a
fix(dsrepl): use correct base dir for ra system stuff
Co-Authored-By: Thales Macedo Garitezi <thalesmg@gmail.com>
2024-03-20 13:20:24 +01:00
Andrew Mayorov 404e919494
refactor(dsrepl): make shard allocator more robust and consistent
Co-Authored-By: Thales Macedo Garitezi <thalesmg@gmail.com>
2024-03-20 13:20:24 +01:00
Andrew Mayorov 0e18bd6e80
fix(dsrepl): increase replication site id bitsize back
In order to minimize chances of site id collision to practically zero.
2024-03-20 13:20:24 +01:00
Andrew Mayorov ac9700dd28
fix(dsrepl): split shard allocator into a separate module 2024-03-20 13:20:23 +01:00
Andrew Mayorov 1b647035d0
chore(dsrepl): make dialyzer a bit happier 2024-03-20 13:20:23 +01:00
Andrew Mayorov 611b3f0e07
feat(dsrepl): use more straightforward way to drop ra shards 2024-03-20 13:20:23 +01:00
Andrew Mayorov 74881e8706
feat(dsrepl): make storage layer unaware of granularity of time
Storage also becomes a bit more pure, depending on the upper layer to
provide the timestamps, which also makes it possible to handle more
operations idempotently.
2024-03-20 13:20:23 +01:00
Andrew Mayorov 3cb36a5619
feat(ds-lts): extract timestamp from storage key itself
1. This avoids the need to deserialize the message to get the timestamp.
2. It also makes possible to decouple the storage key timestamp from the
   message timestamp, which might be useful for replication purposes.
2024-03-19 20:21:56 +01:00
Andrew Mayorov 5cc0246351
feat(dsrepl): allow to tune select ra options 2024-03-19 20:21:55 +01:00
Andrew Mayorov 54b5adf868
feat(dsrepl): allocate shards predictably
To ensure strictly optimal and fair shard allocation across
cluster. Before this commit it was quite easy to end up with
an allocation significantly skewed towards some node, because
of the nature of randomness and relatively small number of
shards.
2024-03-19 20:21:55 +01:00
Andrew Mayorov 887e151be5
fix(dsrepl): handle errors gracefully in shard egress process
Also add cooldown on timeout / unavailability.
2024-03-19 20:21:53 +01:00
Andrew Mayorov e16aee99b4
chore(dsrepl): clarify how to perform leadership transfer in runtime 2024-03-19 20:21:18 +01:00
Andrew Mayorov 00d509f27b
feat(dsrepl): prefer local replica in read path
To optimize out any unnecessary RPCs. Given the load should be
smoothed evenly across the cluster, choosing non-leader node is
not a priority.
2024-03-19 20:11:42 +01:00
Andrew Mayorov 19305c223c
fix(dsrepl): require majority for replication-related tables 2024-03-19 20:11:42 +01:00
Andrew Mayorov f89909f60c
fix(dsrepl): tolerate trigger election timeouts for existing servers 2024-03-19 20:11:42 +01:00
Andrew Mayorov 3b59cf2ebf
feat(dsrepl): move shard allocation to separate process
That starts shard and egress processes only when shards are fully
allocated.
2024-03-19 20:11:41 +01:00
Andrew Mayorov 4dafbf21f6
fix(dsrepl): make db + shard part of machine state
It doesn't feel right, but right now is the easiest way to have it
in the scope of `apply/3`, because `init/1` doesn't actually invoked
for ra machines recovered from the existing log / snapshot.
2024-03-19 20:11:41 +01:00
Andrew Mayorov d19128ed65
feat(dsrepl): cache shard metadata in persistent terms 2024-03-19 20:11:41 +01:00
Andrew Mayorov e6c2c2fb07
feat(dsrepl): manage generations / db config through ra machine 2024-03-19 20:11:39 +01:00
Andrew Mayorov 5e94bdb932
feat(dsrepl): allocate shards once predefined number of sites online
Before this commit the most likely shard allocation outcome was
that all shard are allocated to just one node.
2024-03-19 20:11:03 +01:00
Andrew Mayorov be793e4735
fix(dsrepl): reassign timestamp at the time of submission
This is needed to ensure total message order for a shard, and
guarantee that no messages will be written "in the past". which
may break replay consistency.
2024-03-19 20:11:01 +01:00
Andrew Mayorov 146f082fdc
feat(dsrepl): implement raft-based replication
Still very rough but mostly working.
2024-03-19 20:09:44 +01:00
Ivan Dyachkov f2dc940436 Merge remote-tracking branch 'upstream/release-56' into 0319-sync-release56 2024-03-19 15:20:08 +01:00
Thales Macedo Garitezi 11ae04d810 test(ds): rm unused var warning 2024-03-14 17:16:24 -03:00
Thales Macedo Garitezi 2ebc8dcc55 fix(ds): use `infinity` timeout when storing batches 2024-03-14 10:17:18 -03:00
Thales Macedo Garitezi 6af01b916e feat(ds): implement `get_delete_streams`, `make_delete_iterator` and `delete_next` callbacks for builtin storage
Part of https://emqx.atlassian.net/browse/EMQX-11841
2024-03-08 09:56:46 -03:00
Andrew Mayorov f7e3afde16
test(ds): avoid introducing new macros 2024-03-07 16:49:20 +01:00
Andrew Mayorov 69427dc42d
test(ds): add tests for error mapping and replay recovery 2024-03-07 12:59:58 +01:00
Andrew Mayorov 09905d78cd
chore(ds): make error handling slightly simpler
Co-Authored-By: Thales Macedo Garitezi <thalesmg@gmail.com>
2024-03-07 12:59:57 +01:00
Andrew Mayorov b39c710ec2
fix(ds): tidy up few typespecs 2024-03-07 12:59:57 +01:00
Andrew Mayorov 2146d9e1fe
feat(ds): introduce error classes in critical API functions
For now, only recoverable / unrecoverable errors are introduced.
2024-03-07 12:59:57 +01:00
Thales Macedo Garitezi 5d87d400f4 feat(ds): add atomic store API
Part of https://emqx.atlassian.net/browse/EMQX-11841
2024-03-06 15:24:14 -03:00
Thales Macedo Garitezi 06334798a5 fix(ds): fix `drop_generation` typespec
This typespec fix will be used downstream by other backends.
2024-03-04 14:15:59 -03:00
Ilya Averyanov b706caf294 feat(ds): export types 2024-02-29 14:27:18 +03:00
Ilya Averyanov d5ae0e5c53 feat(ds): update delete/count interface 2024-02-28 22:51:24 +03:00
Ilya Averyanov b010d34640 chore(ds): add delete callbacks 2024-02-26 17:35:13 +03:00
Zaiming (Stone) Shi 46877e979b chore: update copyright-year 2024-02-23 08:21:06 +01:00
Thales Macedo Garitezi d469f4158e chore: bump app vsns 2024-02-20 16:53:57 -03:00
ieQu1 8cfb22f0b8
fix(ds): Retry getting the shard leader 2024-02-16 12:42:48 +01:00
ieQu1 280fcd8c52
Merge pull request #12437 from ieQu1/dev/optimize_make_filter
Optimize emqx_ds_bitmask_keymapper:make_filter function.
2024-02-05 17:32:28 +01:00
ieQu1 4665837cf0 fix(ds): Apply review remarks 2024-02-05 16:52:06 +01:00
ieQu1 c7888ad1f1
Merge pull request #12475 from ieQu1/dev/lean-stream
Use a more compact data structure to represent streams
2024-02-05 13:55:24 +01:00
ieQu1 698ba3f271 fix(ds): Optimize emqx_ds_bitmask_keymapper:make_filter
This optimization makes idle polling faster
2024-02-05 10:54:19 +01:00
ieQu1 8edbec5929 refactor(ds): Clarify the language used in ds_bitmapper 2024-02-05 10:54:18 +01:00
Zaiming (Stone) Shi 75023f2ca3
Merge pull request #12442 from ieQu1/dev/ds-license-apache
Relicense apps/emqx_durable_storage under Apache 2.0
2024-02-05 10:16:52 +01:00
ieQu1 2e56810ea2 refactor(ds): Use a simple improper list to represent the streams 2024-02-03 21:15:54 +01:00
ieQu1 3d2ac97c61 feat(ds): Add a CLI interface to inspect status of DS databases 2024-02-02 10:11:01 +01:00
ieQu1 139f5dc3bd chore(ds): Remove an obsolete document; superceded by ./README.md 2024-02-01 00:14:08 +01:00
ieQu1 b50d6bf1fd chore(ds): Change the license to Apache 2.0
Due to technicalities parts of the original code were licensed under
BSL.
In preparations for the public release of the feature, the license has
been changed to Apache 2.0
2024-02-01 00:10:48 +01:00
Thales Macedo Garitezi f7b12470dd test(ds): fix inter-suite flakiness
Attempt to mitigate this frequent source of flakiness:

```
=CRASH REPORT==== 31-Jan-2024::17:30:15.025404 ===
  crasher:
    initial call: emqx_ds_replication_layer_egress:init/1
    pid: <0.11312.0>
    registered_name: []
    exception error: no match of right hand side value {error,
                                                        no_leader_for_shard}
      in function  emqx_ds_replication_layer_egress:init/1 (/emqx/apps/emqx_durable_storage/src/emqx_ds_replication_layer_egress.erl, line 93)
      in call from gen_server:init_it/2 (gen_server.erl, line 980)
      in call from gen_server:init_it/6 (gen_server.erl, line 935)
    ancestors: [<0.11310.0>,<0.11304.0>,emqx_ds_builtin_databases_sup,
                  emqx_ds_builtin_sup,emqx_ds_sup,<0.11236.0>]
    message_queue_len: 0
    messages: []
    links: [<0.11310.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 376
    stack_size: 28
    reductions: 231
  neighbours:
```
2024-01-31 15:42:26 -03:00
Thales Macedo Garitezi d51deac222 fix(ds): use configured data dir for site storage 2024-01-31 15:42:26 -03:00
JianBo He 35c4ef2ee2
Merge pull request #12407 from zmstone/0126-script-to-update-bsl-license-convert-time
0126 script to update bsl license convert time
2024-01-30 09:54:34 +08:00
ieQu1 7ee0a6aaa4
Merge pull request #12414 from ieQu1/dev/ds-readme
doc(ds): Update README
2024-01-29 18:01:03 +01:00
Zaiming (Stone) Shi 82403167c2 chore: update BSL license change date 2024-01-29 16:47:31 +01:00
ieQu1 2479e1189a fix(ds): Remove unused module 2024-01-29 00:36:13 +01:00
ieQu1 96c527541b docs(ds): Update README 2024-01-28 20:27:20 +01:00
ieQu1 eec56b0d6b fix(sessds): Improve comments 2024-01-26 17:49:33 +01:00
ieQu1 8e8d3af096 fix(sessds): Refactor emqx_persistent_session_ds to use CRUD module 2024-01-26 17:49:33 +01:00
Thales Macedo Garitezi 8e31afe6c2 fix(ds): don't make data dir part of the schema
The data directory was ending up being persisted in the database schema.  This led to
issues when opening the DB on different nodes.
2024-01-25 14:44:06 -03:00
ieQu1 305a54f646 chore(ds): Update BPAPI version 2024-01-24 19:33:30 +01:00
ieQu1 eee221f1d0 feat(ds): Make egress batching configurable 2024-01-24 19:33:30 +01:00
ieQu1 137535a821 feat(ds): Introduce egress process for the builtin backend 2024-01-24 19:33:30 +01:00
ieQu1 9b7df302e8 fix(ds): Cache database metadata in RAM 2024-01-24 18:43:48 +01:00
Thales Macedo Garitezi 1eb47d0c16 perf(ds): inherit only LTS paths containing wildcards when adding a new generation
Fixes https://github.com/emqx/emqx/pull/12338#discussion_r1462139499
2024-01-23 09:20:28 -03:00
Thales Macedo Garitezi d122340c13
Merge pull request #12338 from thalesmg/ds-message-gc-20240115
feat(ds): add message GC
2024-01-22 16:57:26 -03:00
Thales Macedo Garitezi db710c4be5 feat(lts): inherit previous generation's lts when possible 2024-01-22 14:53:17 -03:00
Thales Macedo Garitezi 75b08b525b feat(ds): add `list_generations` and `drop_generation` APIs 2024-01-22 14:53:17 -03:00
Thales Macedo Garitezi 57074015c6 feat(ds): allow customizing the data directory
The storage expectations for the RocksDB DB may be different from our usual data
directory.  Also, it may consume a lot more storage than other data.

This allows customizing the data directory for the builtin DS storage backend.

Note: if the cluster was already initialized using a directory path, changing that config
will have no effect.  This path is currently persisted in mnesia and used when reopening
the DB.
2024-01-19 13:07:24 -03:00
Thales Macedo Garitezi 7763b0fd34 chore: bump app vsns 2024-01-08 17:42:56 -03:00
Thales Macedo Garitezi 756980837c Merge branch 'release-54' into sync-r54-m-20240108 2024-01-08 17:39:42 -03:00
ieQu1 caf461fdf6 fix(ds): Don't start the supervision tree when feature is not in use 2024-01-08 17:59:19 +01:00
Ilya Averyanov 9e0f3ce53b feat(ds): restore original add_generation/update_db_config callback semantics 2023-12-29 13:12:15 +03:00
JimMoen 5e100f52b8
style: erlfmt all `rebar.config` files and `bin/nodetool` 2023-12-29 09:08:03 +08:00
Zaiming (Stone) Shi 23ded313ec chore: update app versions 2023-12-22 15:29:22 +01:00
Zaiming (Stone) Shi 20543d55ef chore: bump app vsn 2023-12-22 13:13:30 +01:00
Zaiming (Stone) Shi 322b7bb7d2 chore: bump app vsn 2023-12-22 13:00:37 +01:00
firest 612be8e280 fix(ds): make dialyzer happy 2023-12-20 22:09:03 +08:00
firest ed38ca67d5 fix(ds): Unified the names of the `add_generation` API 2023-12-20 18:58:03 +08:00
firest 31060733a5 feat(ds): add an API for making new generations 2023-12-15 16:08:52 +08:00
Zaiming (Stone) Shi c22a3686ae fix(emqx_durable_storage): fix type specs 2023-12-14 17:35:32 +01:00
Zaiming (Stone) Shi c1f2287b86 Merge remote-tracking branch 'origin/release-54' 2023-12-14 15:26:49 +01:00
Thales Macedo Garitezi dbc8141930
Merge pull request #12125 from thalesmg/ds-cache-m-20231206
chore(ds): return DS key from `next` and add `update_iterator` callback
2023-12-14 10:23:25 -03:00
Zaiming (Stone) Shi d560366c14 test: fix some compile warnings 2023-12-11 09:43:13 +01:00
Zaiming (Stone) Shi 50f4aba5cd fix(dialyzer): batch 3 2023-12-09 15:50:09 +01:00
Thales Macedo Garitezi 66d043becd feat(ds): introduce `update_iterator` callback 2023-12-08 15:04:18 -03:00
Zaiming (Stone) Shi ddbb8560fa fix(dialyzer): batch 2 2023-12-08 17:59:55 +01:00
Thales Macedo Garitezi 2a6d72878f chore(ds): return DS message key along with batch 2023-12-07 11:36:08 -03:00
ieQu1 c43b3eb535 fix(sessds): Add debug logs for the session garbage collection 2023-12-06 15:37:23 +01:00
Andrew Mayorov 130a5a5442
fix(ds): pass topics to `emqx_topic:words/1` before feeding LTS tree
So that empty levels in topics will be properly mapped into `''` atoms.
2023-12-04 13:39:01 +03:00
ieQu1 e238602533 fix(ds): Update README 2023-12-01 08:27:05 +01:00
ieQu1 0ae618d010 fix(ds): Use emqx_rpc for calls that work with large binaries 2023-12-01 08:27:05 +01:00