chore(ds): Remove an obsolete document; superseded by ./README.md

Parent b50d6bf1fd, commit 139f5dc3bd

# General concepts

In the logic layer we don't speak about replication, because we could use an external DB with its own replication logic.

On the other hand, we introduce the notion of a shard right here in the logic layer, because the shared subscription logic needs to be aware of shards to some extent: it has to split work between subscribers somehow.

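To make "split work between subscribers" concrete, here is a minimal Python sketch. It assumes the shards of a shared subscription are dealt out round-robin among the group's members. The function name `assign_shards` and the round-robin policy are illustrative assumptions, not the actual EMQX logic.

```python
from __future__ import annotations


def assign_shards(shards: list[str], subscribers: list[str]) -> dict[str, list[str]]:
    """Split the shards of one shared-subscription group among its members.

    Hypothetical helper: round-robin over sorted names, so every shard is
    owned by exactly one subscriber and the result is deterministic.
    """
    if not subscribers:
        return {}
    members = sorted(subscribers)
    assignment: dict[str, list[str]] = {m: [] for m in members}
    for i, shard in enumerate(sorted(shards)):
        assignment[members[i % len(members)]].append(shard)
    return assignment


# Five shards, two subscribers in the group.
print(assign_shards(["s0", "s1", "s2", "s3", "s4"], ["sub_b", "sub_a"]))
# {'sub_a': ['s0', 's2', 's4'], 'sub_b': ['s1', 's3']}
```

Because the assignment depends only on the shard and member names, every node that sees the same member list computes the same split. Rebalancing when members join or leave is not covered here.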
# Modus operandi

1. Create a draft implementation of a milestone
2. Test performance
3. Test consistency
4. Goto 1

# Tables

## Message storage

Data is written every time a message matching a certain pattern is published.
This pattern is not part of the logic layer spec.

- Write throughput: very high
- Data size: very high
- Write pattern: append-only
- Read pattern: pseudo-serial
- Number of records: O(total write throughput × retention time)

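The table properties above can be illustrated with a minimal Python sketch of a single shard. It assumes an in-memory log with integer timestamps and time-based retention. `AppendOnlyShardLog`, `Record` and the retention rule are hypothetical, and nothing here reflects the actual RocksDB layout.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from itertools import islice


@dataclass(frozen=True)
class Record:
    seq: int          # position in the shard, strictly increasing
    timestamp: int    # seconds; used only for retention
    topic: str
    payload: bytes


class AppendOnlyShardLog:
    """Hypothetical in-memory model of one shard of the message table."""

    def __init__(self, retention_s: int) -> None:
        self.retention_s = retention_s
        self._records: deque[Record] = deque()
        self._next_seq = 0

    def append(self, timestamp: int, topic: str, payload: bytes) -> int:
        # Writes only ever go to the tail; stored records are never modified.
        rec = Record(self._next_seq, timestamp, topic, payload)
        self._records.append(rec)
        self._next_seq += 1
        self._expire(now=timestamp)
        return rec.seq

    def _expire(self, now: int) -> None:
        # Records older than the retention window are dropped from the head.
        while self._records and now - self._records[0].timestamp > self.retention_s:
            self._records.popleft()

    def read_from(self, seq: int, limit: int) -> list[Record]:
        # Sequential read: up to `limit` records with sequence number >= `seq`.
        return list(islice((r for r in self._records if r.seq >= seq), limit))

    def __len__(self) -> int:
        return len(self._records)


# 10 messages/s for 100 s with a 30 s retention window.
log = AppendOnlyShardLog(retention_s=30)
for t in range(100):
    for _ in range(10):
        log.append(t, "a/b/c", b"x")
print(len(log))  # 310
```

With 10 messages/s and 30 s of retention the shard holds about 300 records. The result is 310 because the window boundary is inclusive (timestamps 69 through 99). This matches the O(total write throughput × retention time) estimate.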
# Push vs. Pull model

In the push model, replay agents iterate over the dataset in the shards and push messages to the clients.

In the pull model, client processes work with iterators directly and fetch data from the remote message storage instances via remote procedure calls.

## Push pros:

- Lower latency: a message can be dispatched to the client as soon as it's persisted
- Less worry about buffering

## Push cons:

- Needs pushback (backpressure) logic
- Not entirely justified when working with an external DB that may not provide a streaming API

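The pushback that the push model needs can be sketched in Python. The sketch assumes each client has a bounded buffer (`queue.Queue` with `maxsize`) that a replay agent fills. When the buffer is full, the agent stops and later resumes from the same position. `ReplayAgent` and its `step` method are hypothetical names used for illustration.

```python
from __future__ import annotations

import queue


class ReplayAgent:
    """Hypothetical push-model replay agent for one shard and one client."""

    def __init__(self, shard: list[str], client_buffer: queue.Queue) -> None:
        self.shard = shard            # messages in sequence order
        self.position = 0             # index of the next message to push
        self.buffer = client_buffer   # bounded queue drained by the client

    def step(self) -> int:
        """Push as many messages as the client buffer accepts; return the count."""
        pushed = 0
        while self.position < len(self.shard):
            try:
                self.buffer.put_nowait(self.shard[self.position])
            except queue.Full:
                # Pushback: the client is behind, so stop here and resume from the
                # same position on the next step instead of buffering without bound.
                break
            self.position += 1
            pushed += 1
        return pushed


buf: queue.Queue = queue.Queue(maxsize=2)
agent = ReplayAgent([f"m{i}" for i in range(5)], buf)
print(agent.step())      # 2: the buffer is full after m0, m1
print(buf.get_nowait())  # m0: the client consumes one message
print(agent.step())      # 1: only m2 fits
```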
## Pull pros:

- No need for pushback: the client advances iterators at its own pace

## Pull cons:

- Two messages (request and response) must be sent over the network for each batch being replayed
- RPC is generally an antipattern

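The pull model and its per-batch cost can be sketched in Python. The sketch makes two assumptions: the storage node exposes a single "next batch" call that the client invokes remotely, and the client owns the iterator state. `StorageNode`, `next_batch` and `ShardIterator` are hypothetical. The RPC is modeled as a plain method call that counts both the request and the response.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """Hypothetical remote message-storage instance; each call models one RPC."""

    shards: dict[str, list[str]] = field(default_factory=dict)
    network_messages: int = 0   # request + response per call

    def next_batch(self, shard: str, seq: int, batch_size: int) -> tuple[list[str], int]:
        self.network_messages += 2  # one request in, one response out
        batch = self.shards[shard][seq:seq + batch_size]
        return batch, seq + len(batch)


@dataclass
class ShardIterator:
    shard: str
    seq: int = 0  # next position to fetch; owned by the client


def replay(node: StorageNode, it: ShardIterator, batch_size: int) -> list[str]:
    """Client-driven replay: the client asks for the next batch only when it is ready."""
    out: list[str] = []
    while True:
        batch, it.seq = node.next_batch(it.shard, it.seq, batch_size)
        if not batch:
            return out
        out.extend(batch)


node = StorageNode({"s0": [f"m{i}" for i in range(7)]})
it = ShardIterator("s0")
print(replay(node, it, batch_size=3))
print(node.network_messages)  # 8: three data batches plus a final empty one, 2 messages each
```

The client never receives more than it asked for, so no pushback is needed. The trade-off is one round trip per batch, including the final empty one.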
# Invariants

- All messages written to a shard always have a larger sequence number than all the iterators for that shard (otherwise replay could miss messages)

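A minimal Python sketch of the invariant follows. It assumes an iterator's position is the highest sequence number it has already returned, and that writers normally take the next number in sequence. `Shard`, `write` and `advance` are hypothetical. The explicit `seq` argument exists only to show what a late write with a stale sequence number would do.

```python
from __future__ import annotations

from typing import Optional


class Shard:
    """Hypothetical shard that checks the replay invariant on every write."""

    def __init__(self) -> None:
        self.messages: dict[int, str] = {}
        # iterator id -> highest sequence number the iterator has already passed
        self.iterators: dict[str, int] = {}
        self._last_seq = 0

    def write(self, payload: str, seq: Optional[int] = None) -> int:
        if seq is None:
            seq = self._last_seq + 1  # normal path: next number in sequence
        if seq in self.messages:
            raise ValueError(f"duplicate seq {seq}")
        high_water = max(self.iterators.values(), default=0)
        if seq <= high_water:
            # Some iterator has already moved past `seq` and would never see this message.
            raise ValueError(f"seq {seq} <= iterator position {high_water}")
        self._last_seq = max(self._last_seq, seq)
        self.messages[seq] = payload
        return seq

    def advance(self, iterator_id: str) -> list[str]:
        """Return the messages after the iterator's position and move the iterator forward."""
        pos = self.iterators.get(iterator_id, 0)
        new = sorted(s for s in self.messages if s > pos)
        self.iterators[iterator_id] = new[-1] if new else pos
        return [self.messages[s] for s in new]


shard = Shard()
shard.write("a")
shard.write("b")
print(shard.advance("it1"))   # ['a', 'b']
shard.write("c")
print(shard.advance("it1"))   # ['c']
try:
    shard.write("late", seq=2)  # would be behind it1, so it is rejected
except ValueError as err:
    print("rejected:", err)
```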
# Parallel tracks

## 0. Configuration

This includes the HOCON schema and an interface module that is used by the rest of the code (`emqx_ds_conf.erl`).

At an early stage we need at least to implement a feature flag that developers can use.

We should have safety measures that prevent a client from bringing down the broker by connecting with clean session = false and subscribing to `#`.

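A minimal Python sketch of the feature flag and the `#` safety check, assuming the interface module exposes two boolean settings. `DurableStorageConf`, `enable`, `allow_root_wildcard` and both functions are hypothetical names. They are not the real HOCON schema or the `emqx_ds_conf.erl` API, and rejecting the subscription is just one possible safety measure.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DurableStorageConf:
    """Hypothetical settings, standing in for values read via the config interface module."""

    enable: bool = False                # developer feature flag, off by default
    allow_root_wildcard: bool = False   # may durable sessions subscribe to `#`?


def uses_durable_storage(conf: DurableStorageConf, clean_session: bool) -> bool:
    # Only sessions connected with clean session = false are persisted,
    # and only while the feature flag is on.
    return conf.enable and not clean_session


def subscription_allowed(conf: DurableStorageConf, clean_session: bool, topic_filter: str) -> bool:
    """Safety check: refuse `#` from a durable session unless explicitly allowed."""
    if not uses_durable_storage(conf, clean_session):
        return True  # durable storage is not involved
    return topic_filter != "#" or conf.allow_root_wildcard


conf = DurableStorageConf(enable=True)
print(subscription_allowed(conf, clean_session=False, topic_filter="#"))      # False
print(subscription_allowed(conf, clean_session=False, topic_filter="a/b/#"))  # True
print(subscription_allowed(conf, clean_session=True, topic_filter="#"))       # True
```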
## 1. Fully implement all emqx_durable_storage APIs

### Message API

### Session API

### Iterator API

## 2. Implement a filter for messages that have to be persisted

We don't want to persist ALL messages, only the messages that can be replayed by some client.

1. Persistent sessions should signal to the emqx_broker which topic filters should be persisted.

   Scenario:
   - Client connects with clean session = false.
   - Subscribes to the topic filter `a/b/#`.
   - Now we need to signal to the rest of the broker that messages matching `a/b/#` must be persisted.

2. Replay feature (separate, optional, under the BSL license; lower priority): the configuration file contains a list of topic filters that specify which topics can be replayed.
   - Customers can set the list to `#`
   - Include the minimum QoS of messages in the config

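The two sources of "must be persisted" filters described above can be sketched in Python. The sketch assumes the broker keeps a set of filters registered by persistent sessions and a list of replay filters from the configuration. It also assumes the minimum QoS applies only to the replay filters. `PersistenceFilter`, `on_persistent_subscribe`, `should_persist` and `replay_min_qos` are hypothetical names. `topic_matches` implements standard MQTT `+`/`#` matching, with filter validation and the `$`-topic rule left out.

```python
from __future__ import annotations


def topic_matches(topic_filter: str, topic: str) -> bool:
    """MQTT wildcard matching: `+` matches one level, `#` matches all remaining levels.

    Assumes a valid filter; the special rule for topics starting with `$` is omitted.
    """
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    for i, f in enumerate(f_levels):
        if f == "#":
            return True  # also matches the parent level, e.g. `a/b/#` matches `a/b`
        if i >= len(t_levels) or (f != "+" and f != t_levels[i]):
            return False
    return len(f_levels) == len(t_levels)


class PersistenceFilter:
    """Hypothetical registry the broker could consult before persisting a message."""

    def __init__(self, replay_filters: list[str] | None = None, replay_min_qos: int = 0) -> None:
        self.session_filters: set[str] = set()            # from persistent sessions (item 1)
        self.replay_filters = list(replay_filters or [])  # from the config (item 2)
        self.replay_min_qos = replay_min_qos

    def on_persistent_subscribe(self, topic_filter: str) -> None:
        # Called when a clean session = false client subscribes; unsubscribe is omitted.
        self.session_filters.add(topic_filter)

    def should_persist(self, topic: str, qos: int) -> bool:
        if any(topic_matches(f, topic) for f in self.session_filters):
            return True
        return qos >= self.replay_min_qos and any(
            topic_matches(f, topic) for f in self.replay_filters
        )


pf = PersistenceFilter(replay_filters=["sensors/#"], replay_min_qos=1)
pf.on_persistent_subscribe("a/b/#")
print(pf.should_persist("a/b/c", qos=0))       # True: matches a session filter
print(pf.should_persist("sensors/t1", qos=0))  # False: below the replay minimum QoS
print(pf.should_persist("x/y", qos=2))         # False: nothing matches
```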
## 3. Replace the current emqx_persistent_session with emqx_durable_storage

## 4. Garbage collection

## 5. Tooling for performance and consistency testing

At the first stage we can just use emqttb:

https://github.com/emqx/emqttb/blob/master/src/scenarios/emqttb_scenario_persistent_session.erl

At the early stages, consistency verification can just use this test suite:

`apps/emqx/test/emqx_persistent_session_SUITE.erl`

## 6. Update the rocksdb version in EMQX (low prio)

## 9999. Alternative schema for rocksdb message table

https://github.com/emqx/eip/blob/main/active/0023-rocksdb-message-persistence.md#potential-for-future-optimizations-keyspace-based-on-the-learned-topic-patterns

Problem with the current bitmask-based schema:

- Good: `a/b/c/#`
- Bad: `+/a/b/c/d`

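A simplified Python sketch can show why the position of the wildcard matters. It assumes a key built by concatenating a fixed number of bits per topic level, most significant level first. Under that assumption, a range scan can only be narrowed by the concrete levels that come before the first wildcard. This is not the bitmask schema used in EMQX or the one proposed in the linked EIP. `LEVEL_BITS`, `MAX_LEVELS`, `topic_key` and `scan_range` are hypothetical.

```python
from __future__ import annotations

import hashlib

LEVEL_BITS = 8                      # hypothetical: key bits per topic level
MAX_LEVELS = 5                      # hypothetical: leading levels encoded in the key
KEY_BITS = LEVEL_BITS * MAX_LEVELS


def level_hash(level: str) -> int:
    # First byte of SHA-256: a stable 8-bit hash of one topic level.
    return hashlib.sha256(level.encode()).digest()[0]


def topic_key(topic: str) -> int:
    """Concatenate per-level hashes, most significant level first (missing levels = 0)."""
    levels = topic.split("/")
    key = 0
    for i in range(MAX_LEVELS):
        key = (key << LEVEL_BITS) | (level_hash(levels[i]) if i < len(levels) else 0)
    return key


def scan_range(topic_filter: str) -> tuple[int, int]:
    """A contiguous key range guaranteed to contain every topic matching the filter.

    Only the concrete levels before the first wildcard narrow the range; keys inside
    it still have to be checked against the remaining levels.
    """
    prefix, fixed = 0, 0
    for level in topic_filter.split("/")[:MAX_LEVELS]:
        if level in ("+", "#"):
            break
        prefix = (prefix << LEVEL_BITS) | level_hash(level)
        fixed += 1
    free_bits = KEY_BITS - fixed * LEVEL_BITS
    low = prefix << free_bits
    return low, low + (1 << free_bits) - 1


for f in ("a/b/c/#", "+/a/b/c/d"):
    low, high = scan_range(f)
    print(f, "scans", high - low + 1, "of", 2 ** KEY_BITS, "keys")
```

Under these assumptions, `a/b/c/#` pins the top 24 bits and scans 2^16 of the 2^40 keys. `+/a/b/c/d` leaves the top bits free, so it has to scan the whole key space and filter on the remaining levels afterwards.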