Fixes https://emqx.atlassian.net/browse/EMQX-10944
Also updates ekka -> 0.15.15, mria -> 0.6.4
How to test
===========
1. Start 2 or more EMQX nodes and merge them in a cluster.
2. Stop them in order.
3. Start only the first node that was stopped in the previous step.
4. Wait until the log is printed.
Or, more easily:
1. Start 2 or more EMQX nodes and merge them in a cluster.
2. Stop all but one.
3. Run `mria_mnesia:diagnosis([]).` on that node.
Example output
==============
```
Check check_open_ports should get ok but got #{msg =>
"some ports are unreachable",
results =>
#{'emqx@172.100.239.4' =>
#{open_ports =>
#{4370 => false,
5370 =>
false},
ports_to_check =>
[4370,5370],
resolved_ips =>
[{172,100,239,
4}],
status =>
bad_ports},
'emqx@172.100.239.5' =>
#{open_ports =>
#{4370 => false,
5370 =>
false},
ports_to_check =>
[4370,5370],
resolved_ips =>
[{172,100,239,
5}],
status =>
bad_ports}}}
```
After one node is back:
```
Check check_open_ports should get ok but got #{msg =>
"some ports are unreachable",
results =>
#{'emqx@172.100.239.4' =>
#{ports_to_check =>
[4370,5370],
resolved_ips =>
[{172,100,239,
4}],
status => ok},
'emqx@172.100.239.5' =>
#{open_ports =>
#{4370 => false,
5370 =>
false},
ports_to_check =>
[4370,5370],
resolved_ips =>
[{172,100,239,
5}],
status =>
bad_ports}}}
```
We need to reverse the dependency of `emqx_bridge` and `emqx_bridge_*`, because the former
loads and starts bridges during its application startup. If the individual bridge
application being loaded has not started with its dependencies, the supervision tree will
not be ready for that.
```
===> Compiling emqx_machine
===> Compiling src/user_default.erl failed
src/user_default.erl:{24,14}: can't find include lib "emqx_dashboard/include/emqx_dashboard.hrl"; Make sure emqx_dashboard is in your app file's 'applications' list
==> emqx_mix
** (Mix) Could not compile dependency :emqx_machine, "/github/home/.mix/elixir/1-14/rebar3 bare compile --paths /__w/emqx/emqx/_build/emqx/lib/*/ebin" command failed. Errors may have been logged above. You can recompile this dependency with "mix deps.compile emqx_machine", update it with "mix deps.update emqx_machine" or clean it with "mix deps.clean emqx_machine"
make: *** [Makefile:276: emqx-elixir] Error 1
```
Fixes https://emqx.atlassian.net/browse/EMQX-10361
- Moves `lib-ee/emqx_ee_schema_registry` to `apps/emqx_schema_registry`.
- Removes the `_ee_` segment from module names.
- Exceptions are the table names which are kept to avoid backwards incompatibilities.
As emqx_ee_schema_registry uses Mria tables (schema_registry_shard),
a node joining a cluster needs to restart this application in order to
restart relevant Mria shard processes.
* master: (194 commits)
fix(limiter): update change && fix deprecated version
chore: update changes
perf(limiter): simplify the memory represent of limiter configuration
ci(perf test): update tf variable name and set job timeout
ci: fix artifact name in scheduled packages workflow
fix: build_packages_cron.yaml workflow
ci: move scheduled builds to a separate workflow
build: check mnesia compatibility when generating mria config
docs: fix a typo in api doc description
feat(./dev): use command style and added 'ctl' command
test: fix delayed-pubish test case flakyness
refactor: remove raw_with_default config load option
chore: add changelog for trace timestrap
feat: increase the time precision of trace logs to microseconds
chore: make sure topic_metrics/rewrite's default is []
docs: Update changes/ce/perf-10417.en.md
chore: bump `snabbkaffe` to 1.0.8
ci: run static checks in separate jobs
chore(schema): mark deprecated quic listener fields ?IMPORTANCE_HIDDEN
chore: remove unused mqtt cap 'subscription_identifiers'
...
* master: (279 commits)
chore: shorten ct/run.sh script
chore: rename cassandra_impl to cassandra_connector
chore: fix mix.exs checking
refactor(cassandra): move cassandra bridge into its own app
chore: apply review suggestions
chore: update changes/ce/fix-10449.en.md
test: add a test for authn {}
chore: add changlog for authn_http validation
fix: always check authn_http's header and ssl_option
chore: apply suggestions from code review
fix(emqx_bridge): validate Webhook bad URL and return 'BAD_REQUEST' if it's invalid
fix(emqx_alarm): add safe call API to activate/deactivate alarms and use it in resource_manager
perf(emqx_alarm): use dirty Mnesia operations to activate an alarm
ci: simplify find-apps.sh for ee apps
perf(emqx_resource): don't reactivate alarms on reoccurring errors
ci: check if Elixir files are formatted in pre-commit hook
fix(dynamo): fix field name errors
chore: remove *_collector for prometheus api's example
chore: make plugins config to low level
chore: re-split dynamo i18n file
...
Prior to this change, when EMQX daemon mode failed to start
it's not quite easy for users to understand what went wrong.
All the know is the node did not start in time
and then instructed to boot the node in 'console' mode wishing
for some logs.
However, the node might actuay be running, causing 'console' mode
to fail with a different reason.
With this change, after a filure of daemon mode boot,
we issue a diagnosis.
1. if node can not be found from ps -ef, instruct the user
to find information in erlang.log.N
2. if the node is found running, but not responding to pings
instruct the user to check if the node name is
resolvable and reachable
3. if the node is responding to pings but emqx app is not
running, then it's likely a bug. so the user is advised
to report a github issue.