This commit adds a Clickhouse bridge to EMQX 5. The bridge is similar to
the Clickhouse bridge in the 4.4, but adds the possibility to use
different formats (such as JSON) for values to be inserted.
A lot of the string value fields had default value defined in
schema as list-string rather than binary-string.
This caused the generated schema dump (in JSON format)
to have raw_default field as an integer array.
some telemetry events from wolff are discarded:
* dropped:
this is double counted in wolff,
we now only subscribe to the dropped_queue_full event
* retried_failed:
it has different meanings in wolff,
in wolff, it means it's the 2nd (or onward) produce attempt
in EMQX, it means it's eventually failed after some retries
* retried_success
since we are going to handle the success counters in callbac
this having this reported from wolff will only make things
harder to understand
* failed
wolff never fails (unelss drop which is a different counter)
A new atom was created every time one did a dry run of a Kafka bridge
(that is, clicking the Test button in the settings dialog for the
bridge).
After this fix, we will only create a new atom when a bridge with a new
name is created. This should be acceptable as bridges with new names are
created relatively rarely and it seems to be useful to have an unique
atom for every Wolff producer.
Fixes: https://emqx.atlassian.net/browse/EMQX-8739
In order for the /bridges APIs to be consistent with other APIs, we move
out metrics from GET /bridges/{id} to its own endpoint,
/bridges/{id}/metrics. We also rename /bridges/reset_metrics to
/bridges/metrics/reset.
This makes the buffer/resource workers always use `replayq` for
queuing, along with collecting multiple requests in a single call.
This is done to avoid long message queues for the buffer workers and
rely on `replayq`'s capabilities of offloading to disk and detecting
overflow.
Also, this deprecates the `enable_batch` and `enable_queue` resource
creation options, as: i) queuing is now always enables; ii) batch_size
> 1 <=> batch_enabled. The corresponding metric
`dropped.queue_not_enabled` is dropped, along with `batching`. The
batching is too ephemeral, especially considering a default batch time
of 20 ms, and is not shown in the dashboard, so it was removed.
https://emqx.atlassian.net/browse/EMQX-8548
Currently, we face several issues trying to keep resource metrics
reasonable. For example, when a resource is re-created and has its
metrics reset, but then its durable queue resumes its previous work
and leads to strange (often negative) metrics.
Instead using `counters` that are shared by more than one worker to
manage gauges, we introduce an ETS table whose key is not only scoped
by the Resource ID as before, but also by the worker ID. This way,
when a worker starts/terminates, they should set their own gauges to
their values (often 0 or `replayq:count` when resuming off a queue).
With this scoping and initialization procedure, we'll hopefully avoid
hitting those strange metrics scenarios and have better control over
the gauges.
https://emqx.atlassian.net/browse/EMQX-8547
If a Kafka Producer bridge is given bad configuration (e.g.: bad authn
credentials), the Wolff client process is started successfully, as it
does not attempt to connect, but when the producer process is
attempted to be started, it fails (only then the client tries to
connect to Kafka). At this point, an error was thrown, but the
supervised client process remained running.
If the configuration was later fixed and the bridge updated, which
prompted its removal and recreation, the Wolff client would report to
be "already started", so it would never pick up the new (fixed)
configuration, and the producers would perpetually fail to start until
the node would be restarted.
We simply ensure the client is stopped before throwing the error,
unrolling the start-up procedure.
Given the implicit convention that an egress bridge containing the
`local_topic` config will forward messages without the need for a rule
action, this was added to avoid needing a rule action.
- MQTT topic should be a binary
- use correct gauge functions from `wolff_metrics`.
- don't double increment success counter for kafka action
- adds a few more metrics assertions
This commit adds support for counters and gauges to the Kafka Brige.
The Kafka bridge uses [Wolff](https://github.com/kafka4beam/wolff) for
the Kafka connection. Wolff does its own batching and does not use the
batching functionality in `emqx_resource_worker` that is used by other
bridge types. Therefore, the counter events have to be generated by
Wolff. We have added
[telemetry](https://github.com/beam-telemetry/telemetry) events to Wolff
that we hook into to change counters and gauges for the Kafka bridge. The
counter called `matched` does not depend on specific functionality of
any bridge type so the updates of this counter is moved higher up in the
call chain then previously so that it also gets updated for Kafka
bridges.