Fixes https://emqx.atlassian.net/browse/EMQX-10629
During health checks, we verify whether the tables referenced in the SQL statement
exist. This check was done by asking the backend to parse the
statement using a named prepared statement. Concurrent health checks
could then result in the error:
```erlang
{error,{error,error,<<"42P05">>,duplicate_prepared_statement,<<"prepared statement \"get_status\" already exists">>,[{file,<<"prepare.c">>},{line,<<"451">>},{routine,<<"StorePreparedStatement">>},{severity,<<"ERROR">>}]}}
```
This could leave the driver process in an inconsistent state, which
would make it crash later when a message from the backend arrived (`READY_FOR_QUERY`, "idle"):
```
2023-07-24T13:05:58.892043+00:00 [error] Generic server <0.2134.0> terminating. Reason: {'module could not be loaded',[{undefined,handle_message,[90,<<"I">>,...
```
Added calls to `epgsql:sync/1` for functions that could return
`{error, sync_required}`.
Also, redundant calls to `parse2` were removed to reduce the number of requests.
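As a rough illustration of the fix, a minimal sketch (the function name and the use of an unnamed statement are assumptions, not the actual EMQX code): after a failed `epgsql:parse/4`, the connection is waiting for a Sync, so we call `epgsql:sync/1` before anything else touches it.
```erlang
%% Hedged sketch: resynchronize the connection after a failed parse so the
%% next request does not fail with {error, sync_required} or confuse the
%% driver process.
check_statement(Conn, SQL) ->
    case epgsql:parse(Conn, "", SQL, []) of
        {ok, _Statement} ->
            _ = epgsql:sync(Conn),
            ok;
        {error, _} = Error ->
            _ = epgsql:sync(Conn),
            Error
    end.
```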
Fixes https://emqx.atlassian.net/browse/EMQX-10405
The problem here was that, for async requests, ehttpc responses of the form `{ok, 4__, _,
_}` and similar were being treated as successes.
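A minimal sketch of the intended classification, assuming the usual `{ok, Status, Headers[, Body]}` reply shapes from `ehttpc` (the function name and error terms are illustrative, not the actual callback):
```erlang
%% Hedged sketch: only 2xx replies count as success; anything else is
%% surfaced as an error instead of being silently accepted.
classify_reply({ok, Status, _Headers}) when Status >= 200, Status < 300 ->
    ok;
classify_reply({ok, Status, _Headers, _Body}) when Status >= 200, Status < 300 ->
    ok;
classify_reply({ok, Status, _Headers}) ->
    {error, {http_status, Status}};
classify_reply({ok, Status, _Headers, _Body}) ->
    {error, {http_status, Status}};
classify_reply({error, Reason}) ->
    {error, Reason}.
```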
Fixes https://emqx.atlassian.net/browse/EMQX-9603
Rather than relying on a JWT worker to produce and refresh tokens, we
can simply produce them on demand when pushing messages to GCP
PubSub. That can generate a bit of extra work (as multiple processes
might realize it's time to refresh the JWT and do so), but that
shouldn't be much. In return, we avoid any possibility of not having
a fresh JWT when pushing messages.
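A hedged sketch of the on-demand approach; the cache helpers and the 60-second margin are hypothetical, only the overall shape is implied by the text above:
```erlang
%% Hedged sketch (lookup_cached_jwt/1, generate_jwt/1 and cache_jwt/2 are
%% hypothetical helpers): produce the JWT on demand, reusing a cached token
%% while it is still comfortably within its lifetime.
ensure_jwt(ResourceId, Config) ->
    Now = erlang:system_time(second),
    case lookup_cached_jwt(ResourceId) of
        {ok, JWT, ExpiresAt} when ExpiresAt - Now > 60 ->
            JWT;
        _Expired ->
            %% Several workers may refresh concurrently; that duplicates a
            %% little work but is harmless, and no push ever sees a stale token.
            NewJWT = generate_jwt(Config),
            ok = cache_jwt(ResourceId, NewJWT),
            NewJWT
    end.
```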
That currently tunes the number of MQTT clients employed both for
subscriptions (if a shared subscription is used) and for publishing to
a remote broker.
It adds no value: the only mode was `cluster_shareload`, and we can just as
well decide to "share" the load across the cluster simply by checking whether
the remote topic is a shared subscription filter or not.
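For illustration, the check can be as simple as matching the `$share/` prefix that shared subscription filters carry (the helper name is an assumption):
```erlang
%% Minimal illustration: whether to spread the load across the cluster can be
%% inferred from the remote topic itself, since shared subscription filters
%% have the form <<"$share/Group/Topic">>.
is_shared_sub_filter(<<"$share/", _Rest/binary>>) -> true;
is_shared_sub_filter(_Topic) -> false.
```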
Inline the `emqx_connector_mqtt_msg` module code into the
`emqx_connector_mqtt_worker` module, since it's not really used
anywhere else and does not provide any reusable abstractions.
Fixes https://emqx.atlassian.net/browse/EMQX-10001
Recently, we unified `request_timeout` into a single field located in the
webhook connector schema. However, the correct fix is to use
`resource_opts.request_timeout`, as that's the only one that
allows an infinity timeout.
This commit makes sure that the RabbitMQ password field is marked as
required. This ensures that the user provides a password and that the
bridge does not throw a function clause exception if the password field
is not set.
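A hedged sketch of what such a schema field can look like (the exact options are assumptions):
```erlang
%% Hedged sketch: marking the password as required means a missing password
%% is rejected at config validation time instead of causing a function_clause
%% error later when the connector tries to use it.
password_field() ->
    {password,
        hoconsc:mk(binary(), #{
            required => true,
            sensitive => true
        })}.
```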
Fixes:
https://emqx.atlassian.net/browse/EMQX-9974
Connection failures should be logged at warning level rather than error:
the connecting state is visible from the dashboard, and the resource
manager already logs connection failures at warning level in general.
This surprisingly simple change yields a big performance improvement
in throughput.
While the previous commit achieves ~ 55 k messages / s
in throughput under some test conditions (100 k concurrent publishers
publishing 1 QoS 1 message per second), the simple change in this
commit improves it further to ~ 63 k messages / s.
Benchmarks indicated that evaluating one reply function is
consistently quite fast (~ 20 µs), which makes this performance gain
counterintuitive. Perhaps, although each call is cheap, `ehttpc`
calls several of these in a row when there are several sent requests,
and those costs might add up in latency.
This is a performance improvement for the webhook bridge.
Since this bridge is called using the `async` callback mode, and `ehttpc`
frequently returns errors of the form `normal` and `{shutdown, normal}`
that are retried "for free" by `ehttpc`, we add this behavior
to async requests as well. Other errors are retried too, but they are
not "free": at most 3 attempts are made.
This is important because, when using buffer workers, we should avoid
making them enter the `blocked` state, since that halts all progress
and makes throughput plummet.
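A rough sketch of the retry policy described above; the function names and the way the attempt counter is threaded through are illustrative, only the 3-attempt budget comes from the text:
```erlang
%% Hedged sketch: transient shutdowns are retried without counting against
%% the budget ("for free"), other errors get at most 3 attempts before the
%% error is reported back to the buffer worker.
-define(MAX_ATTEMPTS, 3).

maybe_retry({error, Reason}, _Attempt, Retry)
  when Reason =:= normal; Reason =:= {shutdown, normal} ->
    Retry();
maybe_retry({error, _Reason}, Attempt, Retry) when Attempt < ?MAX_ATTEMPTS ->
    Retry();
maybe_retry(Result, _Attempt, _Retry) ->
    Result.
```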
Before this change, a separate `manager_id` / `instance_id` was used
as the resource manager id, which made the connector interface somewhat
inconsistent: some calls to the connector implementation used the
instance id as the first argument, while the rest used the resource id
itself.
Apparently, the async reply returned by ehttpc can be `{shutdown,
normal}` or `{closed, "The connection was lost."}`, in which case the
request should be retried.
```
Apr 28 17:40:41 emqx-0.int.thales bash[48880]: 17:40:41.803 [error] [id: "bridge:webhook:webhook", msg: :unrecoverable_error, reason: {:shutdown, :normal}]
Apr 28 18:36:37 emqx-0.int.thales bash[53368]: 18:36:37.605 [error] [id: "bridge:webhook:webhook", msg: :unrecoverable_error, reason: {:closed, 'The connection was lost.'}]
```
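A minimal sketch of the classification, using the reply terms shown in the logs above (the function name is illustrative):
```erlang
%% Hedged sketch: treat these transient connection errors as retriable
%% instead of unrecoverable.
is_retriable({shutdown, normal}) -> true;
is_retriable({closed, _Reason}) -> true;
is_retriable(_Other) -> false.
```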
Fixes https://emqx.atlassian.net/browse/EMQX-9130
Since buffer workers always support async calls ("outer calls"), we
should decouple those two call modes (inner and outer) and avoid
exposing the inner call configuration to the user, to reduce complexity.
For bridges that currently only allow sync query modes, we should
allow them to be configured with async. That means basically all
bridge types except Kafka Producer.
This complements PR https://github.com/emqx/emqx/pull/10124.
The default values for `duration_ms()` fields need to be formatted
as a binary string with a unit in order to show up correctly in the
dashboard UI.
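A hedged sketch of such a field (the field name and default are illustrative):
```erlang
%% Hedged sketch: the default is given as a binary string with a unit,
%% which the dashboard UI renders correctly, rather than as a bare integer
%% of milliseconds.
health_check_interval() ->
    {health_check_interval,
        hoconsc:mk(emqx_schema:duration_ms(), #{
            default => <<"15s">>
        })}.
```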
Our MongoDB driver creates a new temporary connection, for every active
connection, just to do a single heartbeat test. There is a configurable
delay between heartbeat tests. When a user has an EMQX cluster
with a MongoDB bridge (to a MongoDB replica set), there will be a lot of
connections. Furthermore, as MongoDB creates a log entry every time a
new connection is created, the log gets flooded with information about new
connections. One user has reported more than 1 MB of log data in a
10-minute period.
This commit tries to fix this by increasing the default delay between
heartbeats. A better fix would be to change the MongoDB driver so that
it does not create a new connection just to do a heartbeat check, but
that is more complicated, so we leave it for the future. We might also
swap out the current MongoDB driver for something better.
Fixes:
https://github.com/emqx/emqx/issues/9851
Also hex-encode non-UTF-8 binaries. This is essentially a heuristic:
we don't know the column types at runtime, and there's no simple way
to find them out. Since we're already doing a full binary scan during
escaping, it should be cheap to bail out on non-UTF-8 strings and
hex-encode them instead.
Also introduce a separate function to highlight that this escaping
is MySQL-specific (see the sketch below).
So that the MySQL client won't attempt to prepare them automatically, thus
trashing the server's prepared statements table and making the interaction
heavier overall.
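A hedged sketch of the escaping heuristic; the function names are illustrative and the quoting rules are simplified:
```erlang
%% Hedged sketch: values that are not valid UTF-8 are emitted as MySQL hex
%% literals instead of quoted strings.
escape_mysql(Bin) when is_binary(Bin) ->
    case unicode:characters_to_binary(Bin, utf8, utf8) of
        Bin ->
            %% valid UTF-8: quote and escape as usual
            quote_mysql(Bin);
        _NotUtf8 ->
            %% e.g. <<16#C3, 16#28>> becomes <<"0xC328">>
            <<"0x", (binary:encode_hex(Bin))/binary>>
    end.

quote_mysql(Bin) ->
    %% escape backslashes and single quotes, then wrap in quotes
    Escaped = binary:replace(Bin, [<<"\\">>, <<"'">>], <<"\\">>,
                             [global, {insert_replaced, 1}]),
    <<"'", Escaped/binary, "'">>.
```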
When using async mode with the webhook bridge, queued messages that are
not fully processed when the connection times out could be lost. This
commit fixes this by letting the bridge return a `recoverable_error` when
that happens. The message will then be retried in sync mode by the
`emqx_resource_buffer_worker`.
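A minimal sketch of the mapping (the exact error terms are assumptions):
```erlang
%% Hedged sketch: map an async timeout to a recoverable error so the buffer
%% worker retries the request instead of dropping the queued message.
transform_result({error, timeout}) ->
    {error, {recoverable_error, timeout}};
transform_result(Result) ->
    Result.
```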
Fixes: https://emqx.atlassian.net/browse/EMQX-8974
This commit adds a Clickhouse bridge to EMQX 5. The bridge is similar to
the Clickhouse bridge in EMQX 4.4, but adds the possibility to use
different formats (such as JSON) for the values to be inserted.
A lot of the string value fields had their default values defined in the
schema as list-strings rather than binary-strings.
This caused the generated schema dump (in JSON format)
to have the `raw_default` field rendered as an integer array.
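For illustration (using `jsx` here purely as an example JSON encoder, not necessarily the one used for the dump): an Erlang list-string is a list of character codes, so it encodes as an integer array, while a binary-string encodes as a JSON string.
```erlang
%% Illustrative shell session showing the symptom.
1> jsx:encode(#{raw_default => "public"}).
<<"{\"raw_default\":[112,117,98,108,105,99]}">>
2> jsx:encode(#{raw_default => <<"public">>}).
<<"{\"raw_default\":\"public\"}">>
```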
Fixes https://github.com/emqx/emqx/issues/9907
At v5.0.14, we changed the `ssl` option for the Postgres connector
from `true` to `required`, but there was another transformation in
`conn_opts/2` that led to an incorrect configuration. This change
ended up preventing users from connecting to Postgres with their
previous configurations after upgrading EMQX.
This should help to avoid delivery failures for messages which could
be safely retried, for example in the event of intermittent connectivity
loss.
It should now be safe since 73d5592b.
There might be another possibility for leakage. If the resource manager
for the webhook resource crashes, OTP might log the spec for the
resource manager, which contains the Config and thus the Authorization
header. This is probably an issue for other resources as well and should
be fixed in another commit. The following issue has been created for
that:
https://emqx.atlassian.net/browse/EMQX-8794
Fixes:
https://emqx.atlassian.net/browse/EMQX-8791