Ingestion of new logs to our Syslog endpoint – for logs sent using a custom port, only – is intermittently delayed
Incident Report for Mezmo Status Page
Postmortem

Dates:
Start Time: Saturday, February 26, 2022, at 19:51 UTC
End Time: Sunday, February 27, 2022, at 22:13 UTC
Duration: 26:22:00

What happened:

Ingestion of new logs to our Syslog endpoint was intermittently delayed. Only logs sent using a custom port were affected.

Why it happened:

We recently introduced a new service (Syslog Forwarder) to handle the ingestion of logs sent over Syslog. As the name implies, it forwards logs to downstream services. Logs are sent from a range of ports on Syslog Forwarder to a range of ports used by clients running on downstream services. This design worked well in our pre-production testing, which used only a limited number of custom ports.

Once running in production, however, the Syslog Forwarder needed to connect to a much larger number of custom ports. We then saw that the ephemeral port ranges of the clients running on downstream services (the ports the operating system assigns automatically to outbound connections) overlapped with the port ranges used by the Syslog Forwarder. This led to occasional port conflicts when a service or client tried to start and the port it needed was already in use. The affected service or client would then retry until it found an open port without conflicts. Ingestion was delayed while those retries were in progress.
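
The overlap condition described above comes down to simple interval arithmetic. The following is a minimal sketch rather than the tooling we actually used. The forwarder port range is a hypothetical value, since this report does not state the real ranges, and the ephemeral range shown is the common Linux default, used here only for illustration.

```python
from typing import Optional, Tuple

PortRange = Tuple[int, int]  # inclusive (low, high)


def overlap(a: PortRange, b: PortRange) -> Optional[PortRange]:
    """Return the inclusive range of ports shared by a and b, or None if disjoint."""
    low = max(a[0], b[0])
    high = min(a[1], b[1])
    return (low, high) if low <= high else None


# Hypothetical values for illustration only; the real ranges are not published.
forwarder_ports: PortRange = (40000, 45000)
client_ephemeral_ports: PortRange = (32768, 60999)  # common Linux default

shared = overlap(forwarder_ports, client_ephemeral_ports)
if shared:
    print(f"conflict possible on ports {shared[0]}-{shared[1]}")
else:
    print("ranges are disjoint")
```

With a small number of custom ports, few ports in the shared range are in use at any moment, so a conflict is unlikely. As the number of ports in use grows, a conflict becomes more likely, which is consistent with the issue only appearing in production.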

How we fixed it:

We changed the ephemeral port ranges of the clients running on downstream services so they no longer overlapped with the port ranges used by the Syslog Forwarder.
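
A check like the one below could confirm that a host's ephemeral range no longer overlaps a reserved range. It is a sketch under two assumptions that this report does not confirm: that the hosts run Linux, where the ephemeral range is exposed in `/proc/sys/net/ipv4/ip_local_port_range`, and that the forwarder range is the hypothetical `FORWARDER_PORTS` value shown.

```python
from pathlib import Path
from typing import Tuple

PortRange = Tuple[int, int]  # inclusive (low, high)

# On Linux this file holds the ephemeral range as two whitespace-separated integers.
EPHEMERAL_RANGE_FILE = Path("/proc/sys/net/ipv4/ip_local_port_range")

# Hypothetical reserved range; the real forwarder range is not published.
FORWARDER_PORTS: PortRange = (40000, 45000)


def parse_port_range(text: str) -> PortRange:
    """Parse text such as '32768\t60999\n' into (low, high)."""
    low, high = (int(part) for part in text.split())
    return low, high


def is_disjoint(a: PortRange, b: PortRange) -> bool:
    """True when the two inclusive ranges share no port."""
    return a[1] < b[0] or b[1] < a[0]


# Only runs the live check where the Linux-specific file exists.
if EPHEMERAL_RANGE_FILE.exists():
    ephemeral = parse_port_range(EPHEMERAL_RANGE_FILE.read_text())
    status = "ok" if is_disjoint(ephemeral, FORWARDER_PORTS) else "OVERLAP"
    print(f"ephemeral range {ephemeral[0]}-{ephemeral[1]}: {status}")
```

Running a check of this kind at service startup or in deployment validation would flag a host whose ephemeral range has drifted back into the reserved range.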

What we are doing to prevent it from happening again:

The new ephemeral port range has been incorporated and proven resilient in production. No further work is needed to prevent this kind of incident from happening again.

Posted Mar 01, 2022 - 20:36 UTC

Resolved
This incident has been resolved.
Posted Feb 28, 2022 - 18:58 UTC
Monitoring
A fix has been implemented for the ingestion of new logs to our Syslog endpoint using a custom port. We will continue to monitor the results.
Posted Feb 27, 2022 - 22:15 UTC
Update
Ingestion of new logs to our Syslog endpoint using a custom port is still intermittently failing. We are continuing to work on a fix.
Posted Feb 27, 2022 - 19:08 UTC
Identified
Ingestion of new logs to our Syslog endpoint using a custom port is intermittently failing.
Posted Feb 26, 2022 - 19:51 UTC
This incident affected: Log Analysis (Log Ingestion (Syslog)).