Ingestion of new logs (Syslog only) is intermittently failing

Incident Report for Mezmo Status Page

Postmortem

Dates:
Start Time: Thursday, February 17, 2022, at 20:56 UTC
End Time: Friday, February 18, 2022, at 02:15 UTC
Duration: 5:19:00

What happened:

The ingestion of new logs to our Syslog endpoint was intermittently failing.

Why it happened:

We changed the code in the Syslog Forwarder, the part of our service that handles the ingestion of logs sent over Syslog, and in doing so inadvertently changed how memory is managed. Routine garbage collection stopped, and memory usage grew on the pods that accept newly submitted log lines over Syslog. Eventually, the growth in memory usage caused the pods to crash. Any log lines held on those pods at the time were lost and never ingested.

How we fixed it:

We reverted to the previous version of the Syslog Forwarder service. This stopped the pods from crashing.

We then resolved the memory management issue in our code. The new, fixed version was released to production shortly thereafter and performed as expected.

What we are doing to prevent it from happening again:

We have added regression tests to the Syslog Forwarder service to prevent a similar mistake in the future.
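
For readers who want a concrete picture of this kind of check, here is a minimal sketch of a memory-growth regression test. It is an illustration only, not the actual Syslog Forwarder test suite: the language (Python), the SyslogBuffer and LeakyBuffer classes, the helper functions, and the 64 KiB limit are all assumptions made for the example.

```python
import gc
import tracemalloc

# Hypothetical limit for acceptable net growth; a real service would tune this.
GROWTH_LIMIT_BYTES = 64 * 1024


class SyslogBuffer:
    """Hypothetical stand-in for a component that holds incoming log lines."""

    def __init__(self):
        self._pending = []

    def accept(self, line):
        self._pending.append(line)

    def flush(self):
        """Hand off buffered lines and release them so memory can be reclaimed."""
        count = len(self._pending)
        self._pending.clear()
        return count


class LeakyBuffer(SyslogBuffer):
    """Hypothetical faulty variant: flush() reports a count but never releases lines."""

    def flush(self):
        return len(self._pending)


def run_round(buffer, lines_per_round):
    """Accept a batch of sample syslog lines, then flush the buffer."""
    for i in range(lines_per_round):
        buffer.accept(f"<34>1 2022-02-17T20:56:00Z host app - - - line {i}")
    buffer.flush()


def memory_growth_bytes(buffer, rounds=20, lines_per_round=1000):
    """Return net traced-memory growth across repeated accept/flush rounds."""
    tracemalloc.start()
    try:
        # Warm-up round so one-time allocations are part of the baseline.
        run_round(buffer, lines_per_round)
        gc.collect()
        baseline, _peak = tracemalloc.get_traced_memory()
        for _ in range(rounds):
            run_round(buffer, lines_per_round)
        gc.collect()
        current, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return current - baseline


def test_memory_stays_bounded():
    """Regression check: steady traffic must not cause steady memory growth."""
    assert memory_growth_bytes(SyslogBuffer()) < GROWTH_LIMIT_BYTES
```

Running the same measurement against LeakyBuffer shows growth well above the limit, which is the signal a test like this is meant to catch before a release.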

Posted Mar 01, 2022 - 20:21 UTC

Resolved

This incident has been resolved. If your team is still unable to send logs via Syslog, please let us know at support@logdna.com.

Posted Feb 18, 2022 - 02:15 UTC

Investigating

Ingestion of new logs to our Syslog endpoint is intermittently failing. We are investigating.

Posted Feb 18, 2022 - 01:41 UTC

This incident affected: Log Analysis (Log Ingestion (Syslog)).