Dates:
Start Time: Thursday, January 14, 2021, at 19:42 UTC
End Time: Thursday, January 14, 2021, at 20:27 UTC
Duration: 0:45:00
What happened:
Our WebUI became unavailable and ingestion of new logs stopped for 45 minutes. Logs were automatically resent later and ingested successfully for customers using our ingestion client agent.
Why it happened:
The certificate used by all our services expired. Consequently, all API calls to our service failed, which caused our WebUI to fail and ingestion of new logs to stop.
How we fixed it:
We renewed the certificate and applied it to all affected services. Our WebUI became responsive again and ingestion resumed. Since no logs had been ingested for about 45 minutes, our service had a moderately large backlog to process. As it caught up, users experienced delays in searching, graphing, and timelines for newly submitted logs.
What we are doing to prevent it from happening again:
We’re tightening our internal notifications of upcoming expiration dates for all certificates our service relies upon.