User sessions are timing out and customers are required to log in again

Incident Report for Mezmo Status Page

Postmortem

Dates:
Start Time: Monday, June 19, 2023, at 10:31 UTC
End Time: Monday, June 19, 2023, at 12:35 UTC
Duration: 124 minutes

What happened:

Users were being logged out of our Web UI within 1-2 minutes of logging in. They could log in again successfully, but the new session would also expire quickly.

Why it happened:

The cache of logged-in users held in our Redis database was being cleared every 1-2 minutes. Each time the cache was cleared, all user sessions expired and users had to log in again. We have not yet determined why the cache was being cleared so frequently.
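
As an illustration of how this pattern could be spotted from the outside, here is a minimal sketch, not our actual tooling. It assumes that someone periodically records the number of keys in the session cache (for example by polling the Redis DBSIZE command) and flags any sample where the count collapses compared with the previous one. The function names, the sample layout, and the 90% threshold are all hypothetical choices made for this example.

```python
from typing import List, Tuple

# Hypothetical sample layout: (unix timestamp in seconds, number of session keys).
Sample = Tuple[float, int]


def detect_cache_clears(samples: List[Sample], drop_ratio: float = 0.9) -> List[float]:
    """Return the timestamps at which the key count fell by at least
    `drop_ratio` (as a fraction) relative to the previous sample."""
    clears = []
    for (_, prev_count), (ts, count) in zip(samples, samples[1:]):
        # Skip empty previous samples: nothing could have been cleared.
        if prev_count > 0 and count <= prev_count * (1 - drop_ratio):
            clears.append(ts)
    return clears


def clear_intervals(clear_times: List[float]) -> List[float]:
    """Return the gaps, in seconds, between consecutive detected clears."""
    return [later - earlier for earlier, later in zip(clear_times, clear_times[1:])]


if __name__ == "__main__":
    # Made-up samples taken every 30 seconds, for illustration only.
    samples = [(0, 100), (30, 120), (60, 3), (90, 80), (120, 110), (150, 0), (180, 50)]
    clears = detect_cache_clears(samples)
    print("suspected clears at:", clears)
    print("seconds between clears:", clear_intervals(clears))
```

Gaps of roughly 60 to 120 seconds between detected clears would match the behavior described above. A check like this only shows that the cache was emptied, not why.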

How we fixed it:

We restarted the pods running the Redis database, and the cache behavior returned to normal.

What we are doing to prevent it from happening again:

We will investigate further to learn why the Redis cache was being frequently cleared.

Posted Jun 28, 2023 - 18:32 UTC

Resolved

This incident has been resolved.
Posted Jun 19, 2023 - 13:58 UTC

Monitoring

The fix was implemented and we are now monitoring the user login sessions.
Posted Jun 19, 2023 - 11:27 UTC

Identified

The issue has been identified, and a fix is being implemented.
Posted Jun 19, 2023 - 11:13 UTC

Investigating

User sessions in our Web UI are timing out, and customers using the UI have to log in again every 1-2 minutes. We are investigating why this is happening. The rest of the service is fully functional, and no other components are affected.
Posted Jun 19, 2023 - 11:09 UTC
This incident affected: Log Analysis (Web App).