DUO39: Authentication Latency
Incident Report for Duo
Postmortem

Authentication Issues - DUO39

Incident Report - 2019/02/25

Summary:

From 15:10 to 15:25 UTC on February 25th, 2018, the DUO39 deployment experienced performance degradation that resulted in increased authentication latency and intermittent request timeouts. Approximately 50% of authentications processed during this time period were affected. The root cause of this outage has been identified and Duo’s engineering team is committed to reducing the overall impact of similar events going forward.

Details:

The DUO39 application tier is comprised of multiple redundant application servers. At 15:09 UTC, one of the application servers stopped functioning unexpectedly. Duo’s automated monitoring systems detected this failure and initiated a recovery procedure.

Due to the nature of the application server failure, its network connections to the database tier remained open and were not forcibly closed. This caused the database tier to hold open locks on several database tables. Due to these locks, contention for these tables began as other requests were being processed.

At 15:11 UTC Duo’s application monitoring systems alerted the Duo Engineering team to this contention. At 15:15 UTC Duo’s monitoring systems alerted the Duo Engineering team to increased authentication latency.

At 15:25 UTC the previously failed application server returned to service, releasing locks being held at the database tier. At this time, the resource contention was resolved and operations resumed normally.

Posted Feb 26, 2019 - 12:45 EST

Resolved
From 15:10 UTC to 15:25 UTC on February 25, 2019, the DUO39 deployment experienced increased authentication latency. Some authentications during this period failed. The authentication issues have been resolved and the cloud service is working as expected.

We will attach a root-cause analysis (RCA) to this incident once our engineering team has finished its thorough investigation of the issue.
Posted Feb 25, 2019 - 11:25 EST
Monitoring
Our Engineering Team has identified an issue causing authentication latency on the DUO39 deployment. We believe the issue to be resolved and will continue monitoring the health of the deployment.

Please check back here or subscribe here for further updates.
Posted Feb 25, 2019 - 10:44 EST
This incident affected: DUO39 (Core Authentication Service).