DUO55: Authentication Latency
Incident Report for Duo
Postmortem

Summary:

From 3:05 to 3:22 UTC on March 2, 2019, the DUO55 deployment experienced performance degradation that increased authentication latency. The majority of authentications during these 17 minutes were affected. The root cause of this outage has been identified, and Duo’s engineering team is committed to reducing the overall impact of similar events going forward.

Details:

The DUO55 database tier is composed of multiple database clusters. Each cluster is composed of multiple sets of database hardware in several separate locations, configured in an active/standby setup.

At 3:05 UTC, one of DUO55’s active databases failed. The data it held became inaccessible, which caused authentication failures. Automated failover processes began configuring standby hardware to replace the failed database.

At 3:22 UTC, the failover completed and service was fully restored. We will continue to explore additional infrastructure and software improvements to reduce failover time and customer impact.
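
For readers unfamiliar with active/standby setups, the sequence described above can be sketched roughly as follows. This is a minimal illustration only, not Duo's implementation: the `Node` and `Cluster` classes, the node names, and the "promote the first healthy standby" rule are all hypothetical. The only values taken from this report are the 3:05 and 3:22 UTC timestamps.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass
class Node:
    name: str            # hypothetical identifier, not a real DUO55 host
    healthy: bool = True


class Cluster:
    """One active node serving traffic, plus standbys in other locations."""

    def __init__(self, active: Node, standbys: List[Node]):
        self.active = active
        self.standbys = standbys

    def serve(self) -> str:
        # Requests succeed only while the active node is healthy.
        if not self.active.healthy:
            raise RuntimeError("active database unavailable")
        return self.active.name

    def fail_over(self) -> Node:
        # Assumed policy: promote the first healthy standby to active.
        for i, node in enumerate(self.standbys):
            if node.healthy:
                self.active = self.standbys.pop(i)
                return self.active
        raise RuntimeError("no healthy standby available")


cluster = Cluster(Node("db-a"), [Node("db-b"), Node("db-c")])
cluster.active.healthy = False   # 3:05 UTC: the active database fails
promoted = cluster.fail_over()   # automated failover promotes a standby
served_by = cluster.serve()      # 3:22 UTC: requests succeed again

# Impact window computed from the timestamps given in this report.
start = datetime(2019, 3, 2, 3, 5, tzinfo=timezone.utc)
end = datetime(2019, 3, 2, 3, 22, tzinfo=timezone.utc)
impact_minutes = int((end - start).total_seconds() // 60)
print(served_by, impact_minutes)
```

In this toy model the promotion is instantaneous. In a real deployment, the time spent configuring the standby hardware is what determines the length of the impact window.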

Posted Mar 02, 2019 - 22:04 EST

Resolved
From 3:05 UTC to 3:22 UTC on March 2, 2019, the DUO55 deployment experienced increased authentication latency. Some authentications during this period failed. The authentication issues have been resolved and the cloud service is working as expected.

We will attach a root-cause analysis (RCA) to this incident once our engineering team has finished its thorough investigation of the issue.
Posted Mar 01, 2019 - 23:47 EST

Monitoring
Our engineering team has identified an issue causing authentication latency on the DUO55 deployment. We believe the issue is resolved and will continue monitoring the health of the deployment.
Posted Mar 01, 2019 - 22:44 EST

This incident affected: DUO55 (Core Authentication Service, Admin Panel, Push Delivery, Phone Call Delivery, SMS Message Delivery, Cloud PKI).