This message is a follow-up to the postmortem published August 10 regarding the delay in Authentication Log reporting that occurred on August 7 from 9:35 AM Eastern (1335 UTC) to 3:45 PM Eastern (1945 UTC).
The following information is for customers using a SIEM or other security tooling via the Duo Admin API. These customers will need to perform manual steps to backfill logs for the affected time period into their SIEM or other system.
If you are not moving Duo logs into an external system or do not wish to backfill your logs into your system, you may disregard this message.
To retrieve your logs, you will need to run a script to download the data from Duo and then import the records into your SIEM or other system.
Step 1: Download the log export script from Duo.
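From our GitHub project page, go to Code > Download ZIP to download the entire project folder.
This folder contains both the export script, authlog_export.py, and a requirements.txt file that lists the dependencies the script needs to run. Install those dependencies with pip install -r requirements.txt (or pip3 install -r requirements.txt, depending on how Python 3 is installed on your system).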
Step 2: Run the script to export your logs.
Execute the script with python authlog_export.py or python3 authlog_export.py, depending on how Python 3 is installed on your system.
Enter the integration key (IKEY), secret key (SKEY), and API hostname of your Admin API integration.
Specify the directory where you wish to write the logs.
Provide the following start and end time values to fetch the logs for the affected time period. Both values are in UTC and correspond to the incident period of about 9:30 AM to 3:45 PM Eastern.
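As a cross-check on the UTC values, the incident window can be converted from Eastern time using only Python's standard library. This is a minimal sketch: it assumes the 9:35 AM to 3:45 PM Eastern window from the incident report and a fixed UTC-4 offset (August falls within US daylight saving time). It does not reproduce the script's own input format, so enter the values Duo provided for the script itself.

```python
from datetime import datetime, timedelta, timezone

# August is within US daylight saving time, so Eastern time is EDT (UTC-4).
# A fixed offset avoids depending on the system's time zone database.
EDT = timezone(timedelta(hours=-4), "EDT")

def incident_window_utc():
    """Return the incident start and end times as UTC datetimes."""
    start_local = datetime(2020, 8, 7, 9, 35, tzinfo=EDT)
    end_local = datetime(2020, 8, 7, 15, 45, tzinfo=EDT)
    return start_local.astimezone(timezone.utc), end_local.astimezone(timezone.utc)

start, end = incident_window_utc()
print(start.isoformat())  # 2020-08-07T13:35:00+00:00
print(end.isoformat())    # 2020-08-07T19:45:00+00:00

# Unix timestamps in seconds and milliseconds, in case a tool expects either form.
print(int(start.timestamp()), int(end.timestamp()))
print(int(start.timestamp() * 1000), int(end.timestamp() * 1000))
```

The converted times match the 1335 UTC and 1945 UTC figures in the incident report.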
Step 3: Import the downloaded data into your SIEM or other external system.
Follow your usual workflow for manually importing data into your system. Here is a sample set of instructions for Splunk, which uses version 1 of the API.
Click on Next, and then expand the Timestamp settings in the left sidebar to make the following selections:
Click on Next, and then select Duo from the index dropdown as the index to import the data into.
Review your settings and then submit to import the data.
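Before importing, it can help to trim the export to the affected window and write one JSON object per line, a layout many SIEM file importers can parse. The following is a hypothetical sketch: it assumes the exported records have been loaded as a list of dicts, each with a Unix-seconds "timestamp" field. The function name to_ndjson and the sample records are illustrative only, not part of Duo's script.

```python
import json
from datetime import datetime, timezone

# Incident window in UTC (1335 to 1945 UTC on August 7, 2020), as Unix seconds.
WINDOW_START = datetime(2020, 8, 7, 13, 35, tzinfo=timezone.utc).timestamp()
WINDOW_END = datetime(2020, 8, 7, 19, 45, tzinfo=timezone.utc).timestamp()

def to_ndjson(records, start=WINDOW_START, end=WINDOW_END):
    """Keep records inside [start, end], sort them by time, and return
    newline-delimited JSON (one object per line).

    ASSUMPTION: each record is a dict with a Unix-seconds "timestamp" field.
    """
    in_window = [r for r in records if start <= r["timestamp"] <= end]
    in_window.sort(key=lambda r: r["timestamp"])
    lines = [json.dumps(r, sort_keys=True) for r in in_window]
    return "\n".join(lines) + ("\n" if lines else "")

# Made-up sample records; the "txid" values are placeholders.
sample = [
    {"timestamp": 1596829500 + 60, "txid": "c"},  # after the window, dropped
    {"timestamp": 1596810000, "txid": "b"},
    {"timestamp": 1596807300, "txid": "a"},       # exactly at the window start
]
print(to_ndjson(sample), end="")
```

The resulting text can be written to a file and uploaded through the import workflow described above.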
Incident Report - 2020/08/07
Summary:
From 9:35 AM Eastern (1335 UTC) to 3:45 PM Eastern (1945 UTC) on August 7, 2020, new authentication logs were unavailable in the Duo Admin Panel and to customer monitoring workflows, such as automated SIEM ingestion that retrieves Authentication Logs in near real time from Duo’s APIs. From 3:45 PM Eastern (1945 UTC) onward, new logs became available again. By 8:49 PM Eastern (0049 UTC on August 8, 2020), all data was available. This issue has been resolved, and no data was lost as a result of this incident.
Details:
At 11:35 AM Eastern, Duo’s Engineering Team was notified of customer reports that new authentications had stopped appearing in the Authentication Log in the Admin Panel. The team immediately began investigating the issue.
At 12:39 PM, the team completed its initial investigation and determined that, since 9:35 AM Eastern, authentications had been flowing normally but had not been fully processed through Duo’s logging platform. The team also determined that the log data itself had not been lost. The team then began troubleshooting the log ingestion process.
At 1:30 PM, Duo’s Engineering Team determined that one node in the logging cluster had zero free space remaining, which prevented all new writes to the cluster. The cluster failed to automatically balance free space across its nodes, even though other nodes in the cluster had significant amounts of free space. No alert fired to notify the team that a single node had run out of disk space. The team then began working to add space to the affected node without affecting the rest of the cluster.
At 3:45 PM, the team successfully repaired the cluster and log ingestion resumed normally.
At 4:00 PM, the team began to identify a process to backfill missing logs from earlier in the day.
At 5:45 PM, the team began executing the backfill process and monitoring the logging infrastructure to ensure the process was working.
At 8:49 PM, the team confirmed that all logs had been backfilled and no data had been lost.
Customers who rely on automatic processes to populate a SIEM with Duo authentication logs will need to execute manual steps if they wish to backfill data into their SIEMs. Affected customers will receive those instructions via email before August 12.
In response to this incident, Duo’s Engineering Team is investigating with our partners why the cluster’s configuration prevented it from automatically balancing free space across the cluster. Since the incident, the team has improved its monitoring of the log ingestion pipeline, including multiple alerts on free space, so that similar issues can be detected more quickly. The team has also improved its backfill automation so that a backfill can be started more quickly if needed. Duo’s Engineering Team is committed to ensuring that authentication logs are highly available.
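The failure mode described above, where one full node blocks writes while the cluster as a whole still has plenty of free space, is why free-space alerts need to look at each node individually. Duo has not published its alerting configuration; the sketch below is a hypothetical illustration only. The NodeDisk type, the node names, and the 10% threshold are all assumptions chosen for the example.

```python
from dataclasses import dataclass

@dataclass
class NodeDisk:
    """Hypothetical disk usage snapshot for one cluster node."""
    name: str
    total_bytes: int
    free_bytes: int

    @property
    def free_fraction(self) -> float:
        return self.free_bytes / self.total_bytes if self.total_bytes else 0.0

def cluster_free_fraction(nodes):
    """Aggregate free space across all nodes (can hide a single full node)."""
    total = sum(n.total_bytes for n in nodes)
    free = sum(n.free_bytes for n in nodes)
    return free / total if total else 0.0

def nodes_below_threshold(nodes, min_free_fraction=0.10):
    """Return names of nodes whose own free space is below the threshold."""
    return [n.name for n in nodes if n.free_fraction < min_free_fraction]

nodes = [
    NodeDisk("node-a", 1000, 600),
    NodeDisk("node-b", 1000, 550),
    NodeDisk("node-c", 1000, 0),  # full node, as in the incident
]
print(f"cluster free: {cluster_free_fraction(nodes):.0%}")  # cluster free: 38%
print(nodes_below_threshold(nodes))  # ['node-c']
```

An alert on the aggregate figure alone would stay quiet here, while the per-node check flags node-c.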