Acquia DAM Performance Issue
Incident Report for Acquia DAM
Postmortem

Overview

On April 16th, 18th, and 19th, 2024, our US customers experienced file processing delays. During this time, our engineers focused on fixing the underlying issue and on getting new work moving as soon as possible. Once new work was processing normally, they turned to the backlog of impacted jobs. Multiple issues were discovered and addressed along the way.

Root Cause

On April 15th, 2024, a routine security upgrade for our file processing service was deployed. This upgrade unexpectedly caused jobs to be stored longer than normal and caused some failed work to retry continuously. These issues reduced the processing capacity of the service, which limited the amount of new work we could process at a time and caused work to back up. Multiple temporary fixes were deployed to restore the service's capacity. These temporary fixes allowed new work to process and the backlog to be addressed, but they did not resolve the underlying issue, so the problem recurred later in the week.

Once the issues were discovered, the temporary fixes were removed and a more permanent fix was implemented.
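
To illustrate the mechanism described above, the following is a minimal sketch of how failed work that retries without limit can eat into a fixed processing capacity, and how capping retries frees that capacity again. It is not Acquia's implementation or the fix that was deployed. The function `simulate`, its parameters, and the simple first-in-first-out queue are all assumptions made for illustration, and the sketch models only the retry behavior, not the longer job storage.

```python
from collections import deque


def simulate(ticks, slots_per_tick, arrivals_per_tick, failing_jobs, max_attempts=None):
    """Hypothetical model of a job queue with a fixed number of slots per tick.

    Returns (completed, backlog): healthy jobs finished and healthy jobs
    still waiting. max_attempts=None means failing jobs retry forever.
    """
    # Each entry is (kind, attempts); "bad" jobs fail on every attempt.
    queue = deque(("bad", 0) for _ in range(failing_jobs))
    completed = 0
    for _ in range(ticks):
        # New healthy work arrives at a steady rate.
        queue.extend(("ok", 0) for _ in range(arrivals_per_tick))
        # Process at most slots_per_tick jobs from the front of the queue.
        for _ in range(min(slots_per_tick, len(queue))):
            kind, attempts = queue.popleft()
            attempts += 1
            if kind == "ok":
                completed += 1
            elif max_attempts is None or attempts < max_attempts:
                # Failed job goes back on the queue and will use a slot again.
                queue.append((kind, attempts))
            # Otherwise the failed job is dropped and stops using capacity.
    backlog = sum(1 for kind, _ in queue if kind == "ok")
    return completed, backlog


if __name__ == "__main__":
    print("unbounded retries:", simulate(10, 3, 2, 4))
    print("retries capped:   ", simulate(10, 3, 2, 4, max_attempts=1))
```

In this toy model the queue has more capacity than incoming work, yet the run with unbounded retries still ends with a backlog of healthy jobs, because the failing jobs keep taking slots. With a cap, the failing jobs leave the queue and new work catches up.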

What We're Doing

We've implemented a fix and we are working on improving our alerts to catch these types of issues faster.

Posted Apr 29, 2024 - 11:05 CDT

Resolved
We've resolved the issue causing degraded file processing performance for some customers and we've finished working through the backlog of impacted work. If you continue to experience issues, contact Customer Support on Acquia DAM Community for help.
Posted Apr 19, 2024 - 17:35 CDT
Monitoring
We've resolved the issue causing degraded file processing performance for some customers. New work should be processing normally and we are working through the backlog of impacted work.
Posted Apr 18, 2024 - 16:14 CDT
Identified
We've identified the issue causing degraded file processing performance for some customers and we are working on a fix.
Posted Apr 18, 2024 - 10:57 CDT
Update
We are continuing to investigate this issue.
Posted Apr 18, 2024 - 10:00 CDT
Investigating
We've received reports that some customers are experiencing degraded file processing performance and we are actively investigating.
Posted Apr 18, 2024 - 08:58 CDT
Update
We are continuing to work through the backlog of impacted work and are monitoring to ensure stability.
Posted Apr 16, 2024 - 15:33 CDT
Monitoring
We've resolved the issue causing degraded file processing performance for some customers. New work should be processing normally and we are working through the backlog of impacted work.
Posted Apr 16, 2024 - 13:21 CDT
Update
We have implemented a fix for this issue and we are seeing most new work process normally. We are continuing to work on a fix for the remaining issue. Any processing work impacted by this issue will be reprocessed once the remaining issue is resolved.
Posted Apr 16, 2024 - 12:16 CDT
Update
We are continuing to work on a fix for this issue.
Posted Apr 16, 2024 - 10:55 CDT
Identified
We've identified the issue causing degraded file processing performance for some customers and we are working on a fix.
Posted Apr 16, 2024 - 09:58 CDT
Investigating
We're currently investigating an issue causing degraded file processing performance for some customers.
Posted Apr 16, 2024 - 09:15 CDT
This incident affected: Assets & Templates.