Purple Mash Instability
Incident Report for 2Simple Software
Postmortem

Today we had a Purple Mash incident lasting from 11:23 to 12:27. The site was intermittently unavailable during this period; total downtime was approximately half an hour.

The problem was caused by our replica MongoDB database running out of CPU credits. This caused slowdowns, which in turn caused the main database to run out of connections. As this was a completely new problem, it took longer than we would have liked to diagnose and fix.

We have now added more capacity on this replica server and put additional monitoring in place to prevent this from happening again. We have also identified a faster way to recover the platform when this type of problem occurs.
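
For illustration, the kind of check that such monitoring might perform can be sketched as below. This is a minimal sketch under stated assumptions, not the monitoring actually deployed: the field names, the function name and both thresholds are hypothetical, and how the values are collected is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class ReplicaSample:
    """One monitoring sample. All field names are hypothetical."""
    cpu_credit_balance: float      # burst credits remaining on the replica
    primary_connections: int       # connections currently open on the main database
    primary_max_connections: int   # connection limit configured on the main database


def check_sample(sample, min_credits=50.0, max_connection_ratio=0.8):
    """Return a list of warning strings for one sample (empty if healthy).

    The default thresholds are illustrative assumptions, not recommended values.
    """
    warnings = []
    # Warn before the replica's credits are exhausted and it starts to slow down.
    if sample.cpu_credit_balance < min_credits:
        warnings.append("replica CPU credits low")
    # Warn before the main database runs out of connections.
    ratio = sample.primary_connections / sample.primary_max_connections
    if ratio > max_connection_ratio:
        warnings.append("primary connections near limit")
    return warnings
```

Checking both signals reflects the chain of events described above: low CPU credits on the replica came first, and connection exhaustion on the main database followed.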

Posted Apr 02, 2020 - 12:49 BST

Resolved
This incident has been resolved.
Posted Apr 02, 2020 - 12:46 BST
Update
We are continuing to monitor for any further issues.
Posted Apr 02, 2020 - 12:28 BST
Update
We are continuing to monitor for any further issues.
Posted Apr 02, 2020 - 12:20 BST
Update
We are continuing to monitor for any further issues.
Posted Apr 02, 2020 - 12:15 BST
Update
We are continuing to monitor for any further issues.
Posted Apr 02, 2020 - 12:12 BST
Update
We are continuing to monitor for any further issues.
Posted Apr 02, 2020 - 11:52 BST
Update
We are continuing to monitor for any further issues.
Posted Apr 02, 2020 - 11:41 BST
Update
We are continuing to monitor for any further issues.
Posted Apr 02, 2020 - 11:37 BST
Update
We are continuing to monitor for any further issues.
Posted Apr 02, 2020 - 11:30 BST
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Apr 02, 2020 - 11:27 BST
Identified
The issue has been identified and a fix is being implemented.
Posted Apr 02, 2020 - 11:23 BST
This incident affected: Purple Mash Site.