Delays loading IT Glue Web App

Incident Report for IT Glue

Postmortem

16:25 PDT Sept 1st - We have an open enquiry with Amazon Web Services to obtain full information on the root cause of this incident. A failure in one system at 06:17 triggered a failover within the cluster within 2 minutes and increased latency for all database queries. After the failover succeeded, we expected performance to return to normal, but high latency continued. Our DevSecOps team restarted specific services to restore normal performance. Once we receive more information from Amazon, we will post a more comprehensive description here.

11:00 PDT Aug 31st - We are still investigating the root cause of this incident, and awaiting further information from external sources.

10:00 PDT Aug 30th - Initial evidence shows that our development team will need to investigate further before we can publish a full postmortem on this incident. The next update can be expected by 9am PDT on Aug 31st. We appreciate your patience.

Posted Aug 30, 2017 - 10:22 PDT

Resolved

All IT Glue systems are fully operational. A postmortem will follow by 10am PDT.

Posted Aug 30, 2017 - 08:04 PDT

Identified

The IT Glue web app is loading normally again. A database issue may be the cause, and a full investigation is underway to determine exactly what occurred. An update will be posted by 10am PDT.

Posted Aug 30, 2017 - 07:06 PDT

Investigating

We're investigating delays in loading the IT Glue web app. More details to come as the issue is identified.

Posted Aug 30, 2017 - 06:35 PDT