Past Incidents

Tuesday 19th September 2023

Managed DNS Small period of reduced availability on Managed DNS

The Managed DNS cluster failed to respond to a small number of DNS queries for a short period of time. We are currently investigating the exact scale and root cause of this incident.

Last night we were able to mitigate an attack thanks to the fine-tuning that happened earlier.

Another brief outage happened last night. Further investigation showed a TCP SYN flood from various networks, along with some UDP floods. We are working with the networks from which we received the traffic, and we've put additional protections in place to prevent these incidents.

Metrics show signs of a brief DDoS on various nodes within the DNS cluster. We are investigating further.

Monday 18th September 2023

Managed GitLab Runner Less capacity available on Managed GitLab Runners

Jobs may be picked up more slowly within GitLab. This could result in:

jobs being queued for a longer period;
or jobs staying longer on the message "Waiting for Runner to become available and request this job.".

This is due to an API issue at our cloud provider. Our system is unable to scale up the GitLab runners as a result. We are waiting for a fix from our cloud provider.

The provider fixed their API, and scaling is operational again.

Saturday 2nd September 2023

Managed Monitoring Not all data available on Mimir

It has been reported that some metrics have not been displayed since around 5AM-6AM last night. We are investigating.

A fix has been rolled out for all affected customers.

The problem has been found, and we are rolling out a fix to affected customers.

We have found that the issue is specific to certain customers running Knative, where some Knative-related monitoring targets are lost.