Reports of Call Quality Issues
Incident Report for I.T Communications Limited
Postmortem

Overview

A provider backhaul link flap resulted in an undesirable reroute of traffic.

**Problem Description**

A brief interruption on a provider's core backhaul link between GB/LON07 (Interxion LON1) and GB/LON02 (Telehouse North) caused an undesirable reroute of traffic.

**Impact**

A subset of customers located in the GB/LON02, GB/LON06 (Volta), and GB/LON07 points of presence experienced high latency and packet loss for a period of up to one hour.

Timeline (please note that all times are UTC)

Tuesday 23 Feb 2021 15:34 - Link flap event occurs; traffic reroutes away from the affected backhaul.

Tuesday 23 Feb 2021 15:48 - First reports of packet loss; the network team begins investigating.

Tuesday 23 Feb 2021 16:39 - One of the links in the undesirable path is shut down to force a reroute; packet loss ends.

**Provider Action Taken**

When the undesirable path was identified, action was taken on a device in the chain. On this device we forced a further reroute of traffic onto an alternative path, ending the period of packet loss.

**Root Cause Analysis**

Off the back of the flap event, MPLS rerouted traffic away from the backhaul link onto alternative paths. MPLS calculated a path using links that are costed due to their low capacity, even though multiple high-capacity paths were available. We are currently investigating the root cause of this.
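
To make that failure mode concrete, the toy Python sketch below walks through the three states described here: traffic on the backhaul, a recalculation that ignores the configured costings and lands traffic on a low-capacity path, and the manual shutdown of a link in that path to force a further reroute. The topology, node names ("LON03", "X") and metrics are made up for illustration and are not the provider's actual network or MPLS implementation.

```python
import heapq


def shortest_path(links, src, dst, honour_metrics=True):
    """Dijkstra over an undirected topology given as {node: {neighbour: metric}}.

    If honour_metrics is False, every link is treated as metric 1 -- a stand-in
    for a recalculation that does not take the configured costings into account.
    """
    dist, prev, seen = {src: 0}, {}, set()
    queue = [(0, src)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node in seen:
            continue
        seen.add(node)
        for neighbour, metric in links.get(node, {}).items():
            step = metric if honour_metrics else 1
            if cost + step < dist.get(neighbour, float("inf")):
                dist[neighbour] = cost + step
                prev[neighbour] = node
                heapq.heappush(queue, (dist[neighbour], neighbour))
    path, node = [dst], dst
    while node != src:                     # walk the predecessors back to the source
        node = prev[node]
        path.append(node)
    return list(reversed(path))


def add_link(links, a, b, metric):
    links.setdefault(a, {})[b] = metric
    links.setdefault(b, {})[a] = metric


def drop_link(links, a, b):
    links[a].pop(b, None)
    links[b].pop(a, None)


topo = {}
add_link(topo, "LON07", "LON02", 10)   # primary backhaul (the link that flapped)
add_link(topo, "LON07", "LON06", 10)   # high-capacity alternative path ...
add_link(topo, "LON06", "LON03", 10)   # ... via two extra hops
add_link(topo, "LON03", "LON02", 10)
add_link(topo, "LON07", "X", 500)      # low-capacity links, deliberately costed
add_link(topo, "X", "LON02", 500)      # high to keep traffic off them

drop_link(topo, "LON07", "LON02")      # 15:34 - backhaul link flap

# Intended behaviour: costings respected, traffic avoids the low-capacity path.
print(shortest_path(topo, "LON07", "LON02"))

# Observed behaviour: costings not taken into account, traffic lands on "X",
# which then congests and drops packets.
print(shortest_path(topo, "LON07", "LON02", honour_metrics=False))

# 16:39 - a link in the undesirable path is shut down, forcing a further
# reroute onto the high-capacity path and ending the packet loss.
drop_link(topo, "X", "LON02")
print(shortest_path(topo, "LON07", "LON02", honour_metrics=False))
```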

**Mitigating Actions**

The provider has a case open with Juniper to investigate why these costings were not taken into account during the recalculation event. Improvements to proactive monitoring of packet loss are now due to be undertaken to allow the provider to identify these issues more quickly.

Posted Feb 24, 2021 - 13:31 GMT

Resolved
We started to receive reports from customers of one-way audio or bad call quality from around 16:00 today, and the issue was resolved by 16:49.

One of our External Connectivity Providers had an issue within their core network, which meant that some customers, whose traffic reaches our network via this particular Internet route, suffered packet loss.

BGP sessions to the External Connectivity Provider (our IP Transit Provider) remained active during the incident: the packet loss was severe enough to cause issues with voice, but not severe or sustained enough for BGP to miss keepalives for its full hold time, declare the session down, and automatically re-route traffic.
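
As a rough illustration of that behaviour, the Python sketch below shows how a BGP session can comfortably survive a level of packet loss that is already very audible on a call. The timers, loss rate and codec figures are generic assumptions, not measurements from this incident.

```python
import random

KEEPALIVE = 60     # seconds between BGP keepalives (a common default)
HOLD_TIME = 180    # the session only drops after this long with nothing received
LOSS_RATE = 0.05   # assume 5% packet loss on the affected path
DURATION = 3600    # simulate one hour of the incident

last_heard = 0
dropped_at = None
for t in range(KEEPALIVE, DURATION + 1, KEEPALIVE):
    if random.random() > LOSS_RATE:
        last_heard = t            # keepalive received, hold timer resets
    elif t - last_heard >= HOLD_TIME:
        dropped_at = t            # three keepalives in a row were lost
        break

# At 5% loss, three consecutive keepalives are almost never all lost, so the
# session stays up and no automatic re-route happens.
print("BGP session torn down at:", dropped_at)

# Voice suffers immediately: a G.711 call with 20 ms packetisation sends
# 50 RTP packets per second, so 5% loss is a couple of missing frames a second.
print("Lost audio frames per second:", 50 * LOSS_RATE)
```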

We have requested a Reason for Outage ("RFO") report from the affected provider, which we have been told will be with us within 24 hours.

We will investigate whether our monitoring system can be set up to detect small amounts of packet loss that would not cause the service to fail completely, but would nevertheless cause issues with voice calls while not showing up on our current monitoring.
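
As a purely illustrative sketch of the kind of probe we have in mind, something along these lines would flag low-level loss that ordinary up/down checks miss. The targets, threshold and Linux iputils ping flags below are assumptions for illustration, not our production configuration.

```python
import re
import subprocess

TARGETS = ["203.0.113.10", "198.51.100.20"]   # placeholder far-end probe targets
ALERT_THRESHOLD = 2.0                         # % loss that is already bad for voice


def packet_loss(host, count=50, interval=0.2):
    """Return the % packet loss reported by ping, or None if the probe failed."""
    result = subprocess.run(
        ["ping", "-n", "-q", "-c", str(count), "-i", str(interval), host],
        capture_output=True, text=True,
    )
    match = re.search(r"([\d.]+)% packet loss", result.stdout)
    return float(match.group(1)) if match else None


for host in TARGETS:
    loss = packet_loss(host)
    if loss is None:
        print(f"{host}: probe failed")
    elif loss >= ALERT_THRESHOLD:
        print(f"{host}: ALERT - {loss}% packet loss, enough to affect calls")
    else:
        print(f"{host}: OK - {loss}% packet loss")
```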

We will provide a copy of the report once received and an update on what we can do to improve network monitoring.
Posted Feb 23, 2021 - 17:22 GMT
Update
We have been informed that the external issue is resolved, and we are waiting for a statement to be released so we can have a clear understanding of what caused the issue with the calls. We will provide a reason for the issues shortly.
Posted Feb 23, 2021 - 16:53 GMT
Update
It would now appear the issue is occurring again; we are still investigating.
Posted Feb 23, 2021 - 16:35 GMT
Monitoring
It would seem the problem has cleared on its own. We will attempt to find out the cause and provide a further update shortly.

Again, nothing is showing on our monitoring systems.
Posted Feb 23, 2021 - 16:24 GMT
Update
We have confirmed the issue is not on our side of the network. Other phone providers are also experiencing the same issues. We are still trying to find out the cause. Update to follow.
Posted Feb 23, 2021 - 16:10 GMT
Investigating
We have received a few reports from customers that they are experiencing poor quality audio or one-way audio.

None of our monitoring is showing any issues.

We are investigating to see what is causing it.

Update to follow.
Posted Feb 23, 2021 - 15:51 GMT