At 14:12 on 04/04/2018, one of our carriers experienced the failure of a network switch (external to the I.T Communications network), which caused routing issues on their core network. Another switch had not been configured correctly, and this led to a network loop that was not detected or managed by the devices.
The routing issues triggered a failover between the primary and secondary systems, but both were experiencing network connectivity issues, so neither was able to take over responsibility for call control. As a result, new call attempts could not be authorised.
For resilience, the carrier runs multiple database and call control systems hosted on physically separate infrastructure and storage. However, these systems are all connected to their core network to allow communication with switching units, media gateways, customer connections and other IP connections.
Once the location of the problem had been identified, manual intervention was required to reset and resynchronise the primary and secondary systems so that they could authorise calls again.
The carrier has started an immediate review of its network infrastructure to determine the best way to isolate the secondary (resilient and failover) systems from any future issues, and expects to make gradual, planned changes over the coming days and weeks to implement what is required.
This outage affected not just I.T Communications but many other VoIP providers; it is estimated that well over a million customers experienced problems as a result. During the incident, inbound calls were not even reaching our network.
Please accept our apologies for the disruption this caused.