BGP Issues
Incident Report for I.T Communications Limited
Postmortem

At around 12:00am on 4th December 2024, our monitoring system alerted us that over 60 BGP sessions were repeatedly going up and down (flapping).

Our on-call network engineer, Phil, was then alerted. He found that the BGP daemon was repeatedly crashing, and that this was causing the BGP sessions to flap.

A remote reboot of the Sov House router did not resolve the issue. After further investigation, we found this was likely a remote BGP DDoS attack exploiting a vulnerability in the routing protocol daemon (RPD) of Juniper Networks Junos OS, the operating system running on the router.

Juniper had released a fix for this vulnerability, and Phil attended both Data Centres to apply the software update. This could not be done remotely because the BGP daemon was unstable and there was a risk of the software update failing, so the new software had to be loaded from a USB drive.

Phil arrived at the first Data Centre at 1:53am on 4th December 2024 and began loading the new firmware. Following the update, a few manual reboots were needed for the new software to fully apply.

At 3:50am on 4th December 2024, the first router became stable again following the completed software upgrade.

At 3:58am on 4th December 2024, we noticed that the second router had also been stable for around 3 hours.

Phil decided it was best to upgrade the software on the second router as well; this work started at 4:32am on 4th December 2024.

At 4:57am on 4th December 2024, the software upgrade was complete and the router had been rebooted. All the remaining BGP sessions came back up and no further alerts were showing on our monitoring.

Both upgrades fixed the vulnerability in the Juniper routing protocol daemon (RPD), and our network returned to being stable.

Whilst it took from around 12:00am to 4:57am to fully resolve all the issues, the network as a whole was not completely down for this entire period.

Affected Services

Hosted 3CX at Volta: down for around 3 hours and 50 minutes.
Hosted 3CX at Sov House: a few small outages, but largely remained up and in service.

SIP Trunks: some customers would have been offline, depending on where the customer's 3CX was hosted.

Internal network: remained stable and fully functional during the router issues.

Posted Dec 04, 2024 - 11:38 GMT

Resolved
The last router is now back up and the issue should now be fully resolved.
Posted Dec 04, 2024 - 04:57 GMT
Update
The new software is being loaded onto the second router now and will require a few reboots.
Posted Dec 04, 2024 - 04:32 GMT
Update
The second router is also stable at the moment. However, due to the BGP DDoS, we are going to carry out the upgrade now.

We shall be starting the work shortly after 4:30am.
Posted Dec 04, 2024 - 03:58 GMT
Update
Router is now stable.
Posted Dec 04, 2024 - 03:50 GMT
Update
The Juniper Networks Junos OS upgrade has finished and we are doing the final reboots on the first router.

The second router can be upgraded from within the network, so there is no need for Phil to go to the second data centre.

Network at Volta should start to become stable again very shortly.
Posted Dec 04, 2024 - 03:29 GMT
Update
Phil has now arrived on site and will proceed to upgrade the router software. Once the software is updated and the router is stable, Phil will attend the second data centre and upgrade the software on the other Juniper MX router.

Update to follow.
Posted Dec 04, 2024 - 01:53 GMT
Update
Phil is expected to arrive on site at 1:39am.
Posted Dec 04, 2024 - 01:24 GMT
Identified
Possible Cause of issue

An Improper Handling of Exceptional Conditions vulnerability in the routing protocol daemon (RPD) of Juniper Networks Junos OS.

When a malformed BGP UPDATE packet is received over an established BGP session, RPD crashes and restarts.

Continuous receipt of the malformed BGP UPDATE messages will create a sustained Denial of Service (DoS) condition for impacted devices.

It appears that someone is remotely attempting a Denial of Service (DoS) attack against part of our network.

Phil is now on his way to Sov House to load a software update onto the affected router, which we hope will resolve the issue.
Posted Dec 04, 2024 - 01:14 GMT
Update
We continue to investigate and are taking steps to resolve the issue.
Posted Dec 04, 2024 - 00:47 GMT
Update
The router has rebooted; however, the BGP sessions are still having issues.

We are investigating to find out the cause.
Posted Dec 04, 2024 - 00:34 GMT
Investigating
We are currently aware of an issue with BGP flapping, which is causing parts of our network to be unreachable.

We are rebooting the affected router to clear the fault.
Posted Dec 04, 2024 - 00:27 GMT