BGP Troubleshooting with Cisco Routers – Determining Upstream ISP Fault

Sporadic BGP issues can be difficult to troubleshoot, especially when the routing problems are further upstream, inside the ISP's network, and out of your control. Finding the root cause is harder still because Cisco routers log only BGP neighbor state changes by default, not route updates or route withdrawals.

Consider the following network and issue:

Customer Site 1:

R1 – AS 65505 – 10.17.184.0/24
R2 (ISP CPE)

Customer Site 2:

R4 – AS 65502 – 10.23.28.0/24
R5 (ISP CPE)

ISP PE:
R3 – AS 2202

Real-time traffic is dropping somewhere between R1 and R4, as evidenced by IPsec tunnels going down and VoIP calls dropping. The drops occur at random intervals. After ruling out the other usual suspects (link saturation, internal network issues, latency, etc.), the next step is to look at the upstream routing.

  • First, we check the BGP neighbor state between R1 and R2.

 

  • The BGP session between R1 and R2 has been established for over a year; however, that tells us nothing about recent route changes.
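As a rough sketch, the neighbor check above can be done with the standard IOS show commands below. The address 192.0.2.1 is only a placeholder for R2's peering address, which is not given in this post, and the `R1#` prompt is illustrative.

```
! Summary of all BGP peers: the Up/Down column shows how long each session has been up
R1# show ip bgp summary

! Detail for one peer: look for the "BGP state = Established, up for ..." line
! (192.0.2.1 is a hypothetical neighbor address, substitute your own)
R1# show ip bgp neighbors 192.0.2.1
```

Keep in mind that the session uptime only reflects the TCP/BGP session to the directly connected peer, which is why a long uptime here does not rule out route churn further upstream.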

 

Next, let’s look at the route for the site we are interested in (10.17.160.0).
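A minimal sketch of how to look up that single prefix, assuming the standard IOS show commands and an illustrative `R1#` prompt:

```
! Routing-table entry for the prefix: the "Last update from ... ago" line
! shows when this route was last changed
R1# show ip route 10.17.160.0

! BGP table entry for the same prefix (paths, next hop, AS path)
R1# show ip bgp 10.17.160.0
```

The age reported by `show ip route` is the useful part here, since it resets whenever the route is withdrawn and re-learned.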

 

 

  • The last update for that route (an update being either a withdrawal or an advertisement) was 5 days ago. This indicates something happened upstream. Looking at the rest of the routing table confirms it.

 

  • Reading this table, we can see that something happened 5 days ago that withdrew or updated all routes from our ISP peer (R3). The peer withdrew all the routes because they clearly had an issue upstream within their service cloud. Unfortunately, this is very easy to miss if you don’t read the table carefully.
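To make the pattern easier to spot, one option (a sketch, not necessarily the exact command used for the table above) is to list only the BGP-learned routes, so the age at the end of each entry can be compared at a glance:

```
! Show only routes learned via BGP; each entry ends with the route's age
R1# show ip route bgp
```

If most or all of the routes learned from the same peer show roughly the same age, they were very likely withdrawn and re-advertised together in a single upstream event.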

 

This is a very basic way to troubleshoot BGP route changes when you have no other logging available. To avoid being in this position, run "debug ip bgp updates" on your edge routers when you see problems like this. Doing so generates debug-level events on your console and in syslog that spell out the issue, which you can then pass on to the problem peer. During an update from your peer router, the output of that command looks like this:

 



 
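A minimal sketch of one way to enable and capture this debug on R1 follows. The buffer size of 64000 bytes is an arbitrary example value, and the `R1#` prompt is illustrative.

```
! Keep debug-level messages in the local log buffer (64000 is an example size)
R1# configure terminal
R1(config)# logging buffered 64000 debugging
R1(config)# end

! Show log messages in this VTY session, then enable the BGP update debug
R1# terminal monitor
R1# debug ip bgp updates

! After the next event, review the captured messages
R1# show logging

! Turn debugging off once you have what you need
R1# undebug all
```

Debugging BGP updates can be CPU-intensive on a router carrying a large table, so it is usually best enabled only while the problem is being actively investigated.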

Reference:

 

http://www.cisco.com/c/en/us/support/docs/ip/border-gateway-protocol-bgp/22166-b-trouble-main.html
