Running a BGP Free Core – Quick and dirty

December 31, 2015 - By Jon Major

So it’s the end of the year, and I wanted to do a quick video/post to round off the blog for 2015. A concept that’s relatively new to me (sadly) is running a BGP free core, and it’s relatively easy to set up. That being said… I know it’s kind of a lazy post. Hopefully it’s one that will spark some interesting conversation and inspire new posts in 2016!!

With that out of the way, let’s talk about the concept and the problem it’s trying to address. BGP tables for IPv4 are $@#%@ massive: over 500,000 prefixes which, obviously, is a lot. Traditional networking would call for every device in your core to have these routes (or summaries of them) in order to ensure reachability. However, what if I told you there was a way to have only the edge routers store these massive BGP tables and to keep the core running light and efficient?!

Of course you want to know. MPLS. *Mic drop*. With MPLS the core doesn’t have to know anything about the outside world; all it needs is labels for reaching the edge routers. But why?! So I have a working environment pictured here:

Ok, to keep things simple I shut down P3 and P4. Let’s see what happens when R1 tries to send traffic to R2’s loopback 0 interface.

R1#show ip cef 172.16.2.2
172.16.2.2/32
  nexthop 172.17.111.11 GigabitEthernet0/1
R1#show ip route 172.16.2.2
Routing entry for 172.16.2.2/32
  Known via "bgp 1", distance 20, metric 0
  Tag 1122, type external
  Last update from 172.17.111.11 20:55:28 ago
  Routing Descriptor Blocks:
  * 172.17.111.11, from 172.17.111.11, 20:55:28 ago
      Route metric is 0, traffic share count is 1
      AS Hops 2
      Route tag 1122
      MPLS label: none

Alright, nothing fancy yet: we see a next hop of 172.17.111.11 (PE1’s Gi0/3). Moving right along, let’s go over to PE1 and see what’s happening.

PE1#show ip cef 172.16.2.2
172.16.2.2/32
  nexthop 10.1.11.1 GigabitEthernet0/1 label 26
PE1#show ip route 172.16.2.2
Routing entry for 172.16.2.2/32
  Known via "bgp 1122", distance 200, metric 0
  Tag 2, type internal
  Last update from 10.22.22.22 20:53:59 ago
  Routing Descriptor Blocks:
  * 10.22.22.22, from 10.22.22.22, 20:53:59 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 2
      MPLS label: none

Now things are getting a bit more interesting. Check out our ‘show ip cef’ output. Label 26… put a pin in that, we’re coming right back to it. Ok, so the routing table gives us a BGP next hop of 10.22.22.22 (loopback 0 of PE2), but CEF shows that we’re actually forwarding this traffic out Gi0/1 towards P1. Let’s go to P1 and issue the same commands (show ip cef and show ip route for 172.16.2.2).

P1#show ip cef 172.16.2.2
0.0.0.0/0
  no route
P1#show ip route 172.16.2.2
% Network not in table

You’re reading that right, no route. I can smell your fear from my monitor. Don’t panic, allow me to explain. Go back to the output from PE1. Remember that label 26? Let’s follow the label brick road and see where that takes us.

PE1#show mpls forwarding-table | i 26
28         26         10.22.22.22/32   0             Gi0/1      10.1.11.1
P1#show mpls forwarding-table | i 26
26         26         10.22.22.22/32   9772          Gi0/1      10.0.12.2
P2#show mpls forwarding-table | i 26
26         Pop Label  10.22.22.22/32   73            Gi0/4      10.2.22.22

Alright, so PE1 has an LSP (label switched path) to reach the loopback of PE2. For brevity, just take my word that the reverse is true. P2 is the second-to-last hop on that LSP, so, per standard MPLS behavior (penultimate hop popping), it pops the label and forwards the traffic on to PE2 unlabeled. Alright, to put a bow on this thing, let’s tie together the final pieces by answering a question I feel like some of you must be asking yourselves. Why does it matter whether or not the PEs have an LSP between their loopbacks? For that matter, how is it remotely related to R1 and R2 communicating? To answer that, first let’s look at the show bgp ipv4 unicast output and a show run | sec router bgp on PE1.

PE1#show bgp ipv4 unicast | beg Network
     Network          Next Hop            Metric LocPrf Weight Path
 *>  172.16.1.1/32    172.17.111.111           0             0 1 i
 *>i 172.16.2.2/32    10.22.22.22              0    100      0 2 i
PE1#show run | sec router bgp
router bgp 1122
 bgp log-neighbor-changes
 neighbor 10.22.22.22 remote-as 1122
 neighbor 10.22.22.22 update-source Loopback0
 neighbor 10.22.22.22 next-hop-self
 neighbor 172.17.111.111 remote-as 1

Very similar output can be observed on PE2. The magic that’s happening here is that PE1 knows to reach 172.16.2.2 via 10.22.22.22 (loopback 0 on PE2). PE1 also knows that it has an outgoing label of 26 to reach 10.22.22.22. So instead of just forwarding traffic to 172.16.2.2 as plain IP, or requiring a label specifically for that prefix, it encapsulates the traffic in MPLS with the label for PE2’s loopback 0 interface, effectively tunneling this communication through the core network to PE2. So in short… magic. Finally, as promised, some Wireshark output from R1 pinging R2.
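
Before the captures, here’s a rough Python sketch of that recursive lookup on PE1. It’s only a toy model and not how IOS implements CEF: the two dictionaries are hand-copied from the show bgp ipv4 unicast and show mpls forwarding-table output above, and the names BGP_ROUTES, LABELLED_IGP, longest_match and resolve are made up for illustration.

```python
import ipaddress

# Hand-copied from PE1's 'show bgp ipv4 unicast': prefix -> BGP next hop.
BGP_ROUTES = {
    "172.16.1.1/32": "172.17.111.111",
    "172.16.2.2/32": "10.22.22.22",
}

# Hand-copied from PE1's 'show mpls forwarding-table':
# prefix -> (outgoing label, interface, adjacent next hop).
LABELLED_IGP = {
    "10.22.22.22/32": (26, "Gi0/1", "10.1.11.1"),
}


def longest_match(table, address):
    """Return the most specific prefix in `table` covering `address`, or None."""
    addr = ipaddress.ip_address(address)
    candidates = [ipaddress.ip_network(prefix) for prefix in table]
    matches = [net for net in candidates if addr in net]
    if not matches:
        return None
    return str(max(matches, key=lambda net: net.prefixlen))


def resolve(destination):
    """Resolve a destination to its BGP next hop, then to a label for that next hop."""
    bgp_prefix = longest_match(BGP_ROUTES, destination)
    if bgp_prefix is None:
        return None  # no matching BGP route in this toy table
    next_hop = BGP_ROUTES[bgp_prefix]
    igp_prefix = longest_match(LABELLED_IGP, next_hop)
    if igp_prefix is None:
        # The next hop is not reached over an LSP in this model: plain IP forwarding.
        return {"bgp_next_hop": next_hop, "label": None,
                "interface": None, "adjacency": None}
    label, interface, adjacency = LABELLED_IGP[igp_prefix]
    return {"bgp_next_hop": next_hop, "label": label,
            "interface": interface, "adjacency": adjacency}


if __name__ == "__main__":
    print(resolve("172.16.2.2"))
```

For 172.16.2.2 the sketch lands on label 26 out Gi0/1 towards 10.1.11.1, which is the same answer show ip cef gave on PE1. The point is that the label belongs to PE2’s loopback and not to R2’s prefix.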

So what you’ll see in the following screenshots is:

1. R1 forwards traffic to PE1, stock standard IP.
2. PE1 forwards traffic to P1 using label 26 (to reach PE2).
3. P1 forwards traffic to P2 also using label 26 (just a coincidence they share the same label for 10.22.22.22).
4. P2 does a pop operation and forwards the original IP packet from R1 to PE2 with no label. Then obviously PE2 forwards the ICMP packet to R2.
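
The core part of those steps (PE1 through to PE2) can be sketched in a few lines of Python. This is a toy model under assumptions: the tables are hand-copied from the show ip cef and show mpls forwarding-table output earlier in the post, the mapping of next-hop addresses to router names (10.1.11.1 is P1, 10.0.12.2 is P2, 10.2.22.22 is PE2) is inferred from that output, and IMPOSE, LFIB and trace_core are hypothetical names.

```python
POP = "pop"

# Ingress PE: destination prefix -> (label to push, next router).
# Taken from 'show ip cef 172.16.2.2' on PE1 (assumes 10.1.11.1 is P1).
IMPOSE = {
    "PE1": {"172.16.2.2/32": (26, "P1")},
}

# Core routers: incoming label -> (outgoing label or POP, next router).
# Taken from 'show mpls forwarding-table | i 26' on P1 and P2.
LFIB = {
    "P1": {26: (26, "P2")},
    "P2": {26: (POP, "PE2")},
}


def trace_core(ingress, prefix):
    """Return (from, to, label) for each link; label is None when unlabeled."""
    label, next_router = IMPOSE[ingress][prefix]
    links = [(ingress, next_router, label)]
    router = next_router
    while label is not None:
        # P routers look up the incoming label only, never the IP destination.
        out_label, next_router = LFIB[router][label]
        label = None if out_label == POP else out_label
        links.append((router, next_router, label))
        router = next_router
    return links


if __name__ == "__main__":
    for src, dst, label in trace_core("PE1", "172.16.2.2/32"):
        encap = "plain IP" if label is None else f"MPLS label {label}"
        print(f"{src} -> {dst}: {encap}")
```

Notice that P1 and P2 never look at 172.16.2.2 at all in this model. They only key on the incoming label, which is why P1 can forward the traffic while show ip route says the network isn’t in the table.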

Link from R1<>PE1