Saturday, April 4, 2015

When Even MacGyver Would be Proud

As promised, here is the follow-up to Thursday's teaser.

In this scenario, we're confined to working from our edge router and the 3550 that hands off to our two ISPs, and we aren't authorized to outright rebuild the topology.

We're restricted to two interfaces on our router, so we had to bite the bottleneck and build subinterfaces on both the inside and outside interfaces: one on the inside for our production VLAN, and two new VLANs, 10 and 20, on the outside to get traffic out to ISP1 and ISP2 respectively. I'll post the non-final config so you can see what we're working with to move traffic from our 172.20.50.0/23 to the remote Server-Farm. Keep in mind that Tunnel0 crosses MPLS via ISP1, so there's no need to run IPsec on top of it, but we're crossing the open Internet via ISP2, and Server-Farm Corp can't pass GRE through their firewall, so we have to build an L2L IPsec VPN for that path.
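I haven't included the 3550 side in the rundown since it's just VLAN plumbing, but it looks roughly like the sketch below. The switch port numbers here are placeholders of mine, not the real assignments:

vlan 10
 name ISP1-Handoff
!
vlan 20
 name ISP2-Handoff
!
interface FastEthernet0/1
 description Access-to-ISP1
 switchport access vlan 10
 switchport mode access
!
interface FastEthernet0/2
 description Access-to-ISP2
 switchport access vlan 20
 switchport mode access
!
interface FastEthernet0/3
 description Trunk-to-1841-Fa0/0
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan 10,20
 switchport mode trunk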

ip sla monitor 1
 type echo protocol ipIcmpEcho 11.11.11.1 source-ipaddr 11.11.11.2
 frequency 2
ip sla monitor schedule 1 life forever start-time now
!
!
ip tcp synwait-time 5
!
track 10 rtr 1
 delay down 6 up 6
!
!
crypto isakmp policy 1
 encr aes 256
 authentication pre-share
 group 5
crypto isakmp key thisseemssecure address 22.22.23.2
!
!
crypto ipsec transform-set HHG2N-to-ServerFarm esp-aes 256 esp-sha-hmac
!
crypto map HHG2N-to-ServerFarm-Map 1 ipsec-isakmp
 set peer 22.22.23.2
 set transform-set HHG2N-to-ServerFarm
 match address 105
!
!
interface Tunnel0
 description HHG2N->ServerFarm-via-ISP1
 ip address 172.20.20.2 255.255.255.252
 tunnel source FastEthernet0/0.10
 tunnel destination 11.11.12.2
!
interface FastEthernet0/0
 description Fa0/0-to-DC-Edge-3550-Eth0/3
 no ip address
 no ip redirects
 no ip proxy-arp
 duplex auto
 speed auto
 no clns route-cache
!
interface FastEthernet0/0.10
 description VLAN-10-Handoff-to-ISP1
 encapsulation dot1Q 10
 ip address 11.11.11.2 255.255.255.252
 no ip redirects
 no ip proxy-arp
 ip nat outside
 ip virtual-reassembly
!
interface FastEthernet0/0.20
 description VLAN-20-Handoff-to-ISP2
 encapsulation dot1Q 20
 ip address 22.22.22.2 255.255.255.248
 no ip redirects
 no ip proxy-arp
 ip nat outside
 ip virtual-reassembly
 crypto map HHG2N-to-ServerFarm-Map
!
!
interface FastEthernet0/1
 description LAN-Gateway
 no ip address
 ip virtual-reassembly
 duplex auto
 speed auto
!
interface FastEthernet0/1.50
 description Production-LAN-Gateway
 encapsulation dot1Q 50
 ip address 172.20.50.254 255.255.254.0
 ip nat inside
 ip virtual-reassembly
!
!
ip route 0.0.0.0 0.0.0.0 11.11.11.1 track 10
ip route 172.30.0.0 255.255.248.0 Tunnel0 track 10
ip route 0.0.0.0 0.0.0.0 22.22.22.1 2
ip route 172.30.0.0 255.255.248.0 FastEthernet0/0.20 2
!
ip nat inside source route-map NATtoISP1 interface FastEthernet0/0.10 overload
ip nat inside source route-map NATtoISP2 interface FastEthernet0/0.20 overload
!
access-list 10 permit 172.20.50.0 0.0.1.255
access-list 105 permit ip 172.20.50.0 0.0.1.255 172.30.0.0 0.0.7.255
access-list 110 deny   ip 172.20.50.0 0.0.1.255 172.30.0.0 0.0.7.255
access-list 110 permit ip 172.20.50.0 0.0.1.255 any
no cdp log mismatch duplex
!
route-map NATtoISP2 permit 10
 match ip address 110
 match interface FastEthernet0/0.20
!
route-map NATtoISP1 permit 10
 match ip address 10
 match interface FastEthernet0/0.10
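Before breaking anything, it helps to know how to check which path is currently live. I didn't capture that output here, but these are the standard commands to keep an eye on: "show track 10" tells you whether the tracked object (and therefore the primary routes) is up, "show ip sla monitor statistics" shows the probe results against 11.11.11.1, and "show ip route 0.0.0.0" shows which default route is actually installed.

router#show track 10
router#show ip sla monitor statistics
router#show ip route 0.0.0.0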

This is all well and good, and our production traffic properly crosses GRE Tunnel0 to Server-Farm Corp via ISP1's MPLS:
testhost#traceroute 172.30.1.61
Type escape sequence to abort.
Tracing the route to 172.30.1.61
VRF info: (vrf in name/id, vrf out name/id)
  1 172.20.50.254 27 msec 18 msec 9 msec
  2 172.20.20.1 59 msec 40 msec 39 msec
  3 172.30.1.61 40 msec 40 msec 40 msec
But what happens when we fail over to ISP2?
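(I won't bore you with how I forced the failover; the quick-and-dirty lab method is to make the SLA target unreachable, for example by shutting the ISP1-facing access port on the 3550 — placeholder port number below — and letting track 10's down delay kick in.)

3550(config)#interface FastEthernet0/1
3550(config-if)#shutdown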
testhost#ping 172.30.1.61
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.30.1.61, timeout is 2 seconds:
!.!.!
Success rate is 60 percent (3/5), round-trip min/avg/max = 88/205/437 ms
That can't be good... Perfectly symmetrical packet loss never solved anything! We know our L2L IPsec VPN is up, because some packets are getting there, but what's becoming of the missing ones? Let's see if we're losing them at our gateway on the router or at our outside interface.
testhost#ping 172.20.50.254
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.20.50.254, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 10/20/31 ms
testhost#ping 22.22.22.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 22.22.22.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 10/20/33 ms
Hmm, alright. So, it's not a problem with either of our interfaces. Let's check the configs to make sure that we're exempting our IPsec traffic from NAT.
ip nat inside source route-map NATtoISP2 interface FastEthernet0/0.20 overload
!
access-list 110 deny   ip 172.20.50.0 0.0.1.255 172.30.0.0 0.0.7.255
access-list 110 permit ip 172.20.50.0 0.0.1.255 any
!
route-map NATtoISP2 permit 10
 match ip address 110
 match interface FastEthernet0/0.20
No problem there. So where are our packets going?! Let's toss a "log" on the 110 ACL's deny entry and send some more pings.
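(Quick aside: on a reasonably recent IOS you can edit a numbered ACL in place by entering it in named-ACL mode; on older code you'd have to rip out and re-add the whole list. Roughly:)

router(config)#ip access-list extended 110
router(config-ext-nacl)#no 10
router(config-ext-nacl)#10 deny ip 172.20.50.0 0.0.1.255 172.30.0.0 0.0.7.255 log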
router(config)#do sh access-list 110
Extended IP access list 110
    10 deny ip 172.20.50.0 0.0.1.255 172.30.0.0 0.0.7.255 log (19 matches)
    20 permit ip 172.20.50.0 0.0.1.255 any
testhost#ping 172.30.1.61
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.30.1.61, timeout is 2 seconds:
!.!.!
Success rate is 60 percent (3/5), round-trip min/avg/max = 80/95/117 ms
router#sh access-list 110
Extended IP access list 110
    10 deny ip 172.20.50.0 0.0.1.255 172.30.0.0 0.0.7.255 log (22 matches)
    20 permit ip 172.20.50.0 0.0.1.255 any
That's interesting. Our NAT-exempt deny entry only incremented by 3 matches, which is exactly the number of successful packets we sent. In other words, two of our five packets never hit the bypass entry at all: it looks like they're being caught by the PAT route-map and translated, so they no longer match the crypto map's ACL and never make it across our L2L IPsec VPN.
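If you want to corroborate that, the NAT table and the NAT debug will show the stray packets getting translated. I didn't save that output, but these are the commands I'd reach for (mind the debug on a production box):

router#clear ip nat translation *
router#show ip nat translations
router#debug ip nat
router#undebug all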

This is where we end up strapping on our MacGyver gloves and hitting the CLI. We need to find a way to bypass this NAT issue without breaking our currently-functional configuration when we're not failed over to ISP2. I'll save you the extensive contemplation that ensued when I originally encountered this and give you the solution:

We need to set up PBR on our Fa0/1.50 LAN gateway to match traffic destined for the Server-Farm. We'll end up telling the router, "Hey, if traffic sourced from 172.20.50.0/23 is headed to 172.30.0.0/21, bypass this NAT issue and set the egress interface to Fa0/0.20." Keep in mind, though, that if that's all we do, we'll keep sending that traffic out ISP2 even when we're not failed over. So instead we use "set interface Tunnel0 FastEthernet0/0.20": the PBR will only use Tunnel0 if it's up (which means we'll also need to add GRE keepalives), and otherwise falls back to Fa0/0.20. Production traffic destined for the Internet is unaffected, since traffic that doesn't match the policy route-map is routed normally.
route-map BypassNAT permit 10
 match ip address 105
 set interface Tunnel0 FastEthernet0/0.20
!
interface Tunnel0
 keepalive 10 1    <== this actually makes our track 10 statement pointless, but oh well.
!
interface FastEthernet0/1.50
 ip policy route-map BypassNAT
So, our keepalives should down Tunnel0 when it loses connectivity to the far end public IP, and our route-map won't use it in the "set interface" if it's not up. Ok, let's test it out!
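While the pings run, you can also watch the policy do its work from the router side: "show route-map BypassNAT" shows per-clause policy-routing match counts, and "debug ip policy" logs each policy-routed packet (again, go easy with debugs on a busy box). Output omitted since I didn't capture it at the time.

router#show route-map BypassNAT
router#debug ip policy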
testhost#ping 172.30.1.61
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.30.1.61, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 70/82/93 ms
testhost#traceroute 172.30.1.61
Type escape sequence to abort.
Tracing the route to 172.30.1.61
VRF info: (vrf in name/id, vrf out name/id)
  1 172.20.50.254 36 msec 20 msec 10 msec
  2 172.30.1.61 80 msec 80 msec 80 msec
Good news, everybody! It worked! So, as you can see, by putting our traffic through a PBR with a "set interface", we're able to sidestep the NAT evaluation on Fa0/1.50 that was matching our traffic against the PAT statement and appeared to be dropping some of our packets. Could all of this have been avoided by moving to a more scalable and appropriate design? Almost certainly. But we don't always have that luxury with some clients, so we have to do what we can with what we've got.

Thanks for sticking with it, guys/girls!

Thursday, April 2, 2015

The Client is Always Right

Hey there, fellow packet-jockeys!

Sorry about the lapse in posts. I recently agreed to help out with a tight-budget, grant-funded project for the organization that gave me my first gig as a networker. So, alongside my normal 9 to 5, I'm consulting one to two days a week. Yup. I smile fondly at the memory of what a full weekend once felt like.

Now, while it feels great to be helping out a non-profit in its time of need, it can be tough working within the (sometimes nonexistent) budget and only utilizing the existing hardware. Unfortunately, that's the way it goes sometimes, and I'm sure many of us have had a client where we just had to bite the bullet and work with what we've got.

Okay...Maybe we should narrow the acceptable spectrum of "what we've got."

Luckily for me, this has been a fantastic learning experience in making the best of a tough topology, putting your nose (or, you know, fingers...) to the CLI, and making it happen. I'll post the full, sanitized rundown of this interesting implementation either tomorrow or Saturday, but I'll give you a teaser:


You have a 3550 on your edge which hands off to two ISPs, one of which you just installed for IP SLA failover. Off of the 3550 you have an 1841 ISR which, lucky for you, does your PAT, terminates one end of a GRE tunnel that runs over MPLS via your primary ISP to a 3rd party server farm, and now also has to terminate a backup L2L IPsec VPN across your second ISP to said server farm's second ISP hand-off.

Here's the kicker: Your 1841 only has two interfaces, Fa0/0 (outside) and Fa0/1 (inside), and the downtime of rebuilding the edge from scratch is unacceptable.