Saturday, April 4, 2015

When Even MacGyver Would be Proud

As promised, here is the follow-up to Thursday's teaser.

In this scenario, we're confined to working from our edge router and the 3550 that hands off to our two ISPs, and we aren't authorized to outright rebuild the topology.

We're restricted to two interfaces on our router, so we had to bite the bottleneck and build subinterfaces on the inside and outside interfaces for our respective production VLANs, and two new VLANs, 10 and 20, to get traffic out to ISP1 and 2 respectively. I'll post the non-final config so you can see what we're working with to move traffic from our 172.20.50.0/23 to the remote Server-Farm. Keep in mind that our Tunnel0 crosses MPLS via ISP1, so no need to run IPsec on top, but we're crossing open Internet via ISP2 and Server-Farm Corp can't pass GRE through their firewall, so we have to build an L2L IPsec VPN.

ip sla monitor 1
 type echo protocol ipIcmpEcho 11.11.11.1 source-ipaddr 11.11.11.2
 frequency 2
ip sla monitor schedule 1 life forever start-time now
!
!
ip tcp synwait-time 5
!
track 10 rtr 1
 delay down 6 up 6
!
!
crypto isakmp policy 1
 encr aes 256
 authentication pre-share
 group 5
crypto isakmp key thisseemssecure address 22.22.23.2
!
!
crypto ipsec transform-set HHG2N-to-ServerFarm esp-aes 256 esp-sha-hmac
!
crypto map HHG2N-to-ServerFarm-Map 1 ipsec-isakmp
 set peer 22.22.23.2
 set transform-set HHG2N-to-ServerFarm
 match address 105
!
!
interface Tunnel0
 description HHG2N->ServerFarm-via-ISP1
 ip address 172.20.20.2 255.255.255.252
 tunnel source FastEthernet0/0.10
 tunnel destination 11.11.12.2
!
interface FastEthernet0/0
 description Fa0/0-to-DC-Edge-3550-Eth0/3
 no ip address
 no ip redirects
 no ip proxy-arp
 duplex auto
 speed auto
 no clns route-cache
!
interface FastEthernet0/0.10
 description VLAN-10-Handoff-to-ISP1
 encapsulation dot1Q 10
 ip address 11.11.11.2 255.255.255.252
 no ip redirects
 no ip proxy-arp
 ip nat outside
 ip virtual-reassembly
!
interface FastEthernet0/0.20
 description VLAN-20-Handoff-to-ISP2
 encapsulation dot1Q 20
 ip address 22.22.22.2 255.255.255.248
 no ip redirects
 no ip proxy-arp
 ip nat outside
 ip virtual-reassembly
 crypto map HHG2N-to-ServerFarm-Map
!
!
interface FastEthernet0/1
 description LAN-Gateway
 no ip address
 ip virtual-reassembly
 duplex auto
 speed auto
!
interface FastEthernet0/1.50
 description Production-LAN-Gateway
 encapsulation dot1Q 50
 ip address 172.20.50.254 255.255.254.0
 ip nat inside
 ip virtual-reassembly
!
!
ip route 0.0.0.0 0.0.0.0 11.11.11.1 track 10
ip route 172.30.0.0 255.255.248.0 Tunnel0 track 10
ip route 0.0.0.0 0.0.0.0 22.22.22.1 2
ip route 172.30.0.0 255.255.248.0 FastEthernet0/0.20 2
!
ip nat inside source route-map NATtoISP1 interface FastEthernet0/0.10 overload
ip nat inside source route-map NATtoISP2 interface FastEthernet0/0.20 overload
!
access-list 10 permit 172.20.50.0 0.0.1.255
access-list 105 permit ip 172.20.50.0 0.0.1.255 172.30.0.0 0.0.7.255
access-list 110 deny   ip 172.20.50.0 0.0.1.255 172.30.0.0 0.0.7.255
access-list 110 permit ip 172.20.50.0 0.0.1.255 any
no cdp log mismatch duplex
!
route-map NATtoISP2 permit 10
 match ip address 110
 match interface FastEthernet0/0.20
!
route-map NATtoISP1 permit 10
 match ip address 10
 match interface FastEthernet0/0.10
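
If the wildcard masks in ACLs 10, 105 and 110 look odd, here's a minimal Python sketch (not part of the router config) for sanity-checking what they cover. The helper name wildcard_to_network is my own, and it assumes contiguous wildcards like the ones used above.

```python
import ipaddress

def wildcard_to_network(address: str, wildcard: str) -> ipaddress.IPv4Network:
    """Convert an IOS 'address wildcard' pair into an IPv4Network.

    Hypothetical helper: only contiguous wildcards (e.g. 0.0.1.255) are
    supported, because ipaddress interprets the wildcard as a host mask.
    """
    return ipaddress.ip_network(f"{address}/{wildcard}")

# The production LAN and Server-Farm ranges from the ACLs above.
production = wildcard_to_network("172.20.50.0", "0.0.1.255")
server_farm = wildcard_to_network("172.30.0.0", "0.0.7.255")
print(production, server_farm)
```

So 0.0.1.255 is just the /23 for the production LAN, and 0.0.7.255 is the /21 for the Server-Farm.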

This is all well and good, and our production traffic properly crosses GRE Tunnel0 to Server-Farm Corp via ISP1's MPLS:
testhost#traceroute 172.30.1.61
Type escape sequence to abort.
Tracing the route to 172.30.1.61
VRF info: (vrf in name/id, vrf out name/id)
  1 172.20.50.254 27 msec 18 msec 9 msec
  2 172.20.20.1 59 msec 40 msec 39 msec
  3 172.30.1.61 40 msec 40 msec 40 msec
But what happens when we fail over to ISP2?
testhost#ping 172.30.1.61
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.30.1.61, timeout is 2 seconds:
!.!.!
Success rate is 60 percent (3/5), round-trip min/avg/max = 88/205/437 ms
That can't be good... Perfectly symmetrical packet loss never solved anything! We know our L2L IPsec VPN is good because packets are getting there, but what's becoming of the missing ones? Let's see if we're losing them at the gateway on our router or at our outside interface.
testhost#ping 172.20.50.254
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.20.50.254, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 10/20/31 ms
testhost#ping 22.22.22.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 22.22.22.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 10/20/33 ms
Hmm, alright. So, it's not a problem with either of our interfaces. Let's check the configs to make sure that we're exempting our IPsec traffic from NAT.
ip nat inside source route-map NATtoISP2 interface FastEthernet0/0.20 overload
!
access-list 110 deny   ip 172.20.50.0 0.0.1.255 172.30.0.0 0.0.7.255
access-list 110 permit ip 172.20.50.0 0.0.1.255 any
!
route-map NATtoISP2 permit 20
 match ip address 110
 match interface FastEthernet0/0.20
No problem there. So where are our packets going?! Let's toss a "log" on the 110 ACL's deny entry and send some more pings.
router#sh access-list 110
Extended IP access list 110
    10 deny ip 172.20.50.0 0.0.1.255 172.30.0.0 0.0.7.255 log (19 matches)
    20 permit ip 172.20.50.0 0.0.1.255 any
testhost#ping 172.30.1.61
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.30.1.61, timeout is 2 seconds:
!.!.!
Success rate is 60 percent (3/5), round-trip min/avg/max = 80/95/117 ms
router#sh access-list 110
Extended IP access list 110
    10 deny ip 172.20.50.0 0.0.1.255 172.30.0.0 0.0.7.255 log (22 matches)
    20 permit ip 172.20.50.0 0.0.1.255 any
That's interesting. Our bypass entry is only incrementing by 3 matches, which is the exact number of successful packets we have.  It looks like some of our packets are being evaluated against the PAT route-map and are being dropped instead of matching the crypto map and egressing our L2L IPsec VPN.
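
To make the intended NAT exemption concrete, here's a minimal Python sketch of how ACL 110 is supposed to be evaluated inside the NAT route-map: first match wins, and only permitted packets get translated. The function names are hypothetical and this only models the logic we expect, not whatever the router is actually doing with the missing packets.

```python
import ipaddress

# ACL 110 from the config, as (action, source, destination) entries.
ACL_110 = [
    ("deny", ipaddress.ip_network("172.20.50.0/23"), ipaddress.ip_network("172.30.0.0/21")),
    ("permit", ipaddress.ip_network("172.20.50.0/23"), ipaddress.ip_network("0.0.0.0/0")),
]

def evaluate_acl(acl, src, dst):
    """Return the action of the first matching entry, or the implicit deny."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    for action, src_net, dst_net in acl:
        if src_ip in src_net and dst_ip in dst_net:
            return action
    return "deny"  # implicit deny at the end of every IOS ACL

def is_translated(src, dst):
    """In a NAT route-map, only packets the ACL permits are translated."""
    return evaluate_acl(ACL_110, src, dst) == "permit"

print(is_translated("172.20.50.10", "172.30.1.61"))  # Server-Farm traffic
print(is_translated("172.20.50.10", "8.8.8.8"))      # Internet traffic
```

Traffic to the Server-Farm should hit the deny entry (no NAT, straight to the crypto map), and everything else from the LAN should be PATed.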

This is where we end up strapping on our MacGyver gloves and hitting the CLI. We need to find a way to bypass this NAT issue without breaking our currently-functional configuration when we're not failed over to ISP2. I'll save you the extensive contemplation that ensued when I originally encountered this and give you the solution:

We need to set up PBR on our fa0/1.50 LAN gateway to match traffic destined for the Server-Farm. We'll end up telling the router "Hey, if traffic sourced from 172.20.50.0/23 is headed to 172.30.0.0/21, bypass this NAT issue and set the interface as Fa0/0.20." Keep in mind, though, that by doing only this we'd keep sending traffic out ISP2 even when we're not failed over. So we need to do "set interface Tunnel0 Fa0/0.20". The PBR will only use Tunnel0 if it's up (so we'll also need to add GRE keepalives), otherwise opting for Fa0/0.20. Production traffic destined for the Internet will be unaffected: by default, traffic that doesn't match a PBR entry is routed normally.
route-map BypassNAT permit 10
 match ip address 105
 set interface Tunnel0 FastEthernet0/0.20
!
interface Tunnel0
 keepalive 10 1    <== this actually makes our track 10 statement pointless, but oh well.
!
interface FastEthernet0/1.50
 ip policy route-map BypassNAT
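
The fallback behavior we're counting on from that "set interface" line can be sketched in a few lines of Python. This is a simplified model with a hypothetical function name (pick_egress), assuming the router walks the listed interfaces in order, uses the first one that's up, and falls back to normal routing if none are.

```python
def pick_egress(set_interfaces, interface_up):
    """Return the first listed interface that is up, or None.

    None models the packet falling back to normal destination-based routing.
    """
    for name in set_interfaces:
        if interface_up.get(name, False):
            return name
    return None

# Order matters: Tunnel0 (ISP1/MPLS) first, then the ISP2 handoff.
SET_INTERFACES = ["Tunnel0", "FastEthernet0/0.20"]

# GRE keepalives have taken Tunnel0 down, so the ISP2 handoff is chosen.
print(pick_egress(SET_INTERFACES, {"Tunnel0": False, "FastEthernet0/0.20": True}))
```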
So, our keepalives should down Tunnel0 when it loses connectivity to the far end public IP, and our route-map won't use it in the "set interface" if it's not up. Ok, let's test it out!
testhost#ping 172.30.1.61
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.30.1.61, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 70/82/93 ms
testhost#traceroute 172.30.1.61
Type escape sequence to abort.
Tracing the route to 172.30.1.61
VRF info: (vrf in name/id, vrf out name/id)
  1 172.20.50.254 36 msec 20 msec 10 msec
  2 172.30.1.61 80 msec 80 msec 80 msec
Good news, everybody! It worked! So, as you can see, by putting our traffic through a PBR with a set interface, we are able to bypass the "ip nat inside" on Fa0/1.50 which evaluates our traffic against our PAT statement and appeared to be dropping some of our packets. Could all of this have been avoided by moving to a more scalable and appropriate design? Almost certainly. But we don't always have that luxury with some clients, so we have to do what we can with what we've got.

Thanks for sticking with it, guys/girls!

Thursday, April 2, 2015

The Client is Always Right

Hey there, fellow packet-jockeys!

Sorry about the lapse in posts. I recently agreed to help out with a tight-budget, grant-funded project for the organization that gave me my first gig as a networker. So, alongside my normal 9 to 5, I'm consulting one to two days a week. Yup. I smile fondly at the reminiscence of what a full weekend once felt like.

Now, while it feels great to be helping out a non-profit in its time of need, it can be tough working within the (sometimes nonexistent) budget and only utilizing the existing hardware. Unfortunately, that's the way it goes sometimes, and I'm sure many of us have had a client where we just had to bite the bullet and work with what we've got.

Okay...Maybe we should narrow the acceptable spectrum of "what we've got."

Luckily for me, this has been a fantastic learning experience for making the best of a tough topology, putting your nose (or, you know, fingers...) to the CLI and making it happen. I'll post the full, sanitized, rundown of this interesting implementation either tomorrow or Saturday, but I'll give you a teaser:


You have a 3550 on your edge which hands off to two ISPs, one of which you just installed for IP SLA failover. Off of the 3550 you have an 1841 ISR which, lucky for you, does your PAT, terminates one end of a GRE tunnel that runs over MPLS via your primary ISP to a 3rd party server farm, and now also has to terminate a backup L2L IPsec VPN across your second ISP to said server farm's second ISP hand-off.

Here's the kicker: Your 1841 only has two interfaces, Fa0/0 (outside) and Fa0/1 (inside), and the downtime of rebuilding the edge from scratch is unacceptable.

Saturday, March 7, 2015

OSPF and Equal Cost Path Selection

I've done my best to trim down the excess complexities of this topology.  As it stands, the scenario is this:
There are multiple P2P MPLS client networks that hang off of South-Client-Handoff and North-Core which must pass all of their up/downstream traffic through MPLS-Handoff.  There are also non-P2P MPLS client networks whose up/downstream traffic must pass through Edge via VLAN 15.


One of the big take-home points here is that we cannot trunk VLAN 24 between South-Core and Edge. We could create an SVI for VLAN 15 (let's say, 192.168.15.6/28) on South-Client-Handoff and put it into OSPF Area 0, as seen below:
Pre-VLAN 15 SVI addition to South-Client-Handoff:
----------------------------------------------
North-Core#sh ip route 10.17.12.1
Routing entry for 10.17.12.1/32
  Known via "ospf 1", distance 110, metric 2, type intra area
  Last update from 192.168.24.4 on Vlan24, 00:08:54 ago
  Routing Descriptor Blocks:
  * 192.168.24.4, from 4.4.4.4, 00:11:17 ago, via Vlan24
      Route metric is 2, traffic share count is 1
Edge#sh ip route 10.17.12.1
Routing entry for 10.17.12.1/32
  Known via "ospf 1", distance 110, metric 3, type intra area
  Last update from 192.168.15.3 on Vlan15, 00:00:12 ago
  Routing Descriptor Blocks:
  * 192.168.15.4, from 4.4.4.4, 00:00:12 ago, via Vlan15
      Route metric is 3, traffic share count is 1
    192.168.15.3, from 4.4.4.4, 00:00:12 ago, via Vlan15
      Route metric is 3, traffic share count is 1
Post-VLAN 15 SVI addition to South-Client-Handoff:
-----------------------------------------------
North-Core#sh ip route 10.17.12.1
Routing entry for 10.17.12.1/32
  Known via "ospf 1", distance 110, metric 2, type intra area
  Last update from 192.168.15.6 on Vlan15, 00:00:06 ago
  Routing Descriptor Blocks:
  * 192.168.24.4, from 4.4.4.4, 00:02:29 ago, via Vlan24
      Route metric is 2, traffic share count is 1
    192.168.15.6, from 4.4.4.4, 00:00:06 ago, via Vlan15
      Route metric is 2, traffic share count is 1
Edge#sh ip route 10.17.12.1
Routing entry for 10.17.12.1/32
  Known via "ospf 1", distance 110, metric 2, type intra area
  Last update from 192.168.15.6 on Vlan15, 00:00:34 ago
  Routing Descriptor Blocks:
  * 192.168.15.6, from 4.4.4.4, 00:00:34 ago, via Vlan15
      Route metric is 2, traffic share count is 1
But then our P2P MPLS Client Networks will get the default route out through Edge on VLAN 15 unless we get creative with a distribute-list to filter them away from VLAN 15 and keep them routing in/out of VLAN 24 through our MPLS-Handoff. At that point, it becomes less of a hassle for us to just keep the odd switched path through North-Core that I'll be covering below.

With OSPF max-paths enabled for multi-path traffic forwarding, downstream traffic for non-P2P MPLS clients hanging off of South-Client-Handoff is split switched either to SVI 15 (192.168.15.4) on South-Core or SVI 15 (192.168.15.3) on North-Core, then routed onto VLAN 24 to go to the South-Client-Handoff:
Edge#sh ip route 10.17.12.1
Routing entry for 10.17.12.1/32
  Known via "ospf 1", distance 110, metric 3, type intra area
  Last update from 192.168.15.3 on Vlan15, 00:01:26 ago
  Routing Descriptor Blocks:
  * 192.168.15.4, from 4.4.4.4, 00:01:26 ago, via Vlan15
      Route metric is 3, traffic share count is 1
    192.168.15.3, from 4.4.4.4, 00:01:26 ago, via Vlan15
      Route metric is 3, traffic share count is 1
Edge#traceroute 10.17.12.1
Type escape sequence to abort.
Tracing the route to 10.17.12.1
VRF info: (vrf in name/id, vrf out name/id)
  1 192.168.15.3 5 msec
    192.168.15.4 4 msec
    192.168.15.3 4 msec
  2 192.168.24.4 5 msec 6 msec 4 msec
This is because the non-P2P MPLS clients' assigned public ranges are advertised in OSPF by South-Client-Handoff via VLAN 24, as seen here:
South-Core#sh ip route 10.17.12.1
Routing entry for 10.17.12.1/32
  Known via "ospf 1", distance 110, metric 2, type intra area
  Last update from 192.168.24.4 on Vlan24, 00:34:47 ago
  Routing Descriptor Blocks:
  * 192.168.24.4, from 4.4.4.4, 00:34:47 ago, via Vlan24
      Route metric is 2, traffic share count is 1 
North-Core#sh ip route 10.17.12.1
Routing entry for 10.17.12.1/32
  Known via "ospf 1", distance 110, metric 2, type intra area
  Last update from 192.168.24.4 on Vlan24, 00:10:03 ago
  Routing Descriptor Blocks:
  * 192.168.24.4, from 4.4.4.4, 00:10:03 ago, via Vlan24
      Route metric is 2, traffic share count is 1
While this isn't horrible, it's certainly a waste of bandwidth, as downstream traffic for our non-P2P clients is, in part, switched out e0/3 of South-Core, onto North-Core, then is routed off VLAN 15 onto VLAN 24 and egresses e0/1 back onto South-Core before finally egressing e0/2 onto South-Client-Handoff.

We could alleviate this by setting OSPF maximum-paths to 1, but then we'll still have to trust that OSPF on Edge will have its route through South-Core to 10.17.12.1 as the more stable (oldest) LSA (since they're equal cost paths) to avoid looping traffic through North-Core. Lucky for us, SVIs are pretty stable (unless you go and purge VLANs from the config, or start messing with VLAN suspending), but we can't guarantee that SVI 15 on South-Core (192.168.15.4) will always be stable (what if we only have that VLAN tagged on one circuit and it flaps?) or that North-Core will keep its current SVI 15 IP address. Let's see what happens when we set maximum-paths to 1 and change North-Core's SVI 15 to 192.168.15.5.
Edge(config)#router ospf 1
Edge(config-router)#maximum-path 1
Edge#sh ip route 10.17.12.1
Routing entry for 10.17.12.1/32
  Known via "ospf 1", distance 110, metric 3, type intra area
  Last update from 192.168.15.5 on Vlan15, 00:00:10 ago
  Routing Descriptor Blocks:
  * 192.168.15.5, from 4.4.4.4, 00:00:10 ago, via Vlan15
      Route metric is 3, traffic share count is 1
Because changing max-path clears the RIB, we get equal cost/equal age LSAs, and Edge selects North-Core's new, higher SVI (192.168.15.5). Let's see what happens if we shut e0/3 and e0/1 on North-Core to force Edge to only have the LSA for 10.17.12.1 from South-Core with a lower SVI (192.168.15.4), but then bring North-Core back in with a higher SVI but a shorter LSA age (less stability).
North-Core(config)#int range e0/1 , e0/3
North-Core(config-if-range)#shut
*Mar  7 00:54:10.390: %LINK-5-CHANGED: Interface Ethernet0/1, changed state to administratively down
*Mar  7 00:54:10.390: %LINK-5-CHANGED: Interface Ethernet0/3, changed state to administratively down
*Mar  7 00:54:11.395: %LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet0/1, changed state to down
*Mar  7 00:54:11.395: %LINEPROTO-5-UPDOWN: Line protocol on Interface Ethernet0/3, changed state to down
North-Core(config-if-range)#
*Mar  7 00:54:40.295: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.2 on Vlan15 from FULL to DOWN, Neighbor Down: Dead timer expired
*Mar  7 00:54:40.612: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.2 on Vlan24 from FULL to DOWN, Neighbor Down: Dead timer expired
North-Core(config-if-range)#
*Mar  7 00:54:46.425: %OSPF-5-ADJCHG: Process 1, Nbr 4.4.4.4 on Vlan24 from FULL to DOWN, Neighbor Down: Dead timer expired
North-Core(config-if-range)#
*Mar  7 00:54:47.820: %OSPF-5-ADJCHG: Process 1, Nbr 1.1.1.1 on Vlan15 from FULL to DOWN, Neighbor Down: Dead timer expired 
Edge#sh ip route 10.17.12.1
Routing entry for 10.17.12.1/32
  Known via "ospf 1", distance 110, metric 3, type intra area
  Last update from 192.168.15.4 on Vlan15, 00:11:11 ago
  Routing Descriptor Blocks:
  * 192.168.15.4, from 4.4.4.4, 00:01:11 ago, via Vlan15
      Route metric is 3, traffic share count is 1 
North-Core(config-if-range)#no shut 
Edge#sh ip route 10.17.12.1
Routing entry for 10.17.12.1/32
  Known via "ospf 1", distance 110, metric 3, type intra area
  Last update from 192.168.15.5 on Vlan15, 00:00:02 ago
  Routing Descriptor Blocks:
  * 192.168.15.4, from 4.4.4.4, 00:04:31 ago, via Vlan15
      Route metric is 3, traffic share count is 1
Look at that. LSA stability (oldest in database) is preferred over North-Core's SVI 15 (192.168.15.5). But, that still doesn't afford us much comfort in our environment. Who wants to rely solely on LSA stability? Let's say we want to keep max-path as > 1, but we still don't want to split traffic.

We can't build a route-map with a prefix-list of Non-P2P MPLS Client Networks to increment the LSA cost received on North-Core from neighbor South-Client-Handoff, and we're not using point-to-multipoint, so per-neighbor cost is out too. That leaves our best alternative to trusting that the LSA through South-Core is more stable: increasing the OSPF Cost on North-Core's SVI 24 (192.168.24.2) from 1 to 2 so Edge prefers the more direct path through South-Core.
Edge(config)#router ospf 1
Edge(config-router)#no maximum-paths 1
Edge#sh ip route ospf
      10.0.0.0/32 is subnetted, 1 subnets
O        10.17.12.1 [110/3] via 192.168.15.4, 00:00:15, Vlan15
                    [110/3] via 192.168.15.3, 00:00:15, Vlan15
      192.168.24.0/28 is subnetted, 1 subnets
O        192.168.24.0 [110/2] via 192.168.15.4, 00:00:15, Vlan15
                      [110/2] via 192.168.15.3, 00:00:15, Vlan15 
North-Core(config)#int vlan 24
North-Core(config-if)#ip ospf cost 2 
Edge#sh ip route ospf
      10.0.0.0/32 is subnetted, 1 subnets
O        10.17.12.1 [110/3] via 192.168.15.4, 00:14:46, Vlan15
      192.168.24.0/28 is subnetted, 1 subnets
O        192.168.24.0 [110/2] via 192.168.15.4, 00:14:46, Vlan15

So, our best solution in order to keep multiple equal-cost paths, but not rely on LSA stability to avoid additional bandwidth utilization, appears to be altering the OSPF Cost on North-Core's VLAN 24 SVI.  Keep in mind, though, that this will apply to all advertised networks that North-Core gets via VLAN 24 from South-Client-Handoff.
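
Here's the arithmetic behind that cost change as a small Python sketch. The per-hop costs are my assumption, picked so they add up to the metric of 3 that Edge reports (Edge's VLAN 15 SVI, the core's VLAN 24 SVI, and the advertised /32 on South-Client-Handoff). The function name is hypothetical too.

```python
def best_next_hops(paths):
    """Return (lowest total cost, sorted next hops that share it).

    `paths` maps a next-hop label to the list of link costs along that path.
    """
    totals = {hop: sum(costs) for hop, costs in paths.items()}
    best = min(totals.values())
    return best, sorted(hop for hop, total in totals.items() if total == best)

# Assumed costs before the change: both paths total 3, so Edge installs both.
before = {
    "192.168.15.4 (South-Core)": [1, 1, 1],
    "192.168.15.3 (North-Core)": [1, 1, 1],
}
# After "ip ospf cost 2" on North-Core's VLAN 24 SVI, that path totals 4.
after = {
    "192.168.15.4 (South-Core)": [1, 1, 1],
    "192.168.15.3 (North-Core)": [1, 2, 1],
}

print(best_next_hops(before))
print(best_next_hops(after))
```

With equal totals both next hops are installed. Once North-Core's path costs 4, only the path via 192.168.15.4 survives at metric 3, which lines up with the second "sh ip route ospf" output.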

Luckily, we have relatively large pipes in our production environment, so we can handle the extra switched bandwidth without worrying about altering per-SVI OSPF Cost. A minor bump in the road for a simplified configuration, I'd say.

Friday, March 6, 2015

Back into the Fire

Hey, networkers.  Sorry about the lengthy absence.  Shortly after finishing the last post covering Zone-Based Policy Firewalls, I caved in and bought the new CCIE Routing & Switching v5 books on Amazon.

Funny ol' thing, life.  I probably only managed a month's hiatus after finishing my CCNP before my head was already back in the books, but at least Ivan's Deploying Zone-Based Policy Firewalls was only 100-ish pages.

That said, I wouldn't have guessed that I would only make it two months before delving back into 700+ page Cisco Press books, let alone undertaking the dreaded CCIE.

I'll be keeping up on The Guide as best as possible, though. In fact, I've got a cool OSPF scenario I'll be putting up tomorrow.  While troubleshooting a client being DDoS'd this week, it was discovered that downstream traffic was taking an unexpected path.

Saturday, February 21, 2015

Much Ado about Zone-Based Firewalls

As things tend to go in the world o' networking, Friday proved immensely hectic.

I would've liked to have gotten this lab posted last night, but I made some last-minute changes to restrict SSH access to our "webserver" in the DMZ to a management subnet in the Inside security zone, which I'm pretty pleased worked on the first attempt. The one caveat: when SSHing from a router (which I used as a makeshift "AdminTestHost"), you have to enable SSHv2 to use it as a client.

All of that jargon aside, let's get going on what is, honestly, one of my new favorite lab implementations!

========================================================================


Above is the topology that we're using for the lab. Pretty straight-forward, with a bit of OSPF thrown in there for the sake of not having to build static routes. For testing Outside to DMZ traffic, I've hooked an Ubuntu VM in VirtualBox onto GNS3, on which we can generate some Nmap traffic to prove out the ZBF config. I won't go into details on how to set that up, a) because this post is about properly building ZBF, and b) because there are plenty of guides online already for VMs/GNS3/Nmap.

I'll assume that, if you're at the point of curiosity about ZBFs, you probably already have the base knowledge required to cable/VLAN/IP address/route the topology. So, let's jump right into the ZBF config on our edge Cisco 7200 running IOS 15.0.  I'll enclose comments between the sections explaining briefly what each part does.

class-map type inspect match-any DMZ-to-Outside
 match protocol icmp
 match protocol http
!
//Class-map is used by the Policy-Map to inspect traffic moving between defined Security Zones. An "inspect" Class-Map must be used with an "inspect" Policy-Map// 
!
policy-map type inspect VLAN-10-DMZ-to-Outside
 class type inspect DMZ-to-Outside
  inspect
 class class-default
  drop log
!
//The Policy-Map is used by our Zone-Pair Service-Policy to inspect traffic matched by the Class-Map. If it matches (and we haven't specified any additional Parameter-Map to provide more granular control of traffic volume), the traffic passes. If there's no match, the traffic is dropped by default (I added the "log" for fun later)//
!
zone security VLAN10-WebServer1-DMZ
 description Security zone for WebServer1
zone security Outside
 description Outside to Internet
zone-pair security Outside-to-VLAN-10-DMZ source Outside destination VLAN10-WebServer1-DMZ
 service-policy type inspect VLAN-10-DMZ-to-Outside
!
//Before you can define a Zone-Pair, or make interfaces Zone members, you need to define the Security Zones. Once interfaces have been made a zone-member (an interface can only belong to one zone), you can define the Zone-Pair. The pair is what controls traffic flow and applies our Policy-Map to it. You only need to define a zone-pair in one direction, as return traffic for sessions initiated in that direction is covered by the same Zone-Pair. For security purposes, I've only created a Zone-Pair for traffic sourced from the Outside destined for our DMZ. If we needed our WebServer to hit the Internet for updates, etc., we would need to build a second Zone-Pair where the DMZ is the source//
!
interface FastEthernet0/0
 description Outside
 ip address 172.31.122.2 255.255.255.248
 ip nat outside
 ip virtual-reassembly
 zone-member security Outside
 duplex full
 !
//Above and below, we add the Outside and DMZ interfaces to their respective Security Zones// 
!
interface FastEthernet1/0.10
 description VLAN10-WebServer-1-DMZ
 encapsulation dot1Q 10
 ip address 192.168.10.1 255.255.255.0
 ip nat inside
 ip virtual-reassembly
 zone-member security VLAN10-WebServer1-DMZ
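
If it helps to see the decision logic without the IOS syntax, here's a minimal Python model of the zone-pair we just built. It's a sketch only: the function name zbf_action is mine, and it ignores return traffic for inspected sessions, the self zone, and traffic within a single zone.

```python
# Zone-pairs keyed by (source zone, destination zone); the value is the set
# of protocols matched by the inspect class-map (icmp and http above).
ZONE_PAIRS = {
    ("Outside", "VLAN10-WebServer1-DMZ"): {"icmp", "http"},
}

def zbf_action(src_zone, dst_zone, protocol):
    """Return 'inspect' or 'drop' for a new session between two zones."""
    inspected = ZONE_PAIRS.get((src_zone, dst_zone))
    if inspected is None:
        # No zone-pair in this direction: inter-zone traffic is dropped.
        return "drop"
    # Anything the class-map doesn't match lands in class-default (drop).
    return "inspect" if protocol in inspected else "drop"

print(zbf_action("Outside", "VLAN10-WebServer1-DMZ", "http"))
print(zbf_action("Outside", "VLAN10-WebServer1-DMZ", "https"))
print(zbf_action("VLAN10-WebServer1-DMZ", "Outside", "http"))
```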

Now, let's confirm everything was built correctly:
Edge-7200#sh policy-map type inspect zone-pair session
policy exists on zp Outside-to-VLAN-10-DMZ
 Zone-pair: Outside-to-VLAN-10-DMZ
  Service-policy inspect : VLAN-10-DMZ-to-Outside
    Class-map: DMZ-to-Outside (match-any)
      Match: protocol icmp
        0 packets, 0 bytes
        30 second rate 0 bps
      Match: protocol udp
        0 packets, 0 bytes
        30 second rate 0 bps
      Match: protocol tcp
        0 packets, 0 bytes
        30 second rate 0 bps
   Inspect
    Class-map: class-default (match-any)
      Match: any
      Drop
        0 packets, 0 bytes
We'll enable HTTP access ((config)#ip http server) on our "WebServer1" then try to Nmap our WebServer (172.31.122.3) from our Outside VM. Also, to add a bit of cool output when we run our next Nmap, I changed the policy-map class-default from "drop" to "drop log" on the edge 7200.

capn@capn-VirtualBox:~$ nmap -n 172.31.122.3
Nmap scan report for 172.31.122.3
Host is up (0.33s latency).
Not shown: 999 filtered ports
PORT STATE SERVICE
80/tcp open http

On our edge Cisco 7200, we can see the results of the Nmap against our ZBF policy where it rejects attempts over HTTPS, SMTP, and POP3:

*Feb 20 18:52:29.683: %FW-6-DROP_PKT: Dropping tcp session 192.168.56.102:48964 192.168.10.100:443 on zone-pair
Outside-to-VLAN-10-DMZ class class-default due to  DROP action found in policy-map with ip ident 0
*Feb 20 18:52:33.067: %FW-6-LOG_SUMMARY: 1 packet were dropped from 192.168.56.102:48964 => 192.168.10.100:443
(target:class)-(Outside-to-VLAN-10-DMZ:class-default)
*Feb 20 18:52:33.067: %FW-6-LOG_SUMMARY: 1 packet were dropped from 192.168.56.102:48965 => 192.168.10.100:443
(target:class)-(Outside-to-VLAN-10-DMZ:class-default)
*Feb 20 18:52:33.071: %FW-6-LOG_SUMMARY: 1 packet were dropped from 192.168.56.102:55985 => 192.168.10.100:587
(target:class)-(Outside-to-VLAN-10-DMZ:class-default)
*Feb 20 18:52:33.071: %FW-6-LOG_SUMMARY: 2 packets were dropped from 192.168.56.102:46872 => 192.168.10.100:25
(target:class)-(Outside-to-VLAN-10-DMZ:class-default)
*Feb 20 18:52:33.071: %FW-6-LOG_SUMMARY: 2 packets were dropped from 192.168.56.102:36496 => 192.168.10.100:110
!
truncated for brevity
!
Now that that's all set up, we'll add an Inside zone and some rules to allow us to SSH into our WebServer from the Inside for management, but drop any attempts to do so from the Outside. First, we'll build SSH access on our router masquerading as a WebServer.

WebServer1(config)#ip domain-name LabCorp.net
WebServer1(config)#username admin password root
WebServer1(config)#aaa new-model
WebServer1(config)#crypto key gen rsa
The name for the keys will be: WebServer1.LabCorp.net
Choose the size of the key modulus in the range of 360 to 2048 for your
  General Purpose Keys. Choosing a key modulus greater than 512 may take
  a few minutes.
How many bits in the modulus [512]: 1024
% Generating 1024 bit RSA keys, keys will be non-exportable...[OK]
WebServer1(config)#
*Mar  1 00:28:24.143: %SSH-5-ENABLED: SSH 2.0 has been enabled
WebServer1(config)#ip ssh ver 2
WebServer1(config)#line vty 0 935
WebServer1(config-line)#password root
WebServer1(config-line)#transport input ssh
WebServer1(config)#service password-encrypt
WebServer1(config)#no service password-encrypt

Just for fun, I added "match protocol ssh" to the class-map on the Edge-7200 and attempted to SSH in from my VM on the Outside to WebServer1:

capn@capn-VirtualBox:~$ ssh -l admin 172.31.122.3
admin@172.31.122.3's password:
WebServer1>exit
Connection to 172.31.122.3 closed

Ok, we'll tear that match off the class-map now aaaand...just as we hoped, we can't SSH into WebServer1 from our Outside VM anymore!

capn@capn-VirtualBox:~$ ssh -l admin 172.31.122.3
^C
capn@capn-VirtualBox:~$

And look what appeared on our Edge-7200!

*Feb 19 16:20:31.435: %FW-6-DROP_PKT: Dropping tcp session 192.168.56.102:41216 192.168.10.100:22 on zone-pair
Outside-to-VLAN-10-DMZ class class-default due to  DROP action found in policy-map with ip ident 0
Edge-7200#
*Feb 19 16:20:38.075: %FW-6-LOG_SUMMARY: 3 packets were dropped from 192.168.56.102:41216 => 192.168.10.100:22 (target:class)-(Outside-to-VLAN-10-DMZ:class-default)

To keep the Inside simple, we'll hang our Inside LAN zone off of fa1/1 (10.1.1.1) on our Edge-7200 with an MLS. In OSPF area 0 we'll put fa1/1 of our 7200, fa1/0.10 (192.168.10.1) of our 7200 (the gateway for our VLAN 10 WebServer1 DMZ) as a passive-int, and our Inside LAN. The 192.168.1.0/24 off of LAN-Switch fa0/1 for AdminTestHost was also put into OSPF Area 0 as a passive-int, but it doesn't appear in the RIB output below because I added it later (when I originally wrote this, I was planning on using a host connection). For the sake of information, LAN-Switch is advertising 192.168.1.0/24 into OSPF so WebServer traffic hitting the 7200 can find its way back to the AdminTestHost.


Let's check our RIB on our LAN-Switch:

LAN-Switch#sh ip route
Gateway of last resort is 10.1.1.1 to network 0.0.0.0
O*E2  0.0.0.0/0 [110/1] via 10.1.1.1, 00:11:58, Ethernet1/1 <--used "default-info originate always" on 7200
      10.0.0.0/8 is variably subnetted, 2 subnets, 2 masks
C        10.1.1.0/30 is directly connected, Ethernet1/1
L        10.1.1.2/32 is directly connected, Ethernet1/1
O     192.168.10.0/24 [110/11] via 10.1.1.1, 00:10:24, Ethernet1/1

We can ping DMZ gateway fa1/0.10 (192.168.10.1) and Outside fa0/0 (172.31.122.2) of our 7200 from the LAN -- this is because traffic destined to a zone's interface itself won't be inspected/dropped as part of the ZBF policy. I have a suspicion that we won't be able to hit WebServer1 without a zone-pair built, and we definitely won't be able to hit R1's fa0/0 (172.31.122.1) without a NAT pool and a zone-pair.

LAN-Switch#ping 192.168.10.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.10.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 39/40/41 ms
LAN-Switch#ping 172.31.122.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.31.122.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 40/40/42 ms
LAN-Switch#ping 192.168.10.100
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.10.100, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)
LAN-Switch#ping 172.31.122.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.31.122.1, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)

Let's start by building the NAT pool. We'll just permit 10.1.1.0/30 to start, because our test pings from the LAN-Switch are going to be sourced from 10.1.1.2 anyways. We'll add to the ACL so the rest of Inside can get Outside later.

Edge-7200(config)#access-list 1 permit 10.1.1.0 0.0.0.3
Edge-7200(config)#ip nat pool Inside-LAN-NAT-Pool 172.31.122.4 172.31.122.4 netmask 255.255.255.248
Edge-7200(config)#ip nat inside source list 1 pool Inside-LAN-NAT-Pool overload
Edge-7200(config)#int fa1/1
Edge-7200(config-if)#ip nat inside
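
As a quick sanity check on the addressing above, here's a small Python sketch (not part of the lab config). It converts the ACL's wildcard mask into a prefix and confirms that the PAT address falls inside the Outside /29. The helper name wildcard_to_network is made up for illustration, and it assumes a contiguous, non-zero wildcard mask.

```python
import ipaddress


def wildcard_to_network(address: str, wildcard: str) -> ipaddress.IPv4Network:
    """Turn an IOS-style address + wildcard mask into an IPv4Network.

    Hypothetical helper for illustration only. It relies on the ipaddress
    module accepting a host mask after the slash, so it assumes the
    wildcard is contiguous and not 0.0.0.0.
    """
    return ipaddress.ip_network(f"{address}/{wildcard}")


# Values taken from the config in this post.
acl_1 = wildcard_to_network("10.1.1.0", "0.0.0.3")
outside = ipaddress.ip_interface("172.31.122.2/255.255.255.248").network
pat_address = ipaddress.ip_address("172.31.122.4")
test_source = ipaddress.ip_address("10.1.1.2")

print(f"ACL 1 permits {acl_1}")
print(f"LAN-Switch source {test_source} matches ACL 1: {test_source in acl_1}")
print(f"PAT address {pat_address} is inside {outside}: {pat_address in outside}")
```

Both checks should come back True: the test pings sourced from 10.1.1.2 match the ACL, and 172.31.122.4 sits in the same /29 as the Outside interface.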

OK, the NAT (or rather, PAT) pool is built. Just to prove that it's not enough to set up NAT and that we need the zone-pair as well, let's try to ping 172.31.122.1 once more:

LAN-Switch#ping 172.31.122.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.31.122.1, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)

Yup, no luck. Let's get to building that new Inside-to-Outside zone-pair. Here's the additional config. We'll start by matching only ICMP, just to prove pings. Later, to demonstrate a match-all inspect class-map with an access-group, we'll give all of the Inside LAN the ability to open HTTP sessions to WebServer1, but only an admin subnet (192.168.1.0/24) the ability to SSH from Inside to WebServer1 in the DMZ.


class-map type inspect match-any InsideToInternet
 match protocol icmp
!
!
policy-map type inspect Inside-to-Outside
 class type inspect InsideToInternet
  inspect
 class class-default
  drop log
!
zone security Outside
 description Outside to Internet
zone security Inside
 description Inside to LabCorp LAN
!
zone-pair security Inside-LAN-to-Outside source Inside destination Outside
 service-policy type inspect Inside-to-Outside
!
!
interface FastEthernet1/1
 ip address 10.1.1.1 255.255.255.252
 ip nat inside
 ip virtual-reassembly
 zone-member security Inside


Now, let's see if we can ping from our Inside LAN Switch to the Outside R1's fa0/0 and loopback1:

LAN-Switch#ping 172.31.122.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.31.122.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 21/37/68 ms
LAN-Switch#ping 216.36.1.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 216.36.1.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 19/28/52 ms

Inside to Outside access is good, so now let's get some access for users in our Inside LAN to our DMZ WebServer1, and we'll build that exclusive admin subnet SSH management access I mentioned earlier!

Edge-7200(config)#class-map type inspect match-any Inside-to-WebServer1-DMZ
Edge-7200(config-cmap)#match protocol icmp
Edge-7200(config-cmap)#match protocol http
Edge-7200(config-cmap)#match protocol https
Edge-7200(config)#class-map type inspect match-any Inside-Admins-Traffic
Edge-7200(config-cmap)#match protocol ssh
Edge-7200(config-cmap)#match protocol icmp
Edge-7200(config-cmap)#match protocol https
Edge-7200(config)#class-map type inspect match-all Inside-Admins-to-WebServer1-DMZ
Edge-7200(config-cmap)#match class-map Inside-Admins-Traffic
Edge-7200(config-cmap)#match access-group name AdminSubnet
Edge-7200(config)#ip access-list extended AdminSubnet
Edge-7200(config-ext-acl)#permit ip 192.168.1.0 0.0.0.255 host 192.168.10.100
Edge-7200(config)#policy-map type inspect Inside-to-WebServer1-DMZ
Edge-7200(config-pmap)#class type inspect Inside-to-WebServer1-DMZ
Edge-7200(config-pmap-c)#inspect
Edge-7200(config-pmap)#class type inspect Inside-Admins-to-WebServer1-DMZ
Edge-7200(config-pmap-c)#inspect
Edge-7200(config-pmap)#class class-default
Edge-7200(config-pmap-c)#drop log
Edge-7200(config)#zone-pair security Inside-to-WebServer1-DMZ source Inside destination VLAN10-WebServer1-DMZ
Edge-7200(config-sec-zone-pair)#service-policy type inspect Inside-to-WebServer1-DMZ

NOTE: Not sure if you have any background with class-maps, but the purpose of using "match-all" on the third map above, rather than the "match-any" that we've been using thus far, is that "match-any" is OR logic, whereas "match-all" is AND logic. For traffic to qualify for this class-map, it not only needs to be one of the protocols we specified in "match class-map Inside-Admins-Traffic" (SSH/ICMP/HTTPS), it also needs to match our extended ACL, which only permits traffic sourced from the management subnet (192.168.1.0/24) and destined for the webserver (192.168.10.100).
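
If the OR/AND distinction is easier to read as code, here's a rough Python model of just the class-map matching described in the note. It is a sketch of the logic only, not of how IOS implements it. The Packet type and function names are invented for illustration, and it ignores the order in which the policy-map evaluates its classes.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Illustrative stand-in for a classified flow (not an IOS structure)."""
    src: str
    dst: str
    protocol: str


# Values from the class-maps and the AdminSubnet ACL in this post.
ADMIN_PROTOCOLS = ("ssh", "icmp", "https")
ADMIN_SUBNET = ipaddress.ip_network("192.168.1.0/24")
WEBSERVER1 = ipaddress.ip_address("192.168.10.100")


def inside_admins_traffic(pkt: Packet) -> bool:
    # match-any: OR across the listed protocols.
    return any(pkt.protocol == proto for proto in ADMIN_PROTOCOLS)


def admin_subnet_acl(pkt: Packet) -> bool:
    # permit ip 192.168.1.0 0.0.0.255 host 192.168.10.100
    return (ipaddress.ip_address(pkt.src) in ADMIN_SUBNET
            and ipaddress.ip_address(pkt.dst) == WEBSERVER1)


def inside_admins_to_webserver1_dmz(pkt: Packet) -> bool:
    # match-all: AND across the nested class-map and the ACL.
    return all((inside_admins_traffic(pkt), admin_subnet_acl(pkt)))


admin_ssh = Packet("192.168.1.42", "192.168.10.100", "ssh")
non_admin_ssh = Packet("10.1.1.2", "192.168.10.100", "ssh")
admin_telnet = Packet("192.168.1.42", "192.168.10.100", "telnet")

for pkt in (admin_ssh, non_admin_ssh, admin_telnet):
    print(pkt, "->", inside_admins_to_webserver1_dmz(pkt))
```

Only the first packet satisfies both conditions. The second fails the ACL (wrong source subnet) and the third fails the protocol list.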

First, let's start by pinging from our Inside router R3 to verify basic, non-admin, Inside traffic to WebServer1:

R3#ping 192.168.10.100
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.10.100, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 28/39/68 ms

Looks good. Now, let's see if we can SSH to it from non-admin Inside:

R3#ssh -l admin 192.168.10.100
R3# 

And we can confirm it failed on our Edge-7200:

Edge-7200#
*Feb 20 20:15:30.355: %FW-6-DROP_PKT: Dropping ssh session 10.1.1.2:13309 192.168.10.100:22 on zone-pair Inside-to-WebServer1-DMZ class class-default due to  DROP action found in policy-map with ip ident 0
Edge-7200#
*Feb 20 20:15:33.067: %FW-6-LOG_SUMMARY: 2 packets were dropped from 10.1.1.2:13309 => 192.168.10.100:22 (target:class)-(Inside-to-WebServer1-DMZ:class-default)

Let's try it from our AdminTestHost (R4), whose interface is 192.168.1.42:

AdminTestHost#ssh -l admin 192.168.10.100
[Connection to 192.168.10.100 aborted: error status 0]
AdminTestHost#ping 192.168.10.100
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.10.100, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 32/56/92 ms

Well, we can ping it...and the ZBF config is good...maybe we need to enable SSHv2 on our "AdminTestHost" which is actually just a router masquerading as a host. Spoiler: Yes, that was the case. Jumped through the hoops to get SSHv2 enabled on our AdminTestHost aaaaand...

AdminTestHost#ssh -l admin 192.168.10.100
Password:
WebServer1>

Success! We can officially:


  • Send non-admin traffic from Inside to WebServer1 in the DMZ, but not SSH to it
  • Send admin traffic from Inside to WebServer1 in the DMZ and SSH to it
  • Send Inside traffic to the Outside Internet
  • Nmap from the Outside to our WebServer1 in the DMZ to verify that only HTTP (port 80) is open


Below is our final ZBF config we built on our Edge-7200 to allow all this to be possible, and what a hell of a lab it's been. Go get yourself a scotch and celebrate success! Thanks for sticking with it to the end.

class-map type inspect match-any Inside-Admins-Traffic
 match protocol ssh
 match protocol icmp
 match protocol https
class-map type inspect match-all Inside-Admins-to-WebServer1-DMZ
 match class-map Inside-Admins-Traffic
 match access-group name AdminSubnet
class-map type inspect match-any DMZ-to-Outside
 match protocol icmp
 match protocol http
class-map type inspect match-any Inside-to-WebServer1-DMZ
 match protocol icmp
 match protocol http
 match protocol https
class-map type inspect match-any InsideToInternet
 match protocol icmp
!
!
policy-map type inspect VLAN-10-DMZ-to-Outside
 class type inspect DMZ-to-Outside
  inspect
 class class-default
  drop log
policy-map type inspect Inside-to-WebServer1-DMZ
 class type inspect Inside-to-WebServer1-DMZ
  inspect
 class type inspect Inside-Admins-to-WebServer1-DMZ
  inspect
 class class-default
  drop log
policy-map type inspect Inside-to-Outside
 class type inspect InsideToInternet
  inspect
 class class-default
  drop log
!
zone security VLAN10-WebServer1-DMZ
 description Security zone for WebServer1
zone security Outside
 description Outside to Internet
zone security Inside
 description Inside to LabCorp LAN
zone-pair security Outside-to-VLAN-10-DMZ source Outside destination VLAN10-WebServer1-DMZ
 service-policy type inspect VLAN-10-DMZ-to-Outside
zone-pair security Inside-LAN-to-Outside source Inside destination Outside
 service-policy type inspect Inside-to-Outside
zone-pair security Inside-to-WebServer1-DMZ source Inside destination VLAN10-WebServer1-DMZ
 service-policy type inspect Inside-to-WebServer1-DMZ
!
!
interface FastEthernet0/0
 description Outside
 ip address 172.31.122.2 255.255.255.248
 ip nat outside
 ip virtual-reassembly
 zone-member security Outside
 duplex full
 !
!
interface FastEthernet1/0.10
 description VLAN10-WebServer-1-DMZ
 encapsulation dot1Q 10
 ip address 192.168.10.1 255.255.255.0
 ip nat inside
 ip virtual-reassembly
 zone-member security VLAN10-WebServer1-DMZ
 ip ospf 1 area 0
!
interface FastEthernet1/1
 ip address 10.1.1.1 255.255.255.252
 ip nat inside
 ip virtual-reassembly
 zone-member security Inside
 ip ospf authentication message-digest
 ip ospf message-digest-key 1 md5 LabCorpOSPF
 ip ospf 1 area 0
 duplex auto
 speed auto

Wednesday, February 18, 2015

Checking in, and fan-boying for Ivan

Hey, now.  Don't you worry.  I haven't forgotten about The Guide.

As it happens, I've been perusing Ivan Pepelnjak's digital shortcut, Deploying Zone-Based Firewalls.  "Why?" you might currently be screaming at the top of your lungs.  Well, partially because the man is quite brilliant, but mostly because ZBFs are suspiciously simple to implement...and because I've found that emulating ASAs in Qemu on GNS3 has been nothing short of abysmal.

Qemu bashing aside, I've just finished making final tweaks to my latest lab.  It's a fairly straightforward Inside/DMZ/Outside zone design, but it's a good base configuration that's easy to extrapolate and build upon -- oh, and it takes into account that, unlike in Ivan's book (which was written using IOS 12.4, I believe), I'm using 15.0 on my edge router, which doesn't allow the use of inspect with class-default on the zone-pair's policy-map.

So, fun stuff coming down the pipe! If the ZBF lab isn't up by tomorrow, look for it on Friday.

Thursday, February 12, 2015

STP Diameter and the Art of -- Wait...Are those daisy-chained?!

Well, things have finally calmed down a bit, leaving me enough time to sift through the notes I had left myself for drafting this post.  I'll say it now, this one is a bit more theory-related.  I've encountered a topology or two that had haphazardly chained switches, but, for the purposes of demonstrating STP diameter, we'll be using an indescribably atrocious environment that one would be hard-pressed to encounter in the field.

There is a general rule that floats around networking forums -- particularly Cisco's -- which states that a network with an STP diameter of 7+ can be a risk for instability and unexpected re-convergence.  If you haven't heard of the Beth Israel Deaconess Medical Center meltdown then, please, allow me to summarize:

In 2002 -- I note the year just because 802.1w came out in 2001, but maybe BIDMC had cold feet about the transition -- the infrastructure at BIDMC came to a full stop due to layer 2 instability.  After what was described as extensive work with Cisco TAC and engineers who flew in, it was discovered that BIDMC had exceeded an STP diameter of 7.

Going forward, keep in mind that, maybe by the standards of over a decade ago, a diameter beyond 7 was unfeasible and possibly even downright impossible, but, today (and as we'll see below), we can get away with a bit more carelessness.

Recall that STP has three major timers:

  • Hello = The amount of time between each Hello BPDU sent between switches. 2 seconds, by default, but anywhere from 1-10 seconds.
  • FWD Delay = The amount of time spent in the Listening and Learning states, respectively, before transitioning; 15 seconds, by default, but anywhere from 4-30 seconds.
  • Max Age = The max length of time that a switch keeps its stored Config BPDU info before discarding it. 20 seconds, by default, but anywhere from 6-40 seconds.

Max Age can only be reset by the receiving of a new superior BPDU which changes the local bridge's view on how to best reach root.

Each Config BPDU contains these 3 values, but, in addition, contains a little known bonus timer called "Message Age." MSG Age isn't a fixed value. The Root sends all BPDUs with MSG Age=0 and subsequent non-roots that receive the BPDU increment MSG Age by 1 then relay it. Effectively, MSG Age represents how far you are from the Root upon receiving the BPDU.

When a BPDU arrives that is superior (better BID (MAC address + Bridge Priority)) to the current BPDU received from the current Root, the new, superior BPDU is stored and the Age Timer starts to increment, beginning at a value equal to the MSG Age received in the superior BPDU. If the Age Timer reaches a value equal to the switch's Max Age Timer before another BPDU is received from the Root (remember, the ROOT is always sending Hello BPDUs at a default of 2 sec) then the Age Timer doesn't refresh and the superior BPDU is aged out.

You can see what a problem this might cause with larger STP diameters...

Because we're working in a virtual infrastructure, say we've really let our network burn to the ground, and 18 switches are daisy chained from ROOT to furthest Access Switch. So our diameter = 18.

A majestic, swirling vortex of fail.

Recall that the Root originates its BPDU with MSG AGE = 0, then each switch increases MSG Age by 1 as it relays, so, at the far-end switch...MSG Age = 17 upon receiving the BPDU from the Root.

So, our Age Timer starts at 17 seconds, and we have 3 seconds of hold time (Max Age - MSG Age) before Max Age expires and the superior BPDU is discarded. By default, our Root will only re-send BPDUs every 2 seconds, so, assuming decent line speed/no link saturation, let's say it takes 1 second for the far-end switch to receive the superior BPDU and refresh its Age Timer.
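
To make that arithmetic easy to replay for other positions in the chain, here's a minimal Python sketch. It assumes what's described above: the Root is switch 1, every relaying switch adds exactly 1 to MSG Age, and Max Age is the default 20. The function names are my own.

```python
def message_age_at(position: int) -> int:
    """MSG Age carried by the Root's BPDU when it arrives at a switch.

    `position` counts switches along the chain, with the Root at 1.
    Assumes each relaying switch increments MSG Age by exactly 1.
    """
    if position < 1:
        raise ValueError("position must be 1 (the Root) or greater")
    return position - 1


def headroom(position: int, max_age: int = 20) -> int:
    """Seconds left before the stored BPDU ages out (Max Age - MSG Age)."""
    return max_age - message_age_at(position)


for switch in (2, 7, 13, 18):
    print(f"SW{switch}: MSG Age {message_age_at(switch)}, "
          f"headroom {headroom(switch)} sec")
```

For the 18th switch this gives a MSG Age of 17 and 3 seconds of headroom, matching the numbers above.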

Let's see how the MSG Age looks as we move down the chain, starting on the Root.

Root Switch in the daisy chain originating Superior BPDU:

Root-SW1#sh spanning-tree vlan 1 detail
 VLAN0001 is executing the ieee compatible Spanning Tree protocol
  Bridge Identifier has priority 24576, sysid 1, address aabb.cc00.0100
  Configured hello time 2, max age 20, forward delay 15
  We are the root of the spanning tree
  Topology change flag not set, detected flag not set
  Number of topology changes 2 last change occurred 00:02:00 ago
          from Ethernet0/0
  Times:  hold 1, topology change 35, notification 2
          hello 2, max age 20, forward delay 15
  Timers: hello 0, topology change 0, notification 0, aging 300
 Port 1 (Ethernet0/0) of VLAN0001 is designated forwarding
!truncated for brevity!
   Timers: message age 0

Now, the second switch in the daisy chain receiving the Superior BPDU:

SW2#sh spanning-tree vlan 1 detail | incl message age
Timers: message age 1, forward delay 0, hold 0

We'll see the MSG Age timer increment as the switch waits for the next superior BPDU to arrive:

SW2#sh spanning-tree vlan 1 detail | incl message age
Timers: message age 2, forward delay 0, hold 0

It will drop back to "message age 1" when the next Hello arrives. In the time between one Hello arriving (setting the message age to 1) and the next Hello leaving the Root and arriving, MSG Age will have incremented to 2 (possibly on its way to 3).


Here's the third switch in the daisy chain receiving the superior BPDU:

SW3#sh spanning-tree vlan 1 detail | incl message age
Timers: message age 2, forward delay 0, hold 0

It starts at 2 rather than 1 because the BPDU's Message Age was incremented from 0 to 1 upon being received on SW2 then relayed from SW2 with Message Age = 1, which SW3 then incremented to 2 upon receiving it. It increments towards Max Age as it waits for a superior BPDU to arrive to refresh MSG Age.

SW3#sh spanning-tree vlan 1 detail | incl message age
Timers: message age 3, forward delay 0, hold 0

For the sake of brevity, we'll skip down the chain to SW6 and check its MSG Age timers:

SW6#sh spanning-tree vlan 1 detail | incl message age
   Timers: message age 5, forward delay 0, hold 0
SW6#sh spanning-tree vlan 1 detail | incl message age
   Timers: message age 6, forward delay 0, hold 0

Now, all of this works nicely because, as you can see, it takes maybe 1 second at most for the Root to get its superior BPDU down to the furthest switch and refresh the MSG Age timer. We probably won't see any issues unless we get our diameter up near 19 or 20. Let's see what happens!

As we get up on our 13th switch in the chain, we can see that the BPDU starts to take longer to arrive/be processed:

SW13#sh spanning-tree vlan 1 detail | incl message age
   Timers: message age 12, forward delay 0, hold 0
SW13#sh spanning-tree vlan 1 detail | incl message age
   Timers: message age 13, forward delay 0, hold 0
SW13#sh spanning-tree vlan 1 detail | incl message age
   Timers: message age 14, forward delay 10, hold 0

The superior BPDU that arrived on SW13 should have a MSG Age of 12, so it's taking a full 2 seconds before the Root's new BPDU can arrive and be processed by SW13 to refresh the MSG Age timer.

Let's skip ahead to our 18th switch

SW18#sh spanning-tree vlan 1 detail | incl message age
   Timers: message age 17, forward delay 0, hold 0
SW18#sh spanning-tree vlan 1 detail | incl message age
   Timers: message age 18, forward delay 0, hold 0
SW18#sh spanning-tree vlan 1 detail | incl message age
   Timers: message age 19, forward delay 0, hold 0
SW18#sh spanning-tree vlan 1 detail | incl message age
   Timers: message age 20, forward delay 0, hold 0
SW18#sh spanning-tree vlan 1 detail | incl message age
   Timers: message age 0, forward delay 0, hold 0

Yup, this is exactly the sort of thing that wreaks havoc on a network (though, in all fairness, if your network has an STP diameter bad enough where your MSG Age is starting at 17, you're probably already waist-deep in havoc). Notice how the age timer starts at 17 (as it should, since Root originates it at 0 and we're 18 switches deep), but it takes a full 3 seconds for the superior BPDU to reach switch 18.

Since our Max Age was reached before the Hello could be received/processed, the current superior BPDU was discarded, MSG Age resets to 0, and our far-end switch is now undergoing a topology change where it first assumes itself to be the Root:

SW18#sh spanning-tree vlan 1
VLAN0001
  Spanning tree enabled protocol ieee
  Root ID    Priority    32769
             Address     aabb.cc00.1200
             This bridge is the root
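
The age-out we just watched boils down to one comparison. Here's a rough Python sketch, assuming the Age Timer ticks up by one per second from the received MSG Age and that the stored BPDU is discarded once it reaches Max Age. The function name and the refresh-gap parameter are mine; the gap values are simply what the outputs above showed.

```python
def ages_out(start_age: int, seconds_until_refresh: int, max_age: int = 20) -> bool:
    """True if the Age Timer reaches Max Age before the next BPDU refreshes it."""
    return start_age + seconds_until_refresh >= max_age


# SW13: BPDU arrived with MSG Age 12, refresh observed about 2 seconds later.
print("SW13 ages out:", ages_out(12, 2))
# SW18: BPDU arrived with MSG Age 17, refresh took about 3 seconds.
print("SW18 ages out:", ages_out(17, 3))
```

SW13 survives with room to spare, while SW18 hits 20 and discards the Root's BPDU.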

Keep in mind, this simulation assumes a topology with no production traffic crossing it, no actual cables (so no chance of EMI/bends/general CRC causes), and really no overhead on the switches at all. It's entirely possible that we could have seen this on switch 10 or 11. One should always take into consideration the propensity for lost Hello BPDUs and, in a worst case scenario, how long the end-to-end BPDU propagation truly is, given the number of lost Hellos, the frequency with which Hellos go out (Hello Timer), BPDU Delay (the amount of time it takes a switch to receive a BPDU then relay it; 1 second max), and your STP diameter.


Cisco has a nice little formula for that propagation delay (worked here with 3 lost Hellos, a BPDU_Delay of 1 second, a 2-second Hello, and a diameter of 7):
End-to-end_BPDU_propa_delay
= ((lost_msg + 1) x hello) + (BPDU_Delay x (dia – 1))
= ((3 + 1) x hello) + (1 x (dia – 1))
= 4 x hello + dia – 1
= 4 x 2 + 6
= 14 sec

Once you have propagation delay, you can also make sense of why Max Age is 20 seconds by default:
max_age
= End-to-end_BPDU_propa_delay + Message_age_overestimate
= 14 + 6
= 20 sec

Where "Message_age_overestimate" accounts for the BPDU's age since origination by the Root, with each relaying non-root adding 1 second:
Message_age_overestimate
= (dia – 1) x overestimate_per_bridge
= dia – 1
= 6
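
If you'd like to plug your own diameter into those formulas, here's a small Python sketch. The function names are mine; the parameter names and defaults (3 lost Hellos, 2-second Hello, 1-second BPDU delay, 1-second overestimate per bridge) are taken from the worked numbers above.

```python
def bpdu_propagation_delay(dia: int, hello: int = 2, lost_msg: int = 3,
                           bpdu_delay: int = 1) -> int:
    """End-to-end BPDU propagation delay in seconds."""
    return (lost_msg + 1) * hello + bpdu_delay * (dia - 1)


def message_age_overestimate(dia: int, overestimate_per_bridge: int = 1) -> int:
    """Worst-case Message Age accumulated across the diameter, in seconds."""
    return (dia - 1) * overestimate_per_bridge


def required_max_age(dia: int, hello: int = 2) -> int:
    """Max Age needed to tolerate the given diameter under these assumptions."""
    return bpdu_propagation_delay(dia, hello) + message_age_overestimate(dia)


for dia in (7, 18):
    print(f"diameter {dia}: propagation {bpdu_propagation_delay(dia)} sec, "
          f"overestimate {message_age_overestimate(dia)} sec, "
          f"required Max Age {required_max_age(dia)} sec")
```

A diameter of 7 lands exactly on the default Max Age of 20 seconds. Our 18-switch chain would need 42 seconds by the same math, which is past the 40-second ceiling Max Age can be set to.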

Once we take cable faults, switching delays, and general network overhead into account, 7 starts to look like a more fault-tolerant STP diameter limit.


So, there you have it! STP diameter and why anyone in their right mind has long since migrated to 802.1w.  When I landed my first networking gig, you can imagine my surprise when I saw a few daisy-chained switches. While certainly not conducive to scalability and redundancy, it wasn't the end of the world, but it did merit a change control submission to move to something less...let's call it "horrifying."

Friday, January 30, 2015

Good Guy Brocade and Gigabit Negotiation

As I sat at my desk last night, savoring my bowl of soup (or, rather, lamenting the initial mouth-scorching), I was jotting down ideas of how to best proceed with future posts.  It didn't take long to arrive at the conclusion that, since, in a way, I'm cataloging my growth as a networker, it would behoove me to start with some of the first issues I encountered.

What took even less time than arriving at that decision was getting derailed from the plan altogether. So, like a bad M. Night Shyamalan film, we'll commence from the end.

Around the time when sleep starts to seem like an amazingly fantastic idea, the on-call phone rang. One of our analysts relayed to me that a pair of non-production 10Gbps circuits were currently down on one side, but still showing up on the far-end.

I'll save you the trouble of sifting through the same list of possible causes that I did initially, because this post is more about a (seemingly) minor configuration that can trip you up when deploying fiber, though more commonly when dealing with a fiber hand-off.

Of course, what I'm struggling to get to the point about is...Gigabit Negotiation.



Now, I don't intend on duplicating efforts here.  If you're looking for how negotiation mismatches can impact fiber between Cisco gear, this Packet Herder has already done a fine job.  The reason my interest was piqued was because we happen to run Brocade in our core, and I hadn't the foggiest how Brocade implements Gigabit Negotiation, or even what their CLI syntax is.

So began the digging.

Well, I don't have a clue where I'd find the output outright indicating the configured state of negotiation, though a Cisco-educated guess would point towards a variant of "show interface transceiver properties." Hmm...nope. Let's check the running-config!

Nothing under the interface section. Must be using the default configuration. OK, let's see how Brocade rolls by default. Huh. "Negotiate-full-auto":

Negotiate-full-auto - The port first tries to perform a handshake with the other port to exchange capability information. If the other port does not respond to the handshake attempt, the port uses the manually configured configuration information (or the defaults if an administrator has not set the information). This is the default.

Well, that solves the whole "near-end isn't negotiating, but far-end is trying, so hang the sense of bringing the link up" problem. But, wait...our MLX configuration guide wants to chime in, and it even covers an additional mode that the previous FastIron switch guide didn't (probably worth mentioning that it's because an MLX is NetIron, rather than FastIron. See?! Google recklessly and you'll end up with shoddy information):

The neg-full-auto and auto-full options are not supported on the Brocade NetIron XMR, Brocade MLXseries, NetIron MG8, and NetIron 40G.

So, by the looks of it, Brocade offers four options (depending on your model):

(config-if)#[no] gig-default neg-full-auto | auto-gig | neg-off | auto-full

  • neg-full-auto (which we already covered above) -- The port is only for copper-SFP and to support 10/100/1000M tri-speed auto negotiation.
  • auto-full -- The port tries to perform a negotiation with its peer port to exchange capability information. If it is unable to reach an agreed upon speed, the port goes into a fixed speed and keeps the link up.
  • auto-gig -- The port tries to perform a negotiation with its peer port to exchange capability information. This is the default state.
  • neg-off -- The port does not try to perform a negotiation with its peer port.

Your decision may differ based on what model you have deployed -- and congrats if you get to mess around with neg-full-auto, since it sounds like an awesome alternative -- but, for the rest of us, it looks like we're stuck with auto-gig and neg-off, the former being our true default.

Assuming that both sides are correctly defaulted to auto-gig we can go ahead and rule out Gigabit Negotiation as the culprit.  If you were feeling brave and went with neg-off, you might want to check for a unidirectional issue. A lack of Layer 1 negotiation seems to imply that, as long as light is being received, an interface will stay UP/UP, but there could be a problem with that same device's TX fiber which causes a loss of light (and subsequent DOWN/DOWN state) on the far-end.


Well, hopefully, next time we'll be taking a bit of a step backwards and touching on some of the more humble beginnings of networking endeavors.  For now, sit back, mull this over, and enjoy some scotch.

Thursday, January 29, 2015

SO IT BEGINS...

So, there comes a time -- usually a week or two into your first gig -- when it's no longer even remotely feasible to use a notebook/sticky-notes in order to keep track of the wealth of knowledge/fixes/duct-tape workarounds that ensue on a day to day basis.  Sure, you might get by for a while with an ever-expanding binder of notes/diagrams, and sub-directory after sub-directory of notepad docs in a flash-drive that has been through the wash once or twice.  Eventually, though, your repository is going to start to bear a striking resemblance to that SMB "data-center" that still haunts your dreams:


GOOD GOD, MAN! Quick, cleanse it! Cleanse it with FIRE!!!


Sooner or later, you're shooting yourself in the foot with every passing second that you haven't turned up a wiki to dump tidbits of knowledge and life lessons into. Because -- let's face it -- you can't remember everything.



Dumbledore documents like a boss.


Since Ollivander was pretty persistent about not affording me a wand of my own -- come on, you've quite literally got walls of 'em just sitting around -- I'm commemorating the retirement of my own binder-o-notes by starting a blog under the same name.

While I certainly won't be dumping every single notepad doc and diagram I handle day in and day out, this will (hopefully) shape up to be a good means for tracking interesting issues/quirky bugs/and my growth as a networking professional after somehow stumbling into the field three years ago.

It's like a flip-book, but with big words...and scotch. Don't forget the scotch.