Saturday, February 21, 2015

Much Ado about Zone-Based Firewalls

As things tend to go in the world o' networking, Friday proved immensely hectic.

I would've liked to have gotten this lab posted last night, but I made some last-minute changes to restrict SSH access to our "webserver" in the DMZ to a management subnet in the Inside security zone, which I'm pretty pleased worked on the first attempt. The one caveat: when SSHing from a router (which I used as a makeshift "AdminTestHost"), you have to enable SSHv2 on it to use it as a client.

All of that jargon aside, let's get going on what is, honestly, one of my new favorite lab implementations!

========================================================================


Above is the topology that we're using for the lab. Pretty straightforward, with a bit of OSPF thrown in there for the sake of not having to build static routes.  For testing Outside to DMZ traffic, I've hooked an Ubuntu VM in VirtualBox onto GNS3, on which we can generate some Nmap traffic to prove out the ZBF config.  I won't go into details on how to set that up, a) because this post is about properly building ZBF, and b) because there are plenty of guides online already for VMs/GNS3/Nmap.

I'll assume that, if you're at the point of curiosity about ZBFs, you probably already have the base knowledge required to cable/VLAN/IP address/route the topology. So, let's jump right into the ZBF config on our edge Cisco 7200 running IOS 15.0.  I'll enclose comments between the sections explaining briefly what each part does.

class-map type inspect match-any DMZ-to-Outside
 match protocol icmp
 match protocol http
!
//Class-map is used by the Policy-Map to inspect traffic moving between defined Security Zones. An "inspect" Class-Map must be used with an "inspect" Policy-Map// 
!
policy-map type inspect VLAN-10-DMZ-to-Outside
 class type inspect DMZ-to-Outside
  inspect
 class class-default
  drop log
!
//The Policy-Map is used by our Zone-Pair Service-Policy to inspect traffic matched by the Class-Map. If it matches (and we haven't specified any additional Parameter-Map to provide more granular control of traffic volume), the traffic passes. If there's no match, the traffic is dropped by default (I added the "log" for fun later)//
!
zone security VLAN10-WebServer1-DMZ
 description Security zone for WebServer1
zone security Outside
 description Outside to Internet
zone-pair security Outside-to-VLAN-10-DMZ source Outside destination VLAN10-WebServer1-DMZ
 service-policy type inspect VLAN-10-DMZ-to-Outside
!
//Before you can define a Zone-Pair, or make interfaces Zone members, you need to define the Security Zones. Once interfaces have been made zone-members (an interface can only belong to one zone), you can define the Zone-Pair. The pair is what controls traffic flow and applies our Policy-Map to it. You only need to define a zone-pair in one direction, as return traffic for sessions initiated in that direction is covered by the same Zone-Pair. For security purposes, I've only created a Zone-Pair for traffic sourced from the Outside destined for our DMZ (despite what the names I gave the Class-Map and Policy-Map suggest). If we needed our WebServer to hit the Internet for updates, etc., we would need to build a second Zone-Pair with the DMZ as the source// 
!
interface FastEthernet0/0
 description Outside
 ip address 172.31.122.2 255.255.255.248
 ip nat outside
 ip virtual-reassembly
 zone-member security Outside
 duplex full
 !
//Above and below, we add the Outside and DMZ interfaces to their respective Security Zones// 
!
interface FastEthernet1/0.10
 description VLAN10-WebServer-1-DMZ
 encapsulation dot1Q 10
 ip address 192.168.10.1 255.255.255.0
 ip nat inside
 ip virtual-reassembly
 zone-member security VLAN10-WebServer1-DMZ
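
If it helps to see the moving parts in one place, here's a rough Python model of how the zone-pair, policy-map, and class-map described in the comments above fit together. This is only an illustrative sketch and not how IOS implements anything: the function and variable names are made up, flows are just (source, destination) tuples, and "return traffic" is modeled as a simple set of inspected sessions.

```python
# Toy model of the ZBF lookup: zone-pair -> policy-map -> class-map -> action.
# All names are hypothetical; real IOS does far more (state tracking, ALGs, ...).

# (source zone, destination zone) -> ordered list of (matched protocols, action)
ZONE_PAIRS = {
    ("Outside", "VLAN10-WebServer1-DMZ"): [
        ({"icmp", "http"}, "inspect"),  # class DMZ-to-Outside (match-any)
        (None, "drop"),                 # class class-default
    ],
}

sessions = set()  # flows created by an "inspect" action


def evaluate(src_zone, dst_zone, protocol, flow):
    """Return 'pass' or 'drop' for a packet in this toy model."""
    # Return traffic for an already-inspected session is allowed back.
    reverse = (flow[1], flow[0], protocol)
    if reverse in sessions:
        return "pass"
    policy = ZONE_PAIRS.get((src_zone, dst_zone))
    if policy is None:
        return "drop"  # no zone-pair between the two zones
    for protocols, action in policy:  # first matching class wins
        if protocols is None or protocol in protocols:
            if action == "inspect":
                sessions.add((flow[0], flow[1], protocol))
                return "pass"
            return "drop"
    return "drop"
```

In this model, an HTTP flow from the Outside to the DMZ passes and its replies are let back out, while a brand-new flow initiated from the DMZ is dropped because no DMZ-sourced zone-pair exists.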

Now, let's confirm everything was built correctly:
Edge-7200#sh policy-map type inspect zone-pair session
policy exists on zp Outside-to-VLAN-10-DMZ
 Zone-pair: Outside-to-VLAN-10-DMZ
  Service-policy inspect : VLAN-10-DMZ-to-Outside
    Class-map: DMZ-to-Outside (match-any)
      Match: protocol icmp
        0 packets, 0 bytes
        30 second rate 0 bps
      Match: protocol udp
        0 packets, 0 bytes
        30 second rate 0 bps
      Match: protocol tcp
        0 packets, 0 bytes
        30 second rate 0 bps
   Inspect
    Class-map: class-default (match-any)
      Match: any
      Drop
        0 packets, 0 bytes
We'll enable HTTP access ((config)#ip http server) on our "WebServer1" then try to Nmap our WebServer (172.31.122.3) from our Outside VM. Also, to add a bit of cool output when we run our next Nmap, I changed the policy-map class-default from "drop" to "drop log" on the edge 7200.

capn@capn-VirtualBox:~$ nmap -n 172.31.122.3
Nmap scan report for 172.31.122.3
Host is up (0.33s latency)
Not shown: 999 filtered ports
PORT STATE SERVICE
80/tcp open http

On our edge Cisco 7200, we can see the results of the Nmap against our ZBF policy where it rejects attempts over HTTPS, SMTP, and POP3:

*Feb 20 18:52:29.683: %FW-6-DROP_PKT: Dropping tcp session 192.168.56.102:48964 192.168.10.100:443 on zone-pair
Outside-to-VLAN-10-DMZ class class-default due to  DROP action found in policy-map with ip ident 0
*Feb 20 18:52:33.067: %FW-6-LOG_SUMMARY: 1 packet were dropped from 192.168.56.102:48964 => 192.168.10.100:443
(target:class)-(Outside-to-VLAN-10-DMZ:class-default)
*Feb 20 18:52:33.067: %FW-6-LOG_SUMMARY: 1 packet were dropped from 192.168.56.102:48965 => 192.168.10.100:443
(target:class)-(Outside-to-VLAN-10-DMZ:class-default)
*Feb 20 18:52:33.071: %FW-6-LOG_SUMMARY: 1 packet were dropped from 192.168.56.102:55985 => 192.168.10.100:587
(target:class)-(Outside-to-VLAN-10-DMZ:class-default)
*Feb 20 18:52:33.071: %FW-6-LOG_SUMMARY: 2 packets were dropped from 192.168.56.102:46872 => 192.168.10.100:25
(target:class)-(Outside-to-VLAN-10-DMZ:class-default)
*Feb 20 18:52:33.071: %FW-6-LOG_SUMMARY: 2 packets were dropped from 192.168.56.102:36496 => 192.168.10.100:110
!
truncated for brevity
!
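
If you'd rather not eyeball a wall of syslog, a few lines of Python can tally the drops per destination port. This is just a sketch: the regex is my own guess at the shape of the %FW-6-LOG_SUMMARY lines based on the output above (it is not an official message format definition), and the helper name is made up.

```python
import re
from collections import Counter

# Pattern inferred from the sample log lines above; an assumption, not a spec.
LOG_SUMMARY = re.compile(
    r"%FW-6-LOG_SUMMARY: (\d+) packets? were dropped from "
    r"(\d+\.\d+\.\d+\.\d+):(\d+) => (\d+\.\d+\.\d+\.\d+):(\d+)"
)


def dropped_by_port(log_text):
    """Count dropped packets per destination port from LOG_SUMMARY lines."""
    counts = Counter()
    for match in LOG_SUMMARY.finditer(log_text):
        packets, _src, _sport, _dst, dport = match.groups()
        counts[int(dport)] += int(packets)
    return counts


sample = """\
*Feb 20 18:52:33.067: %FW-6-LOG_SUMMARY: 1 packet were dropped from 192.168.56.102:48964 => 192.168.10.100:443
*Feb 20 18:52:33.067: %FW-6-LOG_SUMMARY: 1 packet were dropped from 192.168.56.102:48965 => 192.168.10.100:443
*Feb 20 18:52:33.071: %FW-6-LOG_SUMMARY: 1 packet were dropped from 192.168.56.102:55985 => 192.168.10.100:587
*Feb 20 18:52:33.071: %FW-6-LOG_SUMMARY: 2 packets were dropped from 192.168.56.102:46872 => 192.168.10.100:25
*Feb 20 18:52:33.071: %FW-6-LOG_SUMMARY: 2 packets were dropped from 192.168.56.102:36496 => 192.168.10.100:110
"""
print(dict(dropped_by_port(sample)))  # {443: 2, 587: 1, 25: 2, 110: 2}
```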
Now that that's all set up, we'll add an Inside zone and some rules to allow us to SSH into our WebServer from the Inside for management, but drop any attempts to do so from the Outside. First, we'll build SSH access on our router masquerading as a WebServer.

WebServer1(config)#ip domain-name LabCorp.net
WebServer1(config)#username admin password root
WebServer1(config)#aaa new-model
WebServer1(config)#crypto key gen rsa
The name for the keys will be: WebServer1.LabCorp.net
Choose the size of the key modulus in the range of 360 to 2048 for your
  General Purpose Keys. Choosing a key modulus greater than 512 may take
  a few minutes.
How many bits in the modulus [512]: 1024
% Generating 1024 bit RSA keys, keys will be non-exportable...[OK]
WebServer1(config)#
*Mar  1 00:28:24.143: %SSH-5-ENABLED: SSH 2.0 has been enabled
WebServer1(config)#ip ssh ver 2
WebServer1(config)#line vty 0 935
WebServer1(config-line)#password root
WebServer1(config-line)#transport input ssh
WebServer1(config)#service password-encrypt
WebServer1(config)#no service password-encrypt

Just for fun, I added "match protocol ssh" to the class-map on the Edge-7200 and attempted to SSH in from my VM on the Outside to WebServer1:

capn@capn-VirtualBox:~$ ssh -l admin 172.31.122.3
admin@172.31.122.3's password:
WebServer1>exit
Connection to 172.31.122.3 closed

OK, we'll tear that match off the class-map now aaaand...just as we hoped, we can't SSH into WebServer1 from our Outside VM anymore!

capn@capn-VirtualBox:~$ ssh -l admin 172.31.122.3
^C
capn@capn-VirtualBox:~$

And look what appeared on our Edge-7200!

*Feb 19 16:20:31.435: %FW-6-DROP_PKT: Dropping tcp session 192.168.56.102:41216 192.168.10.100:22 on zone-pair
Outside-to-VLAN-10-DMZ class class-default due to  DROP action found in policy-map with ip ident 0
Edge-7200#
*Feb 19 16:20:38.075: %FW-6-LOG_SUMMARY: 3 packets were dropped from 192.168.56.102:41216 => 192.168.10.100:22 (target:class)-(Outside-to-VLAN-10-DMZ:class-default)

To keep the Inside simple, we'll hang our Inside LAN zone off of fa1/1 (10.1.1.1) on our Edge-7200 via an MLS. On the 7200, fa1/1 goes into OSPF area 0, and fa1/0.10 (192.168.10.1), the gateway for our VLAN 10 WebServer1 DMZ, goes into area 0 as a passive interface. The Inside LAN is part of area 0 as well. The 192.168.1.0/24 subnet off of LAN-Switch fa0/1 for AdminTestHost was also put into area 0 as a passive interface, so that WebServer traffic hitting the 7200 can find its way back to the AdminTestHost. It doesn't appear in the RIB output below only because I added it later; when I originally wrote this, I was planning on using a host connection.


Let's check our RIB on our LAN-Switch:

LAN-Switch#sh ip route
Gateway of last resort is 10.1.1.1 to network 0.0.0.0
O*E2  0.0.0.0/0 [110/1] via 10.1.1.1, 00:11:58, Ethernet1/1 <--used "default-info originate always" on 7200
      10.0.0.0/8 is variably subnetted, 2 subnets, 2 masks
C        10.1.1.0/30 is directly connected, Ethernet1/1
L        10.1.1.2/32 is directly connected, Ethernet1/1
O     192.168.10.0/24 [110/11] via 10.1.1.1, 00:10:24, Ethernet1/1

We can ping DMZ gateway fa1/0.10 (192.168.10.1) and Outside fa0/0 (172.31.122.2) of our 7200 from the LAN -- this is because traffic destined to the router's own interfaces (the implicit "self" zone) isn't inspected/dropped by our ZBF policy unless a zone-pair involving self is configured. I have a suspicion that we won't be able to hit WebServer1 without a zone-pair built, and we definitely won't be able to hit R1's fa0/0 (172.31.122.1) without a NAT pool and a zone-pair.

LAN-Switch#ping 192.168.10.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.10.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 39/40/41 ms
LAN-Switch#ping 172.31.122.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.31.122.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 40/40/42 ms
LAN-Switch#ping 192.168.10.100
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.10.100, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)
LAN-Switch#ping 172.31.122.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.31.122.1, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)

Let's start by building the NAT pool. We'll just permit 10.1.1.0/30 to start, because our test pings from the LAN-Switch are going to be sourced from 10.1.1.2 anyway. We'll add to the ACL later so the rest of Inside can get Outside.

Edge-7200(config)#access-list 1 permit 10.1.1.0 0.0.0.3
Edge-7200(config)#ip nat pool Inside-LAN-NAT-Pool 172.31.122.4 172.31.122.4 netmask 255.255.255.248
Edge-7200(config)#ip nat inside source list 1 pool Inside-LAN-NAT-Pool overload
Edge-7200(config)#int fa1/1
Edge-7200(config-if)#ip nat inside
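
For anyone who finds wildcard masks a bit backwards, here's a small Python sketch of how "access-list 1 permit 10.1.1.0 0.0.0.3" decides which sources get translated. The helper name is made up for illustration, and it only models the single permit line (anything that doesn't match falls through to the implicit deny and isn't NATed).

```python
import ipaddress


def acl_permits(source, network, wildcard):
    """True if `source` matches `network` under a Cisco-style wildcard mask."""
    src = int(ipaddress.IPv4Address(source))
    net = int(ipaddress.IPv4Address(network))
    wild = int(ipaddress.IPv4Address(wildcard))
    care = ~wild & 0xFFFFFFFF  # wildcard 0 bits are the ones that must match
    return (src & care) == (net & care)


print(acl_permits("10.1.1.2", "10.1.1.0", "0.0.0.3"))      # True
print(acl_permits("192.168.1.42", "10.1.1.0", "0.0.0.3"))  # False
```

So the LAN-Switch's 10.1.1.2 qualifies for translation, while the admin subnet won't until we add it to the ACL.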

OK, the NAT (or rather, PAT) pool is built. Just to prove that it's not enough to set up NAT and that we need the zone-pair as well, let's try to ping 172.31.122.1 once more:

LAN-Switch#ping 172.31.122.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.31.122.1, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)

Yup, no luck. Let's get to building that new Inside to Outside zone-pair. Here's the additional config added. We'll start with just matching icmp for the sake of proving pings. Later, for the sake of demonstrating class-map match-all inspects with an access-group, we'll set up all of Inside LAN with the ability to open HTTP sessions to WebServer1, but only an admin subnet (192.168.1.0/24) with the ability to SSH from Inside to WebServer1 in the DMZ.


class-map type inspect match-any InsideToInternet
 match protocol icmp
!
!
policy-map type inspect Inside-to-Outside
 class type inspect InsideToInternet
  inspect
 class class-default
  drop log
!
zone security Outside
 description Outside to Internet
zone security Inside
 description Inside to LabCorp LAN
!
zone-pair security Inside-LAN-to-Outside source Inside destination Outside
 service-policy type inspect Inside-to-Outside
!
!
interface FastEthernet1/1
 ip address 10.1.1.1 255.255.255.252
 ip nat inside
 ip virtual-reassembly
 zone-member security Inside


Now, let's see if we can ping from our Inside LAN Switch to the Outside R1's fa0/0 and loopback1:

LAN-Switch#ping 172.31.122.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.31.122.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 21/37/68 ms
LAN-Switch#ping 216.36.1.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 216.36.1.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 19/28/52 ms

Inside to Outside access is good, so now let's get some access for users in our Inside LAN to our DMZ WebServer1, and we'll build that exclusive admin subnet SSH management access I mentioned earlier!

Edge-7200(config)#class-map type inspect match-any Inside-to-WebServer1-DMZ
Edge-7200(config-cmap)#match protocol icmp
Edge-7200(config-cmap)#match protocol http
Edge-7200(config-cmap)#match protocol https
Edge-7200(config)#class-map type inspect match-any Inside-Admins-Traffic
Edge-7200(config-cmap)#match protocol ssh
Edge-7200(config-cmap)#match protocol icmp
Edge-7200(config-cmap)#match protocol https
Edge-7200(config)#class-map type inspect match-all Inside-Admins-to-WebServer1-DMZ
Edge-7200(config-cmap)#match class-map Inside-Admins-Traffic
Edge-7200(config-cmap)#match access-group name AdminSubnet
Edge-7200(config)#ip access-list extended AdminSubnet
Edge-7200(config-ext-acl)#permit ip 192.168.1.0 0.0.0.255 host 192.168.10.100
Edge-7200(config)#policy-map type inspect Inside-to-WebServer1-DMZ
Edge-7200(config-pmap)#class type inspect Inside-to-WebServer1-DMZ
Edge-7200(config-pmap-c)#inspect
Edge-7200(config-pmap)#class type inspect Inside-Admins-to-WebServer1-DMZ
Edge-7200(config-pmap-c)#inspect
Edge-7200(config-pmap)#class class-default
Edge-7200(config-pmap-c)#drop log
Edge-7200(config)#zone-pair security Inside-to-WebServer1-DMZ source Inside destination VLAN10-WebServer1-DMZ
Edge-7200(config-sec-zone-pair)#service-policy type inspect Inside-to-WebServer1-DMZ

NOTE: Not sure if you have any background with class-maps, but the purpose of using "match-all" on the third map above, rather than the "match-any" that we've been using thus far, is that "match-any" is OR logic, whereas "match-all" is AND logic. In order for traffic to qualify for this class-map, not only does it need to be one of the protocols we specified in "match class-map Inside-Admins-Traffic" (SSH/ICMP/HTTPS), it also needs to match our Extended ACL, which only permits traffic sourced from the management subnet (192.168.1.0/24) destined for the webserver (192.168.10.100).
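
The OR/AND distinction can be sketched in a few lines of Python. This is only a mental model with made-up function names, not anything IOS actually runs; the subnet, host, and protocol list are taken from the config above.

```python
import ipaddress

ADMIN_SUBNET = ipaddress.ip_network("192.168.1.0/24")
WEBSERVER = ipaddress.ip_address("192.168.10.100")


def inside_admins_traffic(protocol):
    # match-any: OR across the listed protocols
    return any(protocol == p for p in ("ssh", "icmp", "https"))


def admin_subnet_acl(src, dst):
    # permit ip 192.168.1.0 0.0.0.255 host 192.168.10.100
    return (ipaddress.ip_address(src) in ADMIN_SUBNET
            and ipaddress.ip_address(dst) == WEBSERVER)


def inside_admins_to_webserver(protocol, src, dst):
    # match-all: AND across both match statements
    return all((inside_admins_traffic(protocol), admin_subnet_acl(src, dst)))


print(inside_admins_to_webserver("ssh", "192.168.1.42", "192.168.10.100"))  # True
print(inside_admins_to_webserver("ssh", "10.1.1.2", "192.168.10.100"))      # False
```

SSH from the admin subnet satisfies both conditions; SSH from 10.1.1.2 matches the protocol but fails the ACL, so it falls through to class-default.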

First, let's ping from our Inside router R3 to verify basic, non-admin Inside traffic to WebServer1:

R3#ping 192.168.10.100
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.10.100, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 28/39/68 ms

Looks good. Now, let's see if we can SSH to it from non-admin Inside:

R3#ssh -l admin 192.168.10.100
R3# 

And we can confirm it failed on our Edge-7200:

Edge-7200#
*Feb 20 20:15:30.355: %FW-6-DROP_PKT: Dropping ssh session 10.1.1.2:13309 192.168.10.100:22 on zone-pair Inside-to-WebServer1-DMZ class class-default due to  DROP action found in policy-map with ip ident 0
Edge-7200#
*Feb 20 20:15:33.067: %FW-6-LOG_SUMMARY: 2 packets were dropped from 10.1.1.2:13309 => 192.168.10.100:22 (target:class)-(Inside-to-WebServer1-DMZ:class-default)

Let's try it from our AdminTestHost (R4), whose interface is 192.168.1.42:

AdminTestHost#ssh -l admin 192.168.10.100
[Connection to 192.168.10.100 aborted: error status 0]
AdminTestHost#ping 192.168.10.100
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.10.100, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 32/56/92 ms

Well, we can ping it...and the ZBF config is good...maybe we need to enable SSHv2 on our "AdminTestHost" which is actually just a router masquerading as a host. Spoiler: Yes, that was the case. Jumped through the hoops to get SSHv2 enabled on our AdminTestHost aaaaand...

AdminTestHost#ssh -l admin 192.168.10.100
Password:
WebServer1>

Success! We can officially:


  • Send non-admin traffic from Inside to WebServer1 in the DMZ, but not SSH to it
  • Send admin traffic from Inside to WebServer1 in the DMZ and SSH to it
  • Send Inside traffic to the Outside Internet
  • Nmap from the Outside to our WebServer1 in the DMZ to verify that only HTTP (port 80) is open


Below is our final ZBF config we built on our Edge-7200 to allow all this to be possible, and what a hell of a lab it's been. Go get yourself a scotch and celebrate success! Thanks for sticking with it to the end.

class-map type inspect match-any Inside-Admins-Traffic
 match protocol ssh
 match protocol icmp
 match protocol https
class-map type inspect match-all Inside-Admins-to-WebServer1-DMZ
 match class-map Inside-Admins-Traffic
 match access-group name AdminSubnet
class-map type inspect match-any DMZ-to-Outside
 match protocol icmp
 match protocol http
class-map type inspect match-any Inside-to-WebServer1-DMZ
 match protocol icmp
 match protocol http
 match protocol https
class-map type inspect match-any InsideToInternet
 match protocol icmp
!
!
policy-map type inspect VLAN-10-DMZ-to-Outside
 class type inspect DMZ-to-Outside
  inspect
 class class-default
  drop log
policy-map type inspect Inside-to-WebServer1-DMZ
 class type inspect Inside-to-WebServer1-DMZ
  inspect
 class type inspect Inside-Admins-to-WebServer1-DMZ
  inspect
 class class-default
  drop log
policy-map type inspect Inside-to-Outside
 class type inspect InsideToInternet
  inspect
 class class-default
  drop log
!
zone security VLAN10-WebServer1-DMZ
 description Security zone for WebServer1
zone security Outside
 description Outside to Internet
zone security Inside
 description Inside to LabCorp LAN
zone-pair security Outside-to-VLAN-10-DMZ source Outside destination VLAN10-WebServer1-DMZ
 service-policy type inspect VLAN-10-DMZ-to-Outside
zone-pair security Inside-LAN-to-Outside source Inside destination Outside
 service-policy type inspect Inside-to-Outside
zone-pair security Inside-to-WebServer1-DMZ source Inside destination VLAN10-WebServer1-DMZ
 service-policy type inspect Inside-to-WebServer1-DMZ
!
!
interface FastEthernet0/0
 description Outside
 ip address 172.31.122.2 255.255.255.248
 ip nat outside
 ip virtual-reassembly
 zone-member security Outside
 duplex full
 !
!
interface FastEthernet1/0.10
 description VLAN10-WebServer-1-DMZ
 encapsulation dot1Q 10
 ip address 192.168.10.1 255.255.255.0
 ip nat inside
 ip virtual-reassembly
 zone-member security VLAN10-WebServer1-DMZ
 ip ospf 1 area 0
!
interface FastEthernet1/1
 ip address 10.1.1.1 255.255.255.252
 ip nat inside
 ip virtual-reassembly
 zone-member security Inside
 ip ospf authentication message-digest
 ip ospf message-digest-key 1 md5 LabCorpOSPF
 ip ospf 1 area 0
 duplex auto
 speed auto

Wednesday, February 18, 2015

Checking in, and fan-boying for Ivan

Hey, now.  Don't you worry.  I haven't forgotten about The Guide.

As it happens, I've been perusing Ivan Pepelnjak's digital shortcut, Deploying Zone-Based Firewalls.  "Why?" you might currently be screaming at the top of your lungs.  Well, partially because the man is quite brilliant, but mostly because ZBFs are suspiciously simple to implement...and because I've found that emulating ASAs in Qemu on GNS3 has been nothing short of abysmal.

Qemu bashing aside, I've just finished making final tweaks to my latest lab.  It's a fairly straight-forward Inside/DMZ/Outside zone design, but it's a good base configuration that's easy to extrapolate and build upon -- oh, and it takes into account that, unlike in Ivan's book (which was written using IOS 12.4, I believe), I'm using 15.0 on my edge router, which doesn't allow the use of inspect with class-default on the zone-pair's policy-map.

So, fun stuff coming down the pipe! If the ZBF lab isn't up by tomorrow, look for it on Friday.

Thursday, February 12, 2015

STP Diameter and the Art of -- Wait...Are those daisy-chained?!

Well, things have finally calmed down a bit, leaving me enough time to sift through the notes I had left myself for drafting this post.  I'll say it now, this one is a bit more theory-related.  I've encountered a topology or two that had haphazardly chained switches, but, for the purposes of demonstrating STP diameter, we'll be using an indescribably atrocious environment that one would be hard-pressed to encounter in the field.

There is a general rule that floats around networking forums -- particularly Cisco's -- which states that a network with an STP diameter of 7+ can be a risk for instability and unexpected re-convergence.  If you haven't heard of the Beth Israel Deaconess Medical Center meltdown then, please, allow me to summarize:

In 2002 -- I note the year just because 802.1w came out in 2001, but maybe BIDMC had cold feet about the transition -- the infrastructure at BIDMC came to a full stop due to layer 2 instability.  After what was described as extensive work with Cisco TAC and engineers who flew in, it was discovered that BIDMC had exceeded an STP diameter of 7.

Going forward, keep in mind that, maybe by the standards of over a decade ago, exceeding a diameter of 7 was unfeasible and possibly even downright impossible, but, today (and as we'll see below), we can get away with a bit more carelessness.

Recall that STP has three major timers:

  • Hello = The amount of time between each Hello BPDU sent between switches. 2 seconds, by default, but anywhere from 1-10 seconds.
  • FWD Delay = The amount of time spent in each of the Listening and Learning states before transitioning. 15 seconds, by default, but anywhere from 4-30 seconds.
  • Max Age = The max length of time a switch keeps its saved Config BPDU info before discarding it. 20 seconds, by default, but anywhere from 6-40 seconds.

Max Age can only be reset by the receiving of a new superior BPDU which changes the local bridge's view on how to best reach root.

Each Config BPDU contains these 3 values, but, in addition, contains a little-known bonus timer called "Message Age." MSG Age isn't a fixed value. The Root sends all BPDUs with MSG Age=0, and each subsequent non-root that receives the BPDU increments MSG Age by 1, then relays it. Effectively, MSG Age represents how far you are from the Root upon receiving the BPDU.

When a BPDU arrives that is superior (better BID (MAC address + Bridge Priority)) to the current BPDU received from the current Root, the new, superior BPDU is stored and the Age Timer starts to increment, beginning at a value equal to the MSG Age received in the superior BPDU. If the Age Timer reaches a value equal to the switch's Max Age Timer before another BPDU is received from the Root (remember, the ROOT is always sending Hello BPDUs at a default of 2 sec) then the Age Timer doesn't refresh and the superior BPDU is aged out.

You can see what a problem this might cause with larger STP diameters...

Because we're working in a virtual infrastructure, say we've really let our network burn to the ground, and 18 switches are daisy chained from ROOT to furthest Access Switch. So our diameter = 18.

A majestic, swirling vortex of fail.

Recall that the Root originates its BPDU with MSG AGE = 0, then each switch increases MSG Age by 1 as it relays, so, at the far-end switch...MSG Age = 17 upon receiving the BPDU from the Root.

So, our Age Timer starts at 17 seconds, and we have 3 seconds of hold time (Max Age - MSG Age) before Max Age expires and the superior BPDU is discarded. By default, our Root will only re-send BPDUs every 2 seconds, so, assuming decent line speed/no link saturation, let's say it takes 1 second for the far-end switch to receive the superior BPDU and refresh its Age Timer.
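
The arithmetic above is easy to sketch in Python. The function names are made up for illustration, and the model assumes the simple case described here: a straight daisy chain, MSG Age incrementing by exactly 1 per relaying switch, and the default Max Age of 20.

```python
MAX_AGE = 20  # seconds, STP default


def message_age_at(position):
    """MSG Age seen by the switch at `position` in the chain (Root = 1)."""
    return position - 1


def margin(position, max_age=MAX_AGE):
    """Seconds left before the stored BPDU ages out if no refresh arrives."""
    return max_age - message_age_at(position)


for position in (2, 3, 6, 13, 18):
    print(position, message_age_at(position), margin(position))
```

For the 18th switch this gives a MSG Age of 17 and a margin of 3 seconds, matching the numbers above.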

Let's see how the MSG Age looks as we move down the chain, starting on the Root.

Root Switch in the daisy chain originating Superior BPDU:

Root-SW1#sh spanning-tree vlan 1 detail
 VLAN0001 is executing the ieee compatible Spanning Tree protocol
  Bridge Identifier has priority 24576, sysid 1, address aabb.cc00.0100
  Configured hello time 2, max age 20, forward delay 15
  We are the root of the spanning tree
  Topology change flag not set, detected flag not set
  Number of topology changes 2 last change occurred 00:02:00 ago
          from Ethernet0/0
  Times:  hold 1, topology change 35, notification 2
          hello 2, max age 20, forward delay 15
  Timers: hello 0, topology change 0, notification 0, aging 300
 Port 1 (Ethernet0/0) of VLAN0001 is designated forwarding
!truncated for brevity!
   Timers: message age 0

Now, the second switch in the daisy chain receiving the Superior BPDU:

SW2#sh spanning-tree vlan 1 detail | incl message age
Timers: message age 1, forward delay 0, hold 0

We'll see the MSG Age timer increment as the switch waits for the next superior BPDU to arrive:

SW2#sh spanning-tree vlan 1 detail | incl message age
Timers: message age 2, forward delay 0, hold 0

It will drop back to "message age 1" when the next Hello arrives. In the time between one Hello arriving (setting the message age to 1) and the next Hello leaving the Root and arriving, MSG Age will have incremented to 2 (possibly on its way to 3).


Here's the third switch in the daisy chain receiving the superior BPDU:

SW3#sh spanning-tree vlan 1 detail | incl message age
Timers: message age 2, forward delay 0, hold 0

It starts at 2 rather than 1 because the BPDU's Message Age was incremented from 0 to 1 upon being received on SW2 then relayed from SW2 with Message Age = 1, which SW3 then incremented to 2 upon receiving it. It increments towards Max Age as it waits for a superior BPDU to arrive to refresh MSG Age.

SW3#sh spanning-tree vlan 1 detail | incl message age
Timers: message age 3, forward delay 0, hold 0

For the sake of brevity, we'll skip down the chain to SW6 and check its MSG Age timers:

SW6#sh spanning-tree vlan 1 detail | incl message age
   Timers: message age 5, forward delay 0, hold 0
SW6#sh spanning-tree vlan 1 detail | incl message age
   Timers: message age 6, forward delay 0, hold 0

Now, all of this works nicely because, as you can see, it takes maybe 1 second at most for the Root to get its superior BPDU down to the furthest switch and refresh the MSG Age timer. We probably won't see any issues unless we get our diameter up near 19 or 20. Let's see what happens!

As we get up on our 13th switch in the chain, we can see that the BPDU starts to take longer to arrive/be processed:

SW13#sh spanning-tree vlan 1 detail | incl message age
   Timers: message age 12, forward delay 0, hold 0
SW13#sh spanning-tree vlan 1 detail | incl message age
   Timers: message age 13, forward delay 0, hold 0
SW13#sh spanning-tree vlan 1 detail | incl message age
   Timers: message age 14, forward delay 10, hold 0

The superior BPDU that arrived on SW13 should have a MSG Age of 12, so it's taking a full 2 seconds before the Root's new BPDU can arrive and be processed by SW13 to refresh the MSG Age timer.

Let's skip ahead to our 18th switch

SW18#sh spanning-tree vlan 1 detail | incl message age
   Timers: message age 17, forward delay 0, hold 0
SW18#sh spanning-tree vlan 1 detail | incl message age
   Timers: message age 18, forward delay 0, hold 0
SW18#sh spanning-tree vlan 1 detail | incl message age
   Timers: message age 19, forward delay 0, hold 0
SW18#sh spanning-tree vlan 1 detail | incl message age
   Timers: message age 20, forward delay 0, hold 0
SW18#sh spanning-tree vlan 1 detail | incl message age
   Timers: message age 0, forward delay 0, hold 0

Yup, this is exactly the sort of thing that wreaks havoc on a network (though, in all fairness, if your network has an STP diameter bad enough where your MSG Age is starting at 17, you're probably already waist-deep in havoc). Notice how the age timer starts at 17 (as it should, since Root originates it at 0 and we're 18 switches deep), but it takes a full 3 seconds for the superior BPDU to reach switch 18.

Since our Max Age was reached before the Hello could be received/processed, the current superior BPDU was discarded, MSG Age resets to 0, and our far-end switch is now undergoing a topology change where it first assumes itself to be the Root:

SW18#sh spanning-tree vlan 1
VLAN0001
  Spanning tree enabled protocol ieee
  Root ID    Priority    32769
             Address     aabb.cc00.1200
             This bridge is the root
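
To tie the SW13 and SW18 outputs together, here's a tiny Python sketch of the age timer counting up until either a refresh arrives or Max Age is hit. The function names are hypothetical, and the "refresh delay" values are simply the ones observed in the outputs above (2 seconds on SW13, 3 seconds on SW18), not anything computed.

```python
def observed_ages(initial_msg_age, refresh_delay, max_age=20):
    """Age timer values seen, one per second, until refresh or expiry."""
    ages = []
    for elapsed in range(refresh_delay + 1):
        age = initial_msg_age + elapsed
        ages.append(age)
        if age >= max_age:
            break  # stored BPDU is discarded here
    return ages


def ages_out(initial_msg_age, refresh_delay, max_age=20):
    """True if the timer reaches max_age before the next BPDU refreshes it."""
    return initial_msg_age + refresh_delay >= max_age


print(observed_ages(12, 2), ages_out(12, 2))  # [12, 13, 14] False
print(observed_ages(17, 3), ages_out(17, 3))  # [17, 18, 19, 20] True
```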

Keep in mind, this simulation assumes a topology with no production traffic crossing it, no actual cables (so no chance of EMI/bends/general CRC causes), and really no overhead on the switches at all. It's entirely possible that we could have seen this on switch 10 or 11. One should always take into consideration the propensity for lost Hello BPDUs and, in a worst case scenario, how long the end-to-end BPDU propagation truly is, given the number of lost Hellos, the frequency with which Hellos go out (Hello Timer), BPDU Delay (the amount of time it takes a switch to receive a BPDU then relay it, 1 second max), and your STP diameter.


Cisco has a nice little formula providing that propagation delay (worked here for dia = 7 and hello = 2):
End-to-end_BPDU_propa_delay
= ((lost_msg + 1) x hello) + (BPDU_Delay x (dia - 1))
= ((3 + 1) x hello) + (1 x (dia - 1))
= 4 x hello + dia - 1
= 4 x 2 + 6
= 14 sec

Once you have propagation delay, you can also make sense of why Max Age is 20 seconds by default:
max_age
= End-to-end_BPDU_propa_delay + Message_age_overestimate
= 14 + 6
= 20 sec

Where "Message_age_overestimate" accounts for the age of the BPDU since origination by the Root, assuming each relaying non-root increments it by 1 second:
Message_age_overestimate
= (dia – 1) x overestimate_per_bridge
= dia – 1
= 6
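
If you want to plug in your own numbers, the formulas above translate directly into Python. The function names are mine, and the defaults (3 lost messages, 1 second of BPDU delay, 1 second of overestimate per bridge) are just the values used in the worked example above.

```python
def propagation_delay(dia, hello=2, lost_msg=3, bpdu_delay=1):
    """End-to-end BPDU propagation delay in seconds."""
    return (lost_msg + 1) * hello + bpdu_delay * (dia - 1)


def message_age_overestimate(dia, overestimate_per_bridge=1):
    """Worst-case Message Age accumulated across the diameter."""
    return (dia - 1) * overestimate_per_bridge


def max_age(dia, hello=2):
    """Max Age needed for a given diameter and Hello timer."""
    return propagation_delay(dia, hello) + message_age_overestimate(dia)


print(propagation_delay(7), message_age_overestimate(7), max_age(7))  # 14 6 20
print(max_age(18))  # 42
```

Note that for our 18-switch chain the formula asks for a Max Age of 42 seconds, which is more than the 40-second ceiling mentioned earlier.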

Once we take cable faults, switching delays, and general network overhead into account, 7 starts to look like a more fault-tolerant STP diameter limit.


So, there you have it! STP diameter and why anyone in their right mind has long since migrated to 802.1w.  When I landed my first networking gig, you can imagine my surprise when I saw a few daisy-chained switches. While certainly not conducive to scalability and redundancy, it wasn't the end of the world, but it did merit a change control submission to move to something less...let's call it "horrifying."