[rancid] Cisco 3650 IOS-XE active VLAN port state changes
Piegorsch, Weylin William
weylin at bu.edu
Mon Nov 25 18:45:31 UTC 2019
> I would like to understand why this occurs for some folks
As for the hardware fault, I still can’t explain what was happening.
The other issue that I encountered was with native VLAN tagging. Um... I'm not sure this is something that RANCiD is geared to tackle, but here goes anyway.
Depending on where you look for guidance, if you tag the native VLAN:
- global config on most switches: "vlan dot1q tag native"
- interface config on Cat6k (and possibly N7k, untested): "switchport trunk native vlan tag"
and also specify a native VLAN ("switchport trunk native vlan <x>"), the guidance differs on whether or not to add the native VLAN to the trunk.
What I found is that if I tagged the native VLAN, used a non-default native VLAN, and didn’t include it on the trunk, I got spurious behavior, though mostly only when I either had different native VLANs across the various trunk ports or was connected to a remote device whose adjacent interface was configured differently from the local one. If I tag the native VLAN everywhere, use a non-default native VLAN everywhere (the same native VLAN on both ends of a link), and also always include the native VLAN on the trunk, then it all works OK.
What was the spurious behavior?
In the past, I've done some extensive independent research on native VLAN tagging, and found that VLAN 1 is always on a trunk - for some specific protocols - whether it's explicitly allowed or not (see posting I wrote on Cisco's community support forum, reference 1 below). TAC confirmed that if you use a native VLAN besides VLAN 1, then you should allow the native VLAN on the trunk regardless of the tagged state. I found across a lot of regression testing that results varied by HW model/SW version, but if you allow the native VLAN and configure both ends the same way, then things are stable.
Personally, I suspect STP Loopguard, but that's just a guess plus some interesting log messages. UDLD also had interesting log messages. I also saw some weird messages from CDP, LLDP, and (most weirdly) even LACP on non-bundled interfaces, but I'm most suspicious of STP and somewhat suspicious of UDLD. Basically, I never got to the root of which protocol was causing RANCiD's issues, but I found that once I did things "correctly", RANCiD's problem went away.
Also, caution - if you’re trying to write logic to determine whether a VLAN is on a trunk or not, consider this scenario. With all 5 of these commands on a single interface, what's the "native vlan"? (hint: 22, and an egressed frame is either untagged or in some cases tagged with VLAN 0)
switchport mode access
switchport access vlan 22
switchport trunk native vlan tag
switchport trunk native vlan 117
switchport trunk allowed vlan 102-128
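To make the point concrete, here is a minimal Python sketch of the kind of logic that scenario calls for: check the configured mode first, and only then decide which VLAN carries untagged frames. The function name is hypothetical, and the sketch assumes only static "access" and "trunk" modes and a default of VLAN 1 when nothing is configured; dynamic modes and platform defaults are deliberately left undecided.

```python
import re

def effective_untagged_vlan(config_lines):
    """Return (mode, vlan) for the VLAN that carries untagged frames.

    Assumptions (not from any real tool): only static 'access' and
    'trunk' modes are handled, and unset VLANs fall back to VLAN 1.
    """
    mode = None
    access_vlan = 1
    native_vlan = 1
    for raw in config_lines:
        line = raw.strip()
        m = re.fullmatch(r"switchport mode (\S+)", line)
        if m:
            mode = m.group(1)
            continue
        m = re.fullmatch(r"switchport access vlan (\d+)", line)
        if m:
            access_vlan = int(m.group(1))
            continue
        # Matches a numeric native VLAN only; the "tag" keyword is skipped.
        m = re.fullmatch(r"switchport trunk native vlan (\d+)", line)
        if m:
            native_vlan = int(m.group(1))
    if mode == "access":
        # Trunk settings are present in the config but not in effect.
        return mode, access_vlan
    if mode == "trunk":
        return mode, native_vlan
    # Mode not set or dynamic: the outcome is platform/negotiation dependent.
    return mode, None

example = [
    "switchport mode access",
    "switchport access vlan 22",
    "switchport trunk native vlan tag",
    "switchport trunk native vlan 117",
    "switchport trunk allowed vlan 102-128",
]
print(effective_untagged_vlan(example))
```

For the five commands above this yields access mode with VLAN 22, since the trunk lines are inert while the port is in access mode.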
I've had to redo VLAN definitions across entire datacenters, but I had a devil of a time finding the extent to which VLANs were applied (shutdown ports; VLAN ranges; admin-up ports that had no SFP or were unpatched; etc). I had to write a script to help me out; see reference 2 below.
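This is not the script from reference 2, just a small Python sketch of one piece of that problem: expanding the range list from a "switchport trunk allowed vlan" line so you can test whether a given VLAN is in it. The function names are hypothetical, and the sketch assumes a plain comma/hyphen list; keywords such as "all", "none", "add", or "except" are not handled.

```python
def expand_vlan_list(spec):
    """Expand a list such as '1,102-128,200' into a set of VLAN IDs."""
    vlans = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            vlans.update(range(int(low), int(high) + 1))
        else:
            vlans.add(int(part))
    return vlans

def vlan_in_allowed_list(vlan, allowed_spec):
    """True if the VLAN appears in the configured allowed list."""
    return vlan in expand_vlan_list(allowed_spec)

print(vlan_in_allowed_list(117, "102-128"))
print(vlan_in_allowed_list(22, "102-128"))
```

Note that this only reflects the configured allowed list. It says nothing about whether the port is in trunk mode, is shut down, or has a link at all, and as noted above VLAN 1 can still carry some protocols whether it is explicitly allowed or not.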
As for support in RANCiD - RANCiD isn't built to be a configuration compliance validation and/or enforcement tool (though it's a fantastic baseline to use as a launching point). The problem I had was a problem of configuration compliance, and a problem of fabric inter-relationships. Compliance can be built around RANCiD, but I'm concerned that if you build that in, you're really taking on a big bite, especially if you're trying to be multi-vendor about it. Doing that would lead to the world of SDN. Prime Infrastructure and DNA-C are better tools for this (at least for Cisco kit), and if those are too expensive for a given network shop then developing and maintaining compliance scripts might be a better approach than building it directly into RANCiD.
Though, I would be thrilled if shrubbery.net had a compliance tool built around RANCiD.
Reference 1: https://community.cisco.com/t5/switching/does-the-native-vlan-need-to-be-allowed-on-the-trunk-port/td-p/1648181/page/2
Reference 2: https://community.cisco.com/t5/switching/identify-if-vlan-is-applied-to-switchports/td-p/3693599
weylin
On 11/25/19, 11:58 AM, "john heasley" <heas at shrubbery.net> wrote:
Sat, Nov 23, 2019 at 09:50:06PM +0000, Piegorsch, Weylin William:
> You can also develop a custom type that doesn't call "show vlan".
please do this, rather than change ios.pm. This makes it easier for you
to upgrade rancid, both of which i prefer because it is easier to support
you.
> Also, I've had this occur twice in the past.
> - One time it was happening campus-wide. I dug into it hard, and after a good amount of effort found out there was something actually happening based on a misunderstanding I had about how native VLANs work in IOS. In other words: (a) I learned something, and (b) I found I had an actual misconfiguration.
> - The other time it turned out that there was a hardware fault on the ASIC (we're actually still using that particular Catalyst 3508).
I would like to understand why this occurs for some folks and change
the code to automatically ignore show vlan output when the switch is
configured in a manner that would lead to it. I know that VTP does
this and sometimes 802.1x and the current code tries to recognize
both of these. tia for any help here.