Cisco StackWise Virtual Deep Dive: Campus Core Design for 2026
StackWise Virtual (SVL) is the successor to the Virtual Switching System (VSS) for building resilient, high-bandwidth enterprise campus distribution and core layers. Pairing two chassis into a single logical switch is not a new concept, but implementing SVL on modern Catalyst 9500 and 9600 series hardware demands a working understanding of its underlying mechanics, particularly the StackWise Virtual Link (which this article also abbreviates as SVL wherever the context is the link itself), dual-active detection, and the In-Service Software Upgrade (ISSU) process. A successful SVL deployment is not plug-and-play: it depends on precise engineering decisions that directly affect fabric stability and performance.
StackWise Virtual vs. VSS and Backplane Stacking
It is critical to differentiate SVL from its predecessor and from its Catalyst 9200/9300 series cousins. VSS, pioneered on the Catalyst 6500 series, required identical chassis and specific supervisor modules (like the VS-S2T-10G), and bundled physical ports into a port-channel that served as the Virtual Switch Link (VSL). SVL on the Catalyst 9500/9600 inherits this design philosophy but implements it on the modern UADP ASIC architecture with IOS XE.
Conversely, traditional StackWise, as seen on the Catalyst 9300X with StackWise-1T (or the Catalyst 9300 with StackWise-480), uses proprietary stacking cables (e.g., STACK-T1) on dedicated stack ports to connect multiple switches in a ring topology. This creates a single logical switch with a unified data plane and control plane, sharing a single IP address. SVL achieves the same logical outcome but uses standard 10/25/40/100G Ethernet interfaces for the interconnect, known as the StackWise Virtual Link. This distinction is paramount: SVL is for pairs of high-performance distribution/core switches, while StackWise is for stacking multiple access-layer switches in a wiring closet.
Core Hardware Platform Selection: Catalyst 9500 vs. 9600
The choice between the fixed-configuration Catalyst 9500 Series and the modular Catalyst 9600 Series for an SVL pair depends entirely on port density, future scalability, and budget. Both are built on Cisco's UADP ASICs, ensuring feature parity for core campus functions.
Catalyst 9500: High-Performance Fixed Core
The Catalyst 9500 Series, particularly its high-performance models, is ideal for compact core or distribution blocks. A common pairing is two C9500-48Y4C switches, each providing 48 ports of 1/10/25G SFP28 and 4 ports of 40/100G QSFP28. For higher-performance needs, the C9500-32C offers 32 ports of 40/100G QSFP28. These platforms, running on UADP 3.0, provide significant TCAM and buffer resources suitable for most enterprise core requirements. Their fixed nature means what you buy is what you get; future expansion involves a chassis replacement, not a line-card addition.
Catalyst 9600: Modular Scale for Large Campuses
For large enterprise or university campuses, the Catalyst 9606R chassis with a pair of C9600-SUP-1 supervisors is the logical choice. The modularity allows for a mix of line cards, such as the C9600-LC-48YL (48 ports of 1/10/25G) and the C9600-LC-24C (24 ports of 40/100G). This enables a pay-as-you-grow model and the ability to adopt future technologies, like 400G, via a new line card rather than a full system forklift. An SVL pair of 9606R chassis provides the ultimate in resiliency and scale, capable of supporting thousands of users and devices.
Sizing the StackWise Virtual Link (SVL)
The SVL is the most critical component of the design. It carries all control plane communication and any data traffic that must traverse between the two chassis (i.e., "inter-chassis traffic"). Under-sizing the SVL starves the fabric of bandwidth and can lead to unpredictable performance, while over-sizing wastes expensive high-speed ports. A pragmatic approach to sizing is essential.
Consider a distribution block built on an SVL pair of C9500-32C switches. This block serves ten access-layer stacks of Catalyst 9300s, each with dual 40G uplinks configured in a Multi-chassis EtherChannel (MEC) to the SVL pair. Total uplink capacity is 10 * (2 * 40 Gbps) = 800 Gbps. A common rule of thumb is to provision the SVL with a capacity equal to 25-50% of the total connected uplink bandwidth, assuming an even distribution of traffic termination across both chassis.
Example Calculation:
- Total Uplink Bandwidth: 800 Gbps
- Target SVL Capacity (30% rule): 800 Gbps * 0.30 = 240 Gbps
- Implementation: A three-port port-channel of 100G interfaces (3 x 100 Gbps = 300 Gbps).
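On the first C9500-32C, you would provision ports HundredGigE1/0/30, 1/0/31, and 1/0/32 for the SVL, with the matching ports on the second chassis (numbered 2/0/30 through 2/0/32 once it is renumbered as switch 2). This provides 300 Gbps of bandwidth, exceeding the 240 Gbps target. The failure of a single link still leaves 200 Gbps, which falls below the 240 Gbps target but stays within the 25-50% rule of thumb (25% of 800 Gbps); if the full target must survive a link failure, provision a fourth member. Running mission-critical data and control traffic over a single link, even a 100G one, is a design antipattern. Always bundle at least two member links into the SVL.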
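The sizing arithmetic above can be sketched as a small helper. This is only an illustration of the rule of thumb, not a Cisco tool; the function name `size_svl` and its parameters are hypothetical, and the target is passed as a whole-number percentage to keep the arithmetic exact.

```python
import math


def size_svl(total_uplink_gbps, target_percent, link_speed_gbps, min_links=2):
    """Apply the rule-of-thumb SVL sizing described in this section.

    Returns (target_gbps, link_count, provisioned_gbps, after_one_failure_gbps).
    min_links=2 reflects the advice never to run the SVL over a single link.
    """
    target = total_uplink_gbps * target_percent / 100
    links = max(min_links, math.ceil(target / link_speed_gbps))
    provisioned = links * link_speed_gbps
    after_one_failure = (links - 1) * link_speed_gbps
    return target, links, provisioned, after_one_failure


# Ten access stacks, each with dual 40G uplinks, sized at 30% over 100G links.
uplinks = 10 * (2 * 40)
target, links, provisioned, degraded = size_svl(uplinks, 30, 100)
print(f"target={target} Gbps, links={links}, "
      f"provisioned={provisioned} Gbps, after one failure={degraded} Gbps")
```

Comparing the last value against the target makes the redundancy trade-off explicit: a bundle sized only to meet the target at full strength will dip below it when one member fails.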
Dual-Active Detection: The Ultimate Safety Net
A dual-active scenario is the most catastrophic failure state for a virtualized switching pair. It occurs when the SVL fails, and both switches believe they are the active chassis. This leads to IP address duplication, MAC address table instability, and network-wide loops. SVL employs a multi-layered detection mechanism, an evolution from VSS, to prevent this.
- SVL Keepalives: The primary method. Cisco-proprietary headers on SVL traffic include keepalives. If these are not received for the configured timer period (defaults are aggressive, in the hundreds of milliseconds), the standby (secondary) switch initiates a failover sequence.
- Fast-Hello: A dedicated, out-of-band link that should be treated as mandatory in any production design. It is a direct Layer 2 connection (e.g., a single 1G SFP port on each switch, like Gi1/0/1 <--> Gi2/0/1) reserved for dual-active hello packets. If the SVL goes down but Fast-Hello packets are still received, each chassis knows its peer is still alive. By then the standby has promoted itself to active, so the former active switch enters recovery mode, shutting down all of its front-panel ports (except the SVL) to prevent loops.
- Enhanced PAgP (ePAgP) / MEC: This is the final tie-breaker, a mechanism carried over from VSS. If both the SVL and Fast-Hello links fail simultaneously (e.g., a multi-card failure or intermediate switch failure), the SVL pair can use a downstream switch to communicate. A special PAgP TLV on the Multi-chassis EtherChannel (MEC) informs each SVL member of the other's existence. If a switch learns from the PAgP messages relayed by a downstream device that its peer is also active, but cannot reach that peer over the SVL, it knows a dual-active condition exists.
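The layered checks above can be summarized as a small decision function. This is a simplified model of the logic as described in this article, not Cisco's implementation; the names `Verdict` and `classify` and the three boolean inputs are hypothetical.

```python
from enum import Enum


class Verdict(Enum):
    NORMAL = "normal"
    PEER_FAILED = "peer presumed failed"
    DUAL_ACTIVE = "dual-active detected"


def classify(svl_up: bool, fast_hello_heard: bool, epagp_reports_peer_active: bool) -> Verdict:
    """Classify what a chassis can conclude about its peer.

    svl_up: SVL keepalives are still arriving.
    fast_hello_heard: hellos still arrive on the dedicated Fast-Hello link.
    epagp_reports_peer_active: a downstream MEC neighbor relays that the peer is active.
    """
    if svl_up:
        return Verdict.NORMAL
    # The SVL is down: any surviving side channel proves the peer is alive.
    if fast_hello_heard or epagp_reports_peer_active:
        return Verdict.DUAL_ACTIVE
    # No evidence of the peer on any path: treat it as genuinely failed.
    return Verdict.PEER_FAILED


print(classify(svl_up=False, fast_hello_heard=True, epagp_reports_peer_active=False))
```

The last branch is the reason redundant detection paths matter: with every side channel lost, a live peer is indistinguishable from a dead one.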
Common Pitfall: The Shared Fast-Hello Link
A common mistake is to run the Fast-Hello link through another piece of network equipment, like a management switch, to save fiber. This is a critical design flaw. If that intermediary switch reboots or fails, both SVL members lose Fast-Hello connectivity, triggering a false positive. The Fast-Hello link MUST be a direct, dedicated connection between the two SVL chassis, ideally a single fiber run or a DAC cable if co-located. Skimping here undermines the entire high-availability design.
When NOT to Use StackWise Virtual
SVL is a powerful tool, but it is not the universal solution for all campus designs. Applying it in the wrong context creates more problems than it solves.
- EVPN-VXLAN Fabrics: For advanced campus fabrics employing an EVPN-VXLAN overlay with a BGP control plane, SVL is the wrong model. A true spine-leaf (Clos) architecture relies on individually routed L3 links between leaf and spine switches. Each switch is an independent routing entity. SVL, by contrast, collapses two chassis into one logical switch with a single point of control, which is antithetical to the principles of a routed fabric.
- Geographically Dispersed Cores: Attempting to "stretch" an SVL pair between different buildings or data centers over DWDM or dark fiber is a dangerous practice. The SVL protocol is extremely sensitive to latency and jitter. Any latency above a few milliseconds (Cisco's official limit is often cited based on optic reach, but 5 ms is a safe practical ceiling) can cause instability in the control plane communication. A failure of the long-haul link could lead to a dual-active condition that is difficult to troubleshoot. The failure domain becomes unacceptably large.
- Small Deployments: For a simple top-of-rack or wiring closet scenario with 2-4 switches, a full-blown SVL pair of Catalyst 9500s is overkill. A stack of Catalyst 9300 or 9300X switches using dedicated stacking cables (StackWise-480 or StackWise-1T respectively) is far more cost-effective, simpler to configure, and provides a unified management plane in a more appropriate form factor.
ISSU and Stateful Switchover (SSO)
In-Service Software Upgrade (ISSU) is a key benefit of SVL, allowing for software updates with minimal traffic disruption. However, its "hitless" nature is often misunderstood. The process, governed by Stateful Switchover (SSO), works by upgrading the standby chassis first. It reloads with the new code while the active chassis continues to forward traffic. Once the standby is back online and synchronized, a manual or automatic switchover is triggered. The newly active chassis, running the new code, takes over traffic forwarding. The former active (now standby) chassis is then upgraded.
This process maintains the data plane: forwarding continues based on existing FIB and adjacency tables. However, the control plane undergoes a reset. Routing protocols like OSPF and EIGRP, when configured for Non-Stop Forwarding (NSF), will perform a graceful restart, preventing neighbor flaps as long as the neighbors are NSF-aware. BGP sessions behave the same way when graceful restart has been negotiated with the peer. Even so, it is not a truly "invisible" event. The process requires IOS XE 17.3.2 or later and strict adherence to the compatibility matrix for the source and destination versions. Always perform ISSU in a planned maintenance window with a documented rollback procedure.
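The upgrade order described above can be traced with a short simulation. It is a toy model of the sequence only, not of IOS XE behavior; the chassis names `sw1` and `sw2` and the version strings are placeholders.

```python
def simulate_issu(old_version: str, new_version: str):
    """Return (step, {chassis: (role, version)}) snapshots for the ISSU order above."""
    chassis = {
        "sw1": {"role": "active", "version": old_version},
        "sw2": {"role": "standby", "version": old_version},
    }
    log = []

    def snapshot(step: str) -> None:
        state = {name: (c["role"], c["version"]) for name, c in chassis.items()}
        log.append((step, state))

    snapshot("start")
    # 1. The standby reloads with the new code while the active keeps forwarding.
    chassis["sw2"]["version"] = new_version
    snapshot("standby upgraded")
    # 2. Switchover: the upgraded chassis takes the active role.
    chassis["sw1"]["role"], chassis["sw2"]["role"] = "standby", "active"
    snapshot("switchover")
    # 3. The former active, now standby, is upgraded last.
    chassis["sw1"]["version"] = new_version
    snapshot("former active upgraded")
    return log


for step, state in simulate_issu("old-release", "new-release"):
    print(f"{step:24} {state}")
```

Every snapshot contains exactly one active chassis, which is what lets forwarding continue; the pair runs mixed versions between the first and last steps, which is why the compatibility matrix matters.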
Migrating from Catalyst VSS to StackWise Virtual
Many enterprises are facing the end-of-life of their Catalyst 6500/6800 VSS cores. Migrating to a Catalyst 9500/9600 SVL pair can be performed with minimal downtime using a swing migration approach.
- Pre-Staging: Rack and stack the new SVL pair. Configure the SVL, Fast-Hello, all SVIs, routing protocols, ACLs, and QoS policies. Keep the SVIs in a shutdown state. The new MEC port-channels facing the access layer should be configured but will be empty.
- Swing Uplinks (Maintenance Window): For each downstream switch or stack connected to the VSS pair via MEC, perform the swing one uplink at a time. For example, suppose a Cat 9300 stack has one uplink to VSS-Active and one to VSS-Standby:
  - Move the uplink from the VSS-Standby chassis to a port on the new SVL-Standby chassis. The port-channel on the Cat 9300 will now have one member link down and one up (to VSS-Active).
  - Move the uplink from the VSS-Active chassis to a port on the new SVL-Active chassis. The port-channel on the Cat 9300 will now be fully connected to the new SVL core. Routed traffic for this stack is not yet flowing, because the SVIs on the new core are still shut down.
- Routing Cutover: This is the service-impacting step. On the new SVL core, execute a `no shutdown` on all the core SVIs. Simultaneously, `shutdown` all SVIs on the old VSS core. This forces all routed traffic to flow through the new fabric. Peer routing adjacencies will drop from the old core and reform with the new one.
- Decommission: After a monitoring period to ensure stability, power down and remove the legacy VSS chassis.
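The swing sequence can be traced for a single access stack to see which core, if any, can route for it at each step. This is a simplified model built on the assumptions stated in the steps above (a member attached to the other core does not bundle, and a core only routes while its SVIs are up); the function name `gateway_for` and the step labels are hypothetical.

```python
def gateway_for(uplinks, svi_up):
    """Return the core that can route for this access stack, or None.

    uplinks: which core each of the stack's two uplinks is attached to.
    svi_up: whether each core's SVIs are currently enabled.
    """
    for core in ("VSS", "SVL"):
        if core in uplinks and svi_up[core]:
            return core
    return None


steps = [
    ("pre-staged",           ["VSS", "VSS"], {"VSS": True,  "SVL": False}),
    ("standby uplink moved", ["VSS", "SVL"], {"VSS": True,  "SVL": False}),
    ("active uplink moved",  ["SVL", "SVL"], {"VSS": True,  "SVL": False}),
    ("routing cutover",      ["SVL", "SVL"], {"VSS": False, "SVL": True}),
]

timeline = [(name, gateway_for(uplinks, svi_up)) for name, uplinks, svi_up in steps]
for name, gateway in timeline:
    print(f"{name:22} gateway: {gateway}")
```

Under these assumptions the stack has no gateway between the second uplink move and the routing cutover, so those two steps are best scheduled back to back.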
StackWise Virtual, when implemented on the robust Catalyst 9500 and 9600 platforms, offers a formidable solution for building the next generation of enterprise campus networks. Its strength lies not in the simplicity of its concept, but in the detailed execution of its supporting components: a properly sized SVL, a physically isolated Fast-Hello link, and correctly configured MECs. By avoiding common pitfalls and respecting its design boundaries, network engineers can build a core and distribution fabric that delivers the stability and performance required through 2026 and beyond. To discuss a personalized migration plan or architectural review for your campus, contact the experts at techleague.io.
For more on campus design, see our posts on EVPN-VXLAN as a Campus Fabric and a performance review of the Catalyst 9300X with StackWise-1T.
Frequently asked questions
Can you mix different Catalyst 9500 models in a StackWise Virtual pair?
No. The two switches in a StackWise Virtual domain must be of the exact same model number (PID), running the same IOS XE version and the same license level (e.g., DNA Advantage). This ensures complete feature and performance parity between the chassis.
What is the maximum distance for a StackWise Virtual Link (SVL)?
The physical distance is dictated by the optic used (e.g., 100G-LR4 allows for 10 km). However, the real constraint is latency. SVL control plane protocols are designed for near-zero latency, and a practical, safe limit is under 5 milliseconds. It is strongly recommended to keep both chassis within the same data center or communications room.
How does StackWise Virtual handle Quality of Service (QoS)?
QoS markings (DSCP, CoS) are preserved when frames transit the SVL. However, queuing, policing, and shaping policies are applied on a per-chassis basis. This means traffic ingressing Switch 1 is subject to Switch 1's egress queuing policies, even if its ultimate destination is a port on Switch 2.
How is licensing handled for a StackWise Virtual pair?
Both switches must have an identical Cisco DNA Subscription license level (e.g., Essentials, Advantage, or Premier). A single DNA Center can manage the pair as one logical entity, but the hardware and software licenses are purchased and applied on a per-chassis basis.
Can I bundle 40G and 100G ports into the same SVL?
No. All member links within the port-channel designated as the StackWise Virtual Link must be of the same speed. You cannot mix 40G and 100G interfaces in the same bundle. You must choose one speed and provision multiple interfaces of that speed for capacity and redundancy.
What specifically happens from the secondary's perspective in a dual-active scenario?
If the SVL fails and the Fast-Hello link also fails, the secondary switch assumes the primary is down and transitions to become active. It then advertises its new active role in the ePAgP messages it sends on the MEC. When the original primary, which is *also* still active, receives that information relayed by a downstream switch, it knows a dual-active condition exists. The original primary immediately places all of its front-panel, non-SVL interfaces into an error-disabled state to prevent loops and protect the network, leaving the secondary as the sole active chassis.
Is a direct fiber connection truly mandatory for the Fast-Hello link?
While it might function through an intermediate switch, it is a critical design violation. The Fast-Hello link's purpose is to be a simple, out-of-band check on the peer's liveness. Running it through any other device introduces shared fate; if that device fails, you can trigger a false dual-active detection. A direct DAC cable or a single fiber run is the only supported and architecturally sound design.