Table Of Contents
Network Virtualization—Path Isolation Design Guide
Control Plane-Based Path Isolation
Network Device Virtualization with VRF
Data Path Virtualization—Single- and Multi-Hop Techniques
Path Isolation Initial Design Considerations
Deploying Path Isolation in Campus Networks
Path Isolation Using Distributed Access Control Lists
Path Isolation Leveraging Control Plane Techniques
Virtualizing the Campus Distribution Block
Virtualization of Network Services
Path Isolation Deploying VRF-Lite and GRE
Loopback Interfaces Deployment Considerations
High Availability Considerations
QoS in Hub-and-Spoke Deployments
Challenges and Limitations Using VRF and GRE
Path Isolation Deploying MPLS VPN
Path Isolation Deploying VRF-Lite End-to-End
Deploying VRF-Lite End-to-End in Campus Networks
High Availability Recovery Analysis
Extending Path Isolation over the WAN
Deploying Path Isolation Using Distributed ACLs
Deploying Path Isolation Using VRF-Lite and GRE
Deploying Path Isolation Using VRF-Lite and MPLS VPN
Mapping Enterprise VRFs to Service Provider VPNs (Profile 1)
Multiple VRFs Over a Single Service Provider VPN (Profile 2)
Extending the Enterprise Label Edge to the Branch (Profile 3)
General Scalability Considerations
Appendix A—VRF-Lite End-to-End—Interfacing Layer 2 Trunks and Sub-Interfaces
Appendix B—Deploying a Multicast Source as a Shared Resource
Network Virtualization—Path Isolation Design Guide
February 23, 2009
Introduction
The term network virtualization refers to the creation of logical isolated network partitions overlaid on top of a common enterprise physical network infrastructure, as shown in Figure 1.
Figure 1 Creation of Virtual Networks
Each partition is logically isolated from the others, and must provide the same services that are available in a traditional dedicated enterprise network. The end user experience should be as if connected to a dedicated network providing privacy, security, an independent set of policies, service level, and even routing decisions. At the same time, the network administrator can easily create and modify virtual work environments for various user groups, and adapt to changing business requirements adequately. The latter is possible because of the ability to create security zones that are governed by policies enforced centrally; these policies usually control (or restrict) the communication between separate virtual networks or between each logical partition and resources that can be shared across virtual networks. Because policies are centrally enforced, adding or removing users and services to or from a VPN requires no policy reconfiguration. Meanwhile, new policies affecting an entire group can be deployed centrally at the VPN perimeter. Thus, virtualizing the enterprise network infrastructure provides the benefits of using multiple networks but not the associated costs, because operationally they should behave like one network (reducing the relative OPEX costs).
Network virtualization provides multiple solutions to business problems and drivers that range from simple to complex. Simple scenarios include enterprises that want to provide Internet access to visitors (guest access). The stringent requirement in this case is to allow visitors external Internet access, while simultaneously preventing any possibility of unauthorized connection to the enterprise internal resources and services. This can be achieved by dedicating a logical "virtual network" to handle the entire guest communication path. Internet access can also be combined with connectivity to a subset of the enterprise internal resources, as is typical in partner access deployments.
Another simple driver for network virtualization is the creation of a logical partition dedicated to the machines that have been quarantined as a result of a Network Admission Control (NAC) posture validation. In this case, it is essential to guarantee isolation of these devices in a remediation segment of the network, where only access to remediation servers is possible until the process of cleaning and patching the machine is successfully completed.
Complex scenarios include enterprise IT departments acting as a service provider, offering access to the enterprise network to many different "customers" that need logical isolation between them. In the future, users belonging to the same logical partitions will be able to communicate with each other and to share dedicated network resources. However, some direct inter-communication between groups may be prohibited. Typical deployment scenarios in this category include retail stores that provide on-location network access for kiosks or hotspot providers.
The architecture of an end-to-end network virtualization solution targeted to satisfy the requirements listed above can be separated into the following three logical functional areas:
•
Access control
•
Path isolation
•
Services edge
Each area performs several functions and must interface with the other functional areas to provide the end-to-end solution (see Figure 2).
Figure 2 Network Virtualization Framework
The functionalities highlighted in Figure 2 are discussed in great detail in separate design guides, each one dedicated to a specific functional area.
•
Network Virtualization—Access Control Design Guide (http://www.cisco.com/en/US/docs/solutions/Enterprise/Network_Virtualization/AccContr.html)—Responsible for authenticating and authorizing entities connecting at the edge of the network; this allows assigning them to their specific network "segment", which usually corresponds to deploying them in a dedicated VLAN.
•
Network Virtualization—Services Edge Design Guide (http://www.cisco.com/en/US/docs/solutions/Enterprise/Network_Virtualization/ServEdge.html)—Central policy enforcement point where it is possible to control/restrict communications between separate logical partitions or access to services that can be dedicated or shared between virtual networks.
The path isolation functional area is the focus of this guide.
This guide mainly discusses two approaches for achieving virtualization of the routed portion of the network:
•
Policy-based network virtualization—Restricts the forwarding of traffic to specific destinations, based on a policy, and independently from the information provided by the control plane. A classic example of this uses ACLs to restrict the valid destination addresses to subnets in the VPN.
•
Control plane-based network virtualization—Restricts the propagation of routing information so that only subnets that belong to a virtual network (VPN) are included in any VPN-specific routing tables and updates. This second approach is the main focus of this guide, because it overcomes many of the limitations of the policy-based method.
Various alternative path isolation technologies are discussed in the sections of this guide. For the reader to make good use of this guide, it is important to underline two points:
•
This guide discusses the implementation details of each path isolation technology to solve the business problems previously discussed, but is not intended to provide a complete description of each technology. Thus, some background reading is needed to acquire complete familiarity with each topic. For example, when discussing MPLS VPN deployments, some background knowledge of the technology is required, because the focus of the document is discussing the impact of implementing MPLS VPN in an enterprise environment, and not its basic functionality.
•
Not all the technologies found in this design guide represent the right fit for each business requirement. For example, the use of distributed access control lists (ACLs) or generic routing encapsulation (GRE) tunnels may be particularly relevant in guest and partner access scenarios, but not in deployments aiming to fulfill different business requirements. To properly map the technologies discussed here with each specific business requirement, see the following accompanying deployment guides:
•
Network Virtualization—Guest and Partner Access Deployment Guide— http://www.cisco.com/en/US/docs/solutions/Enterprise/Network_Virtualization/GuestAcc.html
•
Network Virtualization—Network Admission Control Deployment Guide— http://www.cisco.com/en/US/docs/solutions/Enterprise/Network_Virtualization/NACDepl.html
Path Isolation Overview
Path isolation refers to the creation of independent logical traffic paths over a shared physical network infrastructure. This involves the creation of VPNs with various mechanisms as well as the mapping between various VPN technologies, Layer 2 segments, and transport circuits to provide end-to-end isolated connectivity between various groups of users.
The main goal when segmenting the network is to preserve and in many cases improve scalability, resiliency, and security services available in a non-segmented network. Any technology used to achieve virtualization must also provide the necessary mechanisms to preserve resiliency and scalability, and to improve security.
A hierarchical IP network is a combination of Layer 3 (routed) and Layer 2 (switched) domains. Both types of domains must be virtualized and the virtual domains must be mapped to each other to keep traffic segmented. This can be achieved when combining the virtualization of the network devices (also referred to as "device virtualization") with the virtualization of their interconnections (known as "data path virtualization").
In traditional (that is, not virtualized) deployments, high availability and scalability are achieved through a hierarchical and modular design based on the use of three layers: access, distribution, and core.
Note
For more information on the recommended design choices to achieve high availability and scalability in campus networks, see the following URL: http://www.cisco.com/en/US/netsol/ns815/networking_solutions_program_home.html.
Much of the hierarchy and modularity discussed in the documents referenced above relies on the use of a routed core. Nevertheless, some areas of the network continue to benefit from the use of Layer 2 technologies such as VLANs (typically in a campus environment) and ATM or Frame Relay circuits (over the WAN). Thus, a hierarchical IP network is a combination of Layer 3 (routed) and Layer 2 (switched) domains. Both types of domains must be virtualized and the virtual domains must be mapped to each other to keep traffic segmented.
Virtualization in the Layer 2 domain is not a new concept: VLANs have been used for years. What is now required is a mechanism that allows the extension of the logical isolation over the routed portion of the network. Path isolation is the generic term referring to this logical virtualization of the transport. This can be achieved in various ways, as is discussed in great detail in the rest of this guide.
Virtualization of the transport must address the virtualization of the network devices as well as their interconnection. Thus, the virtualization of the transport involves the following two areas of focus:
•
Device virtualization—The virtualization of the network device; this includes all processes, databases, tables, and interfaces within the device.
•
Data path virtualization—The virtualization of the interconnection between devices. This can be a single-hop or multi-hop interconnection. For example, an Ethernet link between two switches provides a single-hop interconnection that can be virtualized by means of 802.1q VLAN tags; whereas for Frame Relay or ATM transports, separate virtual circuits can be used to provide data path virtualization. When an IP cloud is separating two virtualized devices, a multi-hop interconnection is required to provide end-to-end logical isolation. An example of this is the use of tunnel technologies (for example, GRE) established between the virtualized devices deployed at the edge of the network.
In addition, within each networking device there are two planes to virtualize:
•
Control plane—All the protocols, databases, and tables necessary to make forwarding decisions and maintain a functional network topology free of loops or unintended black holes. This plane can be said to draw a clear picture of the topology for the network device. A virtualized device must have a unique picture of each virtual network it handles; thus, there is the requirement to virtualize the control plane components.
•
Forwarding plane—All the processes and tables used to actually forward traffic. The forwarding plane builds forwarding tables based on the information provided by the control plane. Similar to the control plane, each virtual network has a unique forwarding table that needs to be virtualized.
Furthermore, the control and forwarding planes can be virtualized at different levels, which map directly to different layers of the OSI model. For instance, a device can be VLAN-aware and therefore be virtualized at Layer 2, yet have a single routing table, which means it is not virtualized at Layer 3. The various levels of virtualization are useful, depending on the technical requirements of the deployment. There are cases in which Layer 2 virtualization is enough, such as a wiring closet. In other cases, virtualization of other layers may be necessary; for example, providing virtual firewall services requires Layer 2, 3, and 4 virtualization, plus the ability to define independent services on each virtual firewall, which perhaps is Layer 7 virtualization.
Policy-Based Path Isolation
Policy-based path isolation techniques restrict the forwarding of traffic to specific destinations, based on a policy and independently of the information provided by the control plane. A classic example of this uses an ACL to restrict the valid destination addresses to subnets that are part of the same VPN.
Policy-based segmentation is limited by two main factors:
•
Policies must be configured pervasively (that is, at every edge device representing the first Layer 3 hop in the network)
•
Locally significant information (that is, IP address) is used for policy selection
The configuration of distributed policies can be a significant administrative burden, is error prone, and causes any update in the policy to have widespread impact.
Because of the diverse nature of IP addresses, and because policies must be configured pervasively, building policies based on IP addresses does not scale very well. Thus, policy-based segmentation built on IP addresses has limited applicability.
As discussed subsequently in Deploying Path Isolation in Campus Networks, using policy-based path isolation with the tools available today (ACLs) is still feasible for the creation of virtual networks with many-to-one connectivity requirements, but it is very difficult to provide any-to-any connectivity with such technology. For example, hub-and-spoke topologies are required to address the guest access problem, where all the visitors need to have access to a single resource (the Internet). Using ACLs in this case is still manageable because the policies are identical everywhere in the network (that is, allow Internet access, deny all internal access). The policies are usually applied at the edge of the Layer 3 domain. Figure 3 shows ACL policies applied at the distribution layer to segment a campus network.
Figure 3 Policy-Based Path Isolation with Distributed ACLs
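The guest policy described above (allow Internet access, deny all internal access) can be sketched as a first-match ACL evaluation. The following Python fragment is an illustration only, not device configuration; the internal prefixes and the names GUEST_ACL and acl_permits are assumptions chosen for the example.

```python
import ipaddress

# Hypothetical internal address space; a real deployment would list its own prefixes.
INTERNAL_PREFIXES = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]

# Ordered ACL: deny every internal destination, then permit anything else (the Internet).
GUEST_ACL = [("deny", ipaddress.ip_network(p)) for p in INTERNAL_PREFIXES]
GUEST_ACL.append(("permit", ipaddress.ip_network("0.0.0.0/0")))


def acl_permits(acl, destination):
    """Return True if the first matching entry permits the destination."""
    address = ipaddress.ip_address(destination)
    for action, network in acl:
        if address in network:
            return action == "permit"
    return False  # implicit deny when no entry matches


print(acl_permits(GUEST_ACL, "10.1.20.5"))     # internal server
print(acl_permits(GUEST_ACL, "198.51.100.7"))  # external host
```

Because the same ordered list applies at every edge device, it behaves as a portable template. Any-to-any policies between many groups would instead need a different list per group, which is where the approach stops scaling.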
Control Plane-Based Path Isolation
Control plane-based path isolation techniques restrict the propagation of routing information so that only subnets that belong to a virtual network (VPN) are included in any VPN-specific routing tables and updates. To achieve control plane virtualization, a device must have many control/forwarding instances, one for each VPN. This is possible when using the virtual routing and forwarding (VRF) technology that allows for the virtualization of the Layer 3 devices.
Network Device Virtualization with VRF
A VRF instance consists of an IP routing table, a derived forwarding table, a set of interfaces that use the forwarding table, and a set of rules and routing protocols that determine what goes into the forwarding table. As shown in Figure 4, the use of VRF technology allows the customer to virtualize a network device from a Layer 3 standpoint, creating different "virtual routers" in the same physical device.
Note
A VRF is not strictly a virtual router because it does not have dedicated memory, processing, or I/O resources, but this analogy is helpful in the context of this guide.
Figure 4 Virtualization of a Layer 3 Network Device
Table 1 provides a listing of the VRF-Lite support on the various Cisco Catalyst platforms that are typically found in an enterprise campus network. As is clarified in the following sections, VRF-Lite and MPLS support are different capabilities that can be used to provide separate path isolation mechanisms (VRF-Lite + GRE, MPLS VPN, and so on).
One important thing to consider with regard to the information above is that a Catalyst 6500 equipped with Supervisor 2 is capable of supporting VRFs only when using optical switching modules (OSMs). The OSM implementation is considered legacy and more applicable to a WAN environment. As a consequence, a solution based on VRF should be taken into consideration in a campus environment only if Catalyst 6500 platforms are equipped with Supervisors 32 or 720 (this is why this option is not displayed in Table 1).
The use of Cisco VRF-Lite technology has the following advantages:
•
Allows for true routing and forwarding separation—Dedicated data and control planes are defined to handle traffic belonging to groups with various requirements or policies. This represents an additional level of segregation and security, because no communication between devices belonging to different VRFs is allowed unless explicitly configured.
•
Simplifies the management and troubleshooting of the traffic belonging to the specific VRF, because separate forwarding tables are used to switch that traffic—These data structures are different from the one associated with the global routing table. This also guarantees that configuring the overlay network does not cause issues (such as routing loops) in the global table.
•
Enables the support for alternate default routes—The advantage of using a separate control and data plane is that it allows for defining a separate default route for each virtual network (VRF). This can be useful, for example, in providing guest access in a deployment when there is a requirement to use the default route in the global routing table just to create a black hole for unknown addresses to aid in detecting certain types of worm and network scanning attacks.
In this example, employee connectivity to the Internet is usually achieved by using a web proxy device, which can require a specific browser configuration on every machine that connects to the Internet, or can require users to provide valid credentials. Although support for web proxy servers on employee desktops is common practice, it is not desirable to have to reconfigure a guest browser to point to the proxy servers. As a result, the customer can configure a separate forwarding table that uses an alternative default route in the context of a VRF, to be used exclusively for a specific type of traffic, such as guest traffic. In this case, the default browser configuration can be used.
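The effect of a per-VRF default route can be sketched with one routing table per virtual network and a longest-prefix-match lookup. This is a simplified Python model, not a description of how any platform implements VRFs; the table names, prefixes, and next-hop labels are hypothetical.

```python
import ipaddress


class RoutingTable:
    """A routing table with longest-prefix-match lookup."""

    def __init__(self):
        self.routes = []  # list of (network, next_hop) pairs

    def add_route(self, prefix, next_hop):
        self.routes.append((ipaddress.ip_network(prefix), next_hop))

    def lookup(self, destination):
        address = ipaddress.ip_address(destination)
        matches = [(net, hop) for net, hop in self.routes if address in net]
        if not matches:
            return None
        # The most specific (longest) prefix wins.
        return max(matches, key=lambda match: match[0].prefixlen)[1]


# One table for the global routing table and one for a guest VRF (hypothetical names).
tables = {"global": RoutingTable(), "guest": RoutingTable()}

tables["global"].add_route("10.0.0.0/8", "core-switch")
tables["global"].add_route("0.0.0.0/0", "blackhole")     # sink for unknown destinations

tables["guest"].add_route("10.99.0.0/16", "guest-vlan")
tables["guest"].add_route("0.0.0.0/0", "internet-edge")  # guest-only default route

print(tables["global"].lookup("203.0.113.9"))
print(tables["guest"].lookup("203.0.113.9"))
```

The same destination resolves differently depending on which table handles the packet, so the global default route can remain a black hole while guest traffic follows its own default route toward the Internet.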
Data Path Virtualization—Single- and Multi-Hop Techniques
The VRF achieves the virtualization of the networking devices at Layer 3. When the devices are virtualized, the virtual instances in the various devices must be interconnected to form a VPN. Thus, a VPN is a group of interconnected VRFs. In theory, this interconnection can be achieved by using dedicated physical links for each VPN. In practice, this is very inefficient and costly. Thus, it is necessary to virtualize the data path between the VRFs to provide logical interconnectivity between the VRFs that participate in a VPN.
The type of data path virtualization varies depending on how far the VRFs are from each other. If the virtualized devices are directly connected to each other (single hop), link or circuit virtualization is necessary. If the virtualized devices are connected through multiple hops over an IP network, a tunneling mechanism is necessary. Figure 5 illustrates single-hop and multi-hop data path virtualization.
Figure 5 Single- and Multi-Hop Data Path Virtualization
The many technologies that virtualize the data path and interconnect VRFs are discussed in the next sections. The various technologies have benefits and limitations depending on the type of connectivity and services required. For instance, some technologies are very good at providing hub-and-spoke connectivity, while others provide any-to-any connectivity. The support for encryption, multicast, and other services also determines the choice of technologies to be used for the virtualization of the transport.
The VRFs must also be mapped to the appropriate VLANs at the edge of the network. This mapping provides continuous virtualization across the Layer 2 and Layer 3 portions of the network. The mapping of VLANs to VRFs is as simple as placing the corresponding VLAN interface at the distribution switch into the appropriate VRF. The same type of mapping mechanism applies to Layer 2 virtual circuits (ATM, Frame Relay) or IP tunnels that are handled by the router as a logical interface. The mapping of VLAN logical interfaces (Switch Virtual Interface [SVI]) and of sub-interfaces to VRFs is shown in Figure 6.
Figure 6 VLAN to VRF Mapping
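The mapping of logical interfaces to VRFs can be pictured as a simple lookup from interface to virtual network. The following Python sketch is illustrative only; the interface names, VRF names, and helper functions are assumptions, and it models just the default behavior in which traffic stays within the VRF of its ingress interface.

```python
# Hypothetical interface-to-VRF assignments on a distribution switch.
# Interfaces that are not listed remain in the global table.
INTERFACE_VRF = {
    "Vlan10": "guest",                  # SVI for the guest VLAN
    "Vlan20": "partner",                # SVI for the partner VLAN
    "GigabitEthernet1/1.110": "guest",  # sub-interface toward the next Layer 3 hop
    "Tunnel0": "guest",                 # IP tunnel handled as a logical interface
}


def vrf_for_interface(interface):
    """Return the VRF whose tables handle traffic arriving on this interface."""
    return INTERFACE_VRF.get(interface, "global")


def same_vpn(ingress, egress):
    """By default, traffic is forwarded only between interfaces in the same VRF."""
    return vrf_for_interface(ingress) == vrf_for_interface(egress)


print(vrf_for_interface("Vlan10"))
print(vrf_for_interface("Vlan30"))
print(same_vpn("Vlan10", "Tunnel0"))
print(same_vpn("Vlan10", "Vlan20"))
```

Placing the VLAN interface, the sub-interface, and the tunnel in the same VRF is what keeps the guest traffic segmented across both the Layer 2 and Layer 3 portions of the path.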
Path Isolation Initial Design Considerations
Before discussing the various path isolation alternatives in more detail, it is important to highlight some initial considerations that affect the overall design presented in the rest of this guide. These assumptions are influenced by several factors, including the current status of the technology and the specific business requirements driving each specific solution. As such, they may change or evolve in the future; this guide will be updated accordingly to reflect this fact.
•
Use of virtual networks for specific applications
The first basic assumption is that even in a virtualized network environment, the global table is where most of the enterprise traffic is still handled. This means that logical partitions (virtual networks) are created to respond to specific business problems (for example, guest Internet access), and users/entities are removed from the global table and assigned to these partitions only when meeting specific requirements (for example, being a guest and not an internal enterprise employee). The routing protocol traditionally used to provide connectivity to the various enterprise entities in the global table (IGP) is still used for that purpose. In addition, the global IGP may also be used to provide the basic IP connectivity allowing for the creation of the logical overlay partitions; this is, for example, the case when implementing tunneling technologies such as VRF-Lite and GRE or MPLS VPN. In summary, the idea is to maintain the original global table design and "pull out" entities from the global table only to satisfy specific requirements (the business drivers previously discussed). This strategy supports a gradual evolution from a non-virtualized to a virtualized network; it also reduces the risk to existing production applications.
•
Integration of VoIP technologies in a virtualized network
When deploying a VoIP architecture to be integrated in a virtualized network, the first release of this document recommended, as a best practice, keeping the main components of the voice infrastructure (VoIP handsets, Cisco CallManagers, Cisco Unity Servers, and so on) in the global table, together with all the users that use voice services (using Cisco Communicator software, VT Advantage, and so on). Reasons for following this recommendation in this phase of the technology include the following:
–
Lack of VRF-aware voice services such as Survivable Remote Site Telephony (SRST) or Resource Reservation Protocol (RSVP) for Call Admission Control (CAC), which would prevent a successful deployment of VoIP technologies at remote locations (without the burden of replicating the physical network infrastructure, which is against one of the main drivers for virtualizing the network).
–
Complex configuration required at the services edge of the network to allow the establishment of voice flows between entities belonging to separate VPNs. This also required "punching" holes in the firewall deployed in this area of the network, increasing the security concerns of the overall solution.
–
VoIP can always be secured without requiring the creation of a dedicated logical partition for the voice infrastructure. There are proven tools and design recommendations that can be used for hardening the voice systems that are inherent in the system and do not require any form of network virtualization to be implemented. For more information, see the design guides at the following URL: http://www.cisco.com/en/US/netsol/ns818/networking_solutions_program_home.html.
When the VoIP infrastructure is deployed in the global table, the direct consequence is the recommendation to keep all the internal users that make use of VoIP applications (such as Cisco Communicator clients) in the same domain, to avoid complicating the design when there is a need to establish voice flows between these users and, for example, the VoIP handsets.
In the current phase of the Network Virtualization project, this strict recommendation is relaxed; focusing on a specific campus deployment model, we have validated and documented how to start integrating Unified Communications applications into a virtualized network environment. The proposed design is discussed in greater detail as part of the Services Edge Design Guide at:
http://www.cisco.com/en/US/docs/solutions/Enterprise/Network_Virtualization/ServEdge.html
In future phases of this project, this concept will also be extended to branch locations, in order to provide an end-to-end integration story.
•
Deployment of network virtualization as an overlay design
Another important initial assumption is that the deployment of a virtualized infrastructure constitutes an overlay design rather than a "rip-and-replace" approach. This means that the goal is the deployment of network virtualization without impacting (or with only limited impact on) the network design that customers may already have in place. For example, if routing is already deployed using a specific IGP, the design should focus on demonstrating how to add services to that specific environment, rather than suggesting that the network be torn apart and replaced with a new one. This guide is focused on networks characterized by a single autonomous system (AS) and a single IGP-based environment, rather than large backbones with dual-redundant BGP cores.
•
Security and VRF considerations
Consider the following with regard to security and VRF:
–
A VRF-enabled network device is different from a completely virtualized device. The latter is usually referred to as a "logical router", whereas the former is called a "virtual router". A VRF-enabled device shares device resources (such as CPU, memory, hardware, and so on) between the various virtual instances supported. This essentially means that a failure of, or a problem with, one of these shared elements affects all the virtual routers defined in the box.
–
In terms of isolation versus privacy, configuring separate VRFs allows support for multiple address spaces and for virtualizing both the control and data planes. However, simply doing this does not ensure the privacy of the information that is exchanged in the context of each VPN. To provide this extra layer of security, other technologies (such as IPsec) should be coupled with the specific path isolation strategy implemented.
–
The use of VRF does not eliminate the need for edge security features. As previously discussed, VRFs are enabled on the first Layer 3 hop device; therefore, many of the security features that are recommended at the edge of the network (access layer) should still be implemented. This is true for identity-based techniques, such as 802.1x and MAB, which are discussed in Network Virtualization—Access Control Design Guide (OL-13634-01).
However, it is important to highlight the requirement for integrating other security components, such as Catalyst Integrated Security Features (CISF) including DHCP Snooping, IP Source Guard, Dynamic ARP Inspection, or Port Security. In addition to these, Control Plane Policing (CPP) also needs to be considered to protect the CPU of the network devices. Another factor is that, as explained above, a problem in a specific VRF may affect the CPU of the virtualized device, causing outages in the other VRFs defined in that network device as well.
•
QoS and network virtualization
QoS and network virtualization are orthogonal problems in this phase of the technology. The main reason is that the DiffServ architecture has been deployed to be oriented around applications. Traffic originated by different applications (such as voice and video) is classified and marked at the edge of the network, and this marking information is used across the network to provide it with an appropriate level of service.
In this phase of the technology, most enterprise routers and switches lack a virtual QoS mechanism. This means, for example, that the various input and output queues available on the network devices are not VRF-aware, which essentially implies that there is no capability to treat traffic originated by the same type of application in two different VPNs differently. For this reason, when discussing the deployment of QoS technologies in a virtualized network, there are two main strategies that can be adopted and that are applied to the various path isolation alternatives discussed in this guide:
–
Conform to the DiffServ standard functionality and keep classifying the traffic at the edge on a per-application basis. This means that flows originating from the same application in different VPNs are treated in the same way across the network.
–
Define per-VPN policies. This means that all the traffic originating in a specific VPN is classified in the same way, independently from the application that originated it. This may find applicability for example in guest access scenarios, where the recommended strategy is to classify all the traffic originated from the guest user as best effort when below a predefined threshold. Traffic exceeding the threshold could for example be classified as scavenger so that it is the first to be dropped in case of network congestion.
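The per-VPN strategy for guest traffic can be illustrated with a small marking function: everything up to a threshold is marked best effort, and the excess is marked scavenger. This Python sketch is an assumption-laden model, not a device policy; the threshold value and the function name are hypothetical, and DSCP 8 (CS1) is used here as the scavenger marking, which is a common choice rather than something this guide prescribes.

```python
# DSCP code points: 0 is best effort (default); 8 is CS1, commonly used for scavenger.
DSCP_BEST_EFFORT = 0
DSCP_SCAVENGER = 8

# Hypothetical per-VPN rate threshold in kilobits per second.
GUEST_THRESHOLD_KBPS = 2000


def mark_guest_traffic(offered_kbps, threshold_kbps=GUEST_THRESHOLD_KBPS):
    """Split the offered guest load into (DSCP, rate in kbps) pairs."""
    conforming = min(offered_kbps, threshold_kbps)
    exceeding = max(offered_kbps - threshold_kbps, 0)
    marks = [(DSCP_BEST_EFFORT, conforming)]
    if exceeding:
        marks.append((DSCP_SCAVENGER, exceeding))
    return marks


print(mark_guest_traffic(1500))
print(mark_guest_traffic(3500))
```

Note that the marking depends only on the VPN and the offered rate, never on the application, which is the difference from the application-based DiffServ strategy.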
The following sections provide more details on various path isolation techniques. The first is the use of distributed ACLs that, as previously mentioned, can be considered a policy-based mechanism, and is discussed here as a "legacy" way of limiting communication between users belonging to different network partitions. Various control plane-based techniques are then analyzed. The first is the use of VRF-Lite in conjunction with GRE tunneling, specifically recommended for deployments where a hub-and-spoke type of connectivity must be provided. For scenarios requiring any-to-any connectivity, the use of MPLS VPNs is discussed, highlighting the main differences between enterprise deployments and the more traditional service provider deployment.
Deploying Path Isolation in Campus Networks
The first part of this document focuses on discussing various techniques to provide path isolation in an enterprise campus network. In the last part of this document, we see how some of these techniques could be reused across MAN/WAN clouds.
As a first step, a legacy technique is reviewed, based on the use of distributed ACLs. After that, several control plane techniques are discussed, all based on the use of Virtual Routing and Forwarding (VRF) functionality.
Path Isolation Using Distributed Access Control Lists
The use of distributed ACLs represents a classic example of a policy-based path isolation mechanism to restrict the forwarding of traffic to specific destinations, based on a policy and independently of the information provided by the control plane. This allows restricting the group of valid destination addresses to the subnets that are configured as part of the same VPN (or virtual network).
Connectivity Requirements
The use of static ACLs at the edge of the network is the quickest way to provide traffic isolation, controlling and restricting communications between the various user groups. Most customers are comfortable with the use of ACLs to enforce security policies.
At the same time, using ACLs is recommended only in very specific scenarios where the network connectivity requirements are hub-and-spoke (many-to-one). The main limitation of the ACL approach is the lack of scalability. The complexity of each distributed ACL is directly related to two main factors:
•
The number of user groups that need to be supported
•
Connectivity requirements between user groups
Defining ACLs in scenarios with a large number of groups requiring any-to-any connectivity can quickly become cumbersome from a management point of view. The goal is to propose this approach when the connectivity requirement is hub-and-spoke, so that it is possible to create a portable ACL template to be used across different spoke devices. Two typical applications that require this type of connectivity are guest access (where the target is providing access to the Internet as a centralized resource), and Network Admission Control (NAC) remediation (where connectivity must be restricted between unhealthy endpoints and a centralized remediation server). The common characteristic for these applications is the very limited number of user groups required (two in both cases), which makes the ACL approach a feasible technical candidate.
Configuration Details
The main goal is to create a generic ACL template that can be seamlessly used on all the required edge devices. This approach minimizes configuration and management efforts, and enhances the scalability of the overall solution. The same generic ACL should also be applied for both wired and wireless deployments. The specific wireless solution in place should affect the network device where the policy is applied, but not the format of the ACL itself.
Using ACLs to logically isolate traffic for specific categories of users (for example, employees and guests) on the same physical network implies that the control and data planes of the network need to be shared between these different groups. The most immediate consequence is a limited freedom in assigning IP addresses to the various categories of users. The root of this problem is shown in Figure 7, which represents a generic campus network. This example refers to a guest access deployment where the hub devices are located in the Internet edge, but it can also be generic.
Figure 7 IP Addressing in the Campus Network
As shown in Figure 7, the recommended campus design dictates the assignment of IP addresses to various campus buildings in such a way that a summary route can be sent to the core (independent of the specific routing protocol being used). This isolates the buildings from a routing control point of view, contributing to the overall scalability and stability of the design. For example, 10.121.0.0/16 is the summary sent toward the core by the distribution layer devices belonging to Building 1.
Note
The IP addresses used in this example simplify the description and are not intended to represent a best practice summarization schema.
As a result, all the IP subnets defined in each specific building block should be part of the advertised summary. This implies that subnets associated to the same user group but defined in separate buildings are part of different class B subnets. This clearly poses a challenge in defining a generic ACL template to be applied to devices belonging to different campus building blocks. The best way to achieve this is to define the edge policies without including the subnets from which the traffic is originated.
The recommended design described in this guide is based on the use of router ACLs (RACLs), which must be applied to Layer 3 interfaces. This means that in the multilayer campus design, the RACLs are applied to the distribution layer devices (representing the demarcation between Layer 2 and Layer 3 domains). The format of these ACLs remains the same, even in campus routed access deployments where the demarcation between Layer 2 and Layer 3 is pushed down to the access layer. The only difference is that, in this case, the RACLs need to be applied on the switched virtual interface (SVI) defined on the access layer devices.
RACLs are supported in hardware on Cisco Catalyst 6500 and 4500 platforms, which represent the devices most commonly deployed in the distribution layer of each campus building block. The simplest RACL that can be deployed for a generic hub-and-spoke scenario is as follows:
ip access-list extended SEGM-RACL
 10 permit udp any any eq bootps
 20 permit udp any host <DNS-Server-IP> eq domain
 30 deny ip any <protected_prefixes>
 40 permit ip any <target_prefixes>
•
Statements 10 and 20 allow connectivity to receive DHCP and DNS services (if needed).
•
Statement 30 denies connectivity to protected resources that should not be accessed by this specific category of users.
•
Statement 40 restricts connectivity only to the subset of required prefixes. The list of required prefixes varies, depending on the specific application. For example, in the case of guest access, it might be all the public IP addresses representing the Internet; for NAC remediation, it might be represented by the remediation server.
Note
As previously mentioned, this ACL is generic enough to be applied to various edge devices. The key to doing this is to avoid the use of the source IP address in ACL statements.
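As an illustration, a guest access instantiation of the template might look like the following sketch. It assumes that the protected enterprise resources are addressed from the RFC 1918 private ranges, that the target is every remaining (public) destination, and that the DNS server is 10.100.1.10. These values are hypothetical and must be replaced with those of the actual deployment.

```
! Hypothetical DNS server address and protected prefixes, for illustration only
ip access-list extended SEGM-RACL
 10 permit udp any any eq bootps
 20 permit udp any host 10.100.1.10 eq domain
 30 deny ip any 10.0.0.0 0.255.255.255
 31 deny ip any 172.16.0.0 0.15.255.255
 32 deny ip any 192.168.0.0 0.0.255.255
 40 permit ip any any
```

Because no statement references a source address, the same ACL can be applied unchanged on the edge devices of every campus building block.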
RACLs derive their name from the fact that they need to be applied on Layer 3 (routed) interfaces. The Layer 3 interface where the RACL is applied depends on the specific type of network access used. For wired clients, the Layer 3 interfaces are the SVIs (VLAN interfaces) defined on the distribution layer devices (traditional design) or on the access layer devices (routed access design). The configuration for a generic SVI is as follows:
interface Vlan50
 description Wired-client-floor1
 ip address 10.124.50.2 255.255.255.0
 ip access-group SEGM-RACL in
For wireless clients, it depends on the specific deployment in place. For traditional Cisco Aironet deployments and deployments using WLAN controllers, the situation is very similar to the wired case, and the ACL is applied on the SVIs defined on the distribution or access layer devices. For WLSM designs, where all the data traffic is tunneled from each distributed access point to a centralized Catalyst 6500 equipped with WLSM, the RACL can be directly applied on the receiving multipoint GRE (mGRE) interfaces defined on this centralized device, as follows:
interface Tunnel160
 description mGRE for clients-floor1
 ip address 10.121.160.1 255.255.255.0
 ip access-group SEGM-RACL in
Path Differentiation
Another aspect to consider is the problem of path differentiation. In some scenarios, you might need to redirect the traffic in a specific direction when it reaches the hub device. For example, this can be relevant in a guest access scenario where traffic might need to be forced through a web authentication appliance. The solution uses policy-based routing (PBR). The following configuration samples and considerations refer to a guest access application, but their validity can easily be extended to other applications. Without going into specific detail on the problems associated with web authentication, note that web authentication appliances are usually deployed in-band, so you must devise a way to force the guest traffic through them, as illustrated in Figure 8.
Figure 8 Traffic Flows for Various Categories of Users
An internal employee and a guest pointing to the same final destination (in this example, www.google.com) must take two different paths. The employee can connect directly to the Internet after going through a firewall (or a firewall context, as shown in Figure 8). The guest must first be forced through the web authentication appliance to complete an authentication process. The recommended way to accomplish this is by using PBR on the network devices in the Internet edge, connecting to the campus core (two Catalyst 6500s in this example).
Note
On Catalyst 6500 platforms using Supervisor 2 with PFC2 or Supervisor 720 with PFC3, PBR is fully supported in hardware using a combination of the security ACL ternary content addressable memory (TCAM) and the hardware adjacency table. Although a detailed description of PBR is beyond the scope of this guide, note that PBR does consume ACL TCAM resources and adjacency table entries. In Supervisor 2 with PFC2, 1024 of the 256 K available hardware adjacencies are reserved for PBR. In Supervisor 720 with PFC3, 2048 of the one million available hardware adjacencies are reserved for PBR.
The considerations about the IP range assignment to the guest subnets made in the previous section also have an impact on the configuration of the ACL to be used for policing the traffic in the Internet edge. It is unlikely that you can summarize all the guest subnets in a limited number of statements. More likely, a separate ACL statement needs to be added for each specific guest subnet defined in each campus building block, as shown in the following configuration sample:
ip access-list extended TO-WEB-AUTH-DEVICE
 permit ip 10.121.150.0 0.0.0.255 any
 permit ip 10.121.160.0 0.0.0.255 any
 permit ip 10.122.150.0 0.0.0.255 any
 ........................................
 permit ip 10.128.160.0 0.0.0.255 any
!
route-map guest-to-WEB-AUTH-DEVICE permit 10
 match ip address TO-WEB-AUTH-DEVICE
 set ip next-hop 172.18.3.30
Note
The address specified in the set ip next-hop statement is the internal interface of the web authentication appliance.
The route map must then be applied on all the physical interfaces connecting the Internet edge devices to the core of the network, as follows:
interface TenGigabitEthernet3/1
 description 10GigE link to Core Switch 1
 ip address 10.122.0.7 255.255.255.254
 ip policy route-map guest-to-WEB-AUTH-DEVICE
High Availability Considerations
The resiliency of a solution based on the use of distributed ACLs is achieved by implementing the recommended campus design. More information on this subject is beyond the scope of this guide. For more information, see the campus HA documents at the following URL: http://www.cisco.com/en/US/netsol/ns815/networking_solutions_program_home.html.
Challenges and Limitations of Distributed ACLs
Some of the challenges and limitations of the distributed ACL approach are as follows:
•
ACLs do not support full data and control plane separation. Traffic originating from edge subnets that is associated to different user groups is sent to the core of the network and is handled in the common global routing table. This scenario is prone to configuration errors, which can cause the establishment of unwanted communications between different groups. Also, in cases where path differentiation must be achieved, using a common routing table forces the use of more complex configuration (such as the PBR described in Path Differentiation).
•
In many cases, the configuration is simplified by assigning a dedicated (and possibly overlapping) IP address space to the subnets associated to different user groups. As previously described, this is usually not possible in a campus deployment because of route summarization requirements and because of the use of a shared global routing table.
•
Depending on the IP addressing plan being used, the distributed ACL can become lengthy and require many statements to deny connectivity to the enterprise internal resources.
You can eliminate all the previously described limitations associated with using distributed ACLs if you can separate the data and control planes for each separate category of users. The following section describes a different network virtualization approach aimed at achieving this through the use of the Cisco VPN Routing and Forwarding (VRF) technology.
Path Isolation Leveraging Control Plane Techniques
The previous approach based on the use of ACLs was presented for the sake of completeness and because it represents a legacy way of deploying security policies. The bulk of this document discusses several control plane techniques that can be leveraged to provide the desired path isolation functionality. As previously mentioned, all these techniques are based on the use of VRF functionality.
Three main technical alternatives are discussed in detail in the following sections:
•
VRF-Lite and GRE tunnels
•
MPLS VPN (RFC 2547)
•
VRF-Lite End-to-End (or Hop-by-Hop)
An end-to-end campus path isolation strategy can be divided in two separate and subsequent steps:
1.
Virtualizing each campus distribution block—This includes the virtualization (both at Layer 2 and Layer 3) of the network devices and of the network services commonly deployed in the access and distribution layers of the campus.
2.
Virtualizing the core of the network to interconnect the various campus distribution blocks—As clarified in the following sections, this can be done by leveraging tunneling technologies (which allow the virtualized blocks to be interconnected without introducing any VRF definition in the core) or hop-by-hop IP-based technologies.
Virtualizing the Campus Distribution Block
The term campus distribution block usually refers to the set of closet (access layer) switches aggregated by the same pair of distribution layer devices, as shown in Figure 9.
Figure 9 Campus Distribution Block
The virtualization of the campus distribution block involves two major steps:
•
Virtualization of the network devices—This is done at Layer 2 (by provisioning VLANs) and at Layer 3 (by creating VRFs and their corresponding VLAN mapping).
•
Virtualization of the network services—This requires the replication of network services typically enabled in a campus distribution block (like FHRP, STP, DHCP Relay, etc.).
Different considerations can be made depending on the specific campus model implemented; the following sections discuss in detail the multi-tier (Layer 2 in the access, Layer 3 in distribution) and routed access (Layer 3 in the access) designs.
Multi-Tier Campus Design
In the traditional multilayer campus design, the access layer is deployed with Layer 2 capabilities only and the distribution layer devices represent the boundary between Layer 2 and Layer 3 domains in the network. The generic campus distribution block is shown in Figure 10.
Figure 10 Generic Campus Distribution Block
More details on the recommended configuration and deployment guidelines for the traditional distribution block design can be found in the campus design guides previously referenced. What is important to highlight here is the operational impact of virtualizing the network.
The first thing to consider is that virtualization at Layer 2 is nothing new and is still achieved by using VLANs. As a consequence, the network virtualization requirement of supporting different logical groups in the same campus network drives the definition of an additional number of VLANs in each access layer device; these VLANs are then carried upstream toward the distribution layer via Layer 2 trunk connections (see Figure 11).
Figure 11 VLAN Definition
In addition to the previously existing data and voice VLANs, new VLANs (at least one for each new virtual network) are now required to provide differentiated access to separate network entities (users and/or devices). The following additional considerations are required:
•
The deployment of various network entities into their corresponding segments (VLANs) can be achieved through static configuration (each edge port is manually assigned to a specific VLAN) or via dynamic mechanism such as 802.1X or NAC. This is discussed more extensively in the Network Virtualization-Access Control 2.0 Design Guide (OL-13634-01).
•
Following the best practice design to keep VLAN numbers unique per access layer switch (as shown in Figure 11) may require the creation of a high number of new VLANs (and corresponding SVIs) on the distribution layer devices. This needs to be taken into consideration especially for very large distribution layer blocks (high number of access layer switches connecting to the same distribution layer pair), because it creates the following two main operational challenges:
–
Need for planning for new VLANs and corresponding IP subnet allocation.
In very large deployments, it may be required to extend the range of VLANs that can be defined on a Catalyst 6500 platform. By default, the upper limit is 1001, but it can be extended to 4094 by using the following command:
cr20-6500-1(config)#spanning-tree extend system-id
–
Increase of the control plane load for protocols such as Spanning Tree, HSRP, etc.
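The VLAN range extension mentioned above can be sketched as follows. VLAN 2021 and its name are hypothetical, and the sketch assumes VTP version 1 or 2, where extended-range VLANs can be created only in VTP transparent mode.

```
! Hypothetical VLAN number and name, for illustration only
spanning-tree extend system-id
vtp mode transparent
!
vlan 2021
 name Red_VPN_access_switch_40
```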
Note
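The total number of new VLANs/IP subnets that need to be provisioned is the product of the total number of closet switches belonging to the same campus distribution block and the number of VRFs deployed. For example, a distribution block aggregating 20 closet switches and supporting three VRFs requires 60 new VLANs and IP subnets.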
The logical isolation provided by VLANs ceases to exist at the boundary between Layer 2 and Layer 3 domains (the distribution layer devices); it is thus required to define VRFs on these devices and map each VLAN to its own dedicated VRF instance, as shown in Figure 12.
Figure 12 VRF Definition
This is because the virtualized network consists of the combination of the Layer 2 VLAN and the Layer 3 VRF, so a mapping between these components is required to achieve logical isolation end-to-end across the network. It is worth noting that, independently from the number of VLANs defined in the campus distribution block, the number of VRFs is directly dictated by the number of logical groups that need to be supported. For example, all the VLANs defined on each access layer device for "Red" users are mapped to the same "Red" VRF defined at the distribution layer.
After VLANs have been mapped to the corresponding VRFs, there is the need to connect together the VRFs defined in different campus distribution blocks. As previously mentioned, there are multiple technical alternatives to provide this functionality and these are discussed in detail later in this document. Independently from the specific technology of choice, it is worth noticing that the corresponding configuration is applied on the distribution layer devices in a multi-tier design, since these devices represent the first Layer 3 hop in the network.
From a configuration standpoint, a different set of steps is required on the access layer switches versus the distribution layer devices, as described below.
Access Layer Switch Virtualization
The configuration steps that allow the virtualization of the access layer devices are (refer to the network in Figure 11):
•
Creation of the Layer 2 VLAN entities
vlan 21
 name Red_VPN
!
vlan 22
 name Green_VPN
!
vlan 23
 name Blue_VPN
•
Assignment of edge (user-facing) interfaces to the newly defined VLANs. It is worth noticing that this step is required only when leveraging static VLAN assignment at the edge of the network. If a dynamic mechanism is implemented (802.1X or NAC), this step can be avoided.
interface GigabitEthernet2/1
 description Red User
 switchport
 switchport mode access
 switchport access vlan 21
 spanning-tree portfast
 spanning-tree bpduguard enable
!
interface GigabitEthernet2/2
 description Green User
 switchport
 switchport mode access
 switchport access vlan 22
 spanning-tree portfast
 spanning-tree bpduguard enable
!
interface GigabitEthernet2/3
 description Blue User
 switchport
 switchport mode access
 switchport access vlan 23
 spanning-tree portfast
 spanning-tree bpduguard enable
•
Adding the newly defined VLANs to the trunk uplinks connecting to the distribution layer switches.
interface GigabitEthernet5/1
 description L2 Trunk to Distrib. 1
 switchport
 switchport trunk encapsulation dot1q
 switchport trunk native vlan 512
 switchport trunk allowed vlan add 21-23
 switchport mode trunk
 switchport nonegotiate
!
interface GigabitEthernet6/1
 description L2 Trunk to Distrib. 2
 switchport
 switchport trunk encapsulation dot1q
 switchport trunk native vlan 512
 switchport trunk allowed vlan add 21-23
 switchport mode trunk
 switchport nonegotiate
Distribution Layer Switch Virtualization
The configuration steps required to virtualize the distribution layer devices are the following (refer to Figure 12):
•
Definition of the VRFs
ip vrf Red
 rd 1:1
!
ip vrf Green
 rd 2:2
!
ip vrf Blue
 rd 3:3
Note
The latest IOS releases no longer require the Route-Distinguisher (RD) parameter to be specified in order to activate a VRF. The configuration above still uses RDs for backward compatibility. Also, the use of route targets is relevant only in conjunction with MP-BGP as the control protocol, and is discussed in more detail in the description of the MPLS VPN path isolation option.
•
Definition of the Layer 2 VLAN entities (this step is identical to that performed on the access layer switches).
vlan 21
 name Red_VPN_access_switch_1
!
vlan 22
 name Green_VPN_access_switch_1
!
vlan 23
 name Blue_VPN_access_switch_1
•
Definition of the Layer 3 VLAN Interfaces (SVIs) and mapping to the proper VRF. Notice how only Layer 3 interfaces (physical or logical) can be mapped to a VRF.
interface Vlan21
 description Red_VPN_access_switch_1
 ip vrf forwarding Red
 ip address 10.137.21.3 255.255.255.0
 standby 1 ip 10.137.21.1
 standby 1 timers msec 250 msec 750
 standby 1 priority 105
 standby 1 preempt delay minimum 180
!
interface Vlan22
 description Green_VPN_access_switch_1
 ip vrf forwarding Green
 ip address 10.137.22.3 255.255.255.0
 standby 1 ip 10.137.22.1
 standby 1 timers msec 250 msec 750
 standby 1 priority 105
 standby 1 preempt delay minimum 180
!
interface Vlan23
 description Blue_VPN_access_switch_1
 ip vrf forwarding Blue
 ip address 10.137.23.3 255.255.255.0
 standby 1 ip 10.137.23.1
 standby 1 timers msec 250 msec 750
 standby 1 priority 105
 standby 1 preempt delay minimum 180
Note
The configuration above is valid for the HSRP active device. A similar configuration would apply to the peer distribution switch (HSRP standby), as discussed in detail in the previously referenced campus design guides.
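After the SVIs are mapped, the VLAN-to-VRF association can be checked with standard show commands. The sketch below uses the Red VRF from the example above; command output is omitted because it depends on the platform and software release.

```
! Lists the Layer 3 interfaces mapped to each VRF
show ip vrf interfaces
! Displays the connected subnets known in the Red routing table
show ip route vrf Red connected
! Displays the connected subnets in the global table; the Red subnets should not appear here
show ip route connected
```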
Routed Access Campus Design
In the routed access campus design, the demarcation line between Layer 2 and Layer 3 domains is moved from the distribution layer down to the access. As a consequence, all the closet switches start performing routing functionalities, as shown in Figure 13.
Figure 13 Routed Access Design
More details on the recommended configuration and deployment guidelines for the routed access design can be found in the routed access campus design guide at the following URL:
http://www.cisco.com/en/US/docs/solutions/Enterprise/Campus/routed-ex.html
A first immediate consequence of this design is the fact that Layer 2 VLANs are confined at the access layer only and the uplinks connecting to the distribution layer switches now become routed (Layer 3) links, as shown in Figure 14.
Figure 14 VLAN Definition in Routed Access Model
The following additional considerations are required:
•
The deployment of various network entities into their corresponding segments (VLANs) can be achieved through static configuration (each edge port is manually assigned to a specific VLAN) or via dynamic mechanism such as 802.1X or NAC. This is discussed more extensively in the Network Virtualization-Access Control 2.0 Design Guide (OL-13634-01).
•
The total number of new IP subnets that need to be provisioned is identical to the previously discussed multi-tier deployment. However, the VLAN number in this case can be re-used on separate access layer switches given the fact that they behave as Layer 3 devices.
•
The control plane load in this case is minimal, since neither FHRP nor STP is required on the distribution layer devices. It is best practice to still keep STP running for the VLANs defined in the closet switch as a safety belt mechanism for configuration or cabling errors.
The VRF definition is required on the first Layer 3 hop in each campus distribution block, which in a routed access scenario means that it must be performed on every closet switch, as highlighted in Figure 15.
Figure 15 VRF Definition in Routed Access Model
The definition of VRFs in the access layer of the network brings up an interesting point: what about the distribution layer switches? Would they require a VRF configuration as well? The answer, as always, is that it depends. When a tunneling technology (GRE tunnels or MPLS VPN) is deployed as a path isolation strategy starting from the access layer device, the distribution (and core) switches do not require any VRF definition.
If an IP-based hop-by-hop technique is instead chosen (or the tunneling mechanism is started at the distribution layer, usually because of platform specific limitation on the closet devices), the VRF configuration would also be required on the distribution layer devices.
From a configuration standpoint, a different set of steps is required on the access layer switches versus the distribution layer devices, as described below.
Access Layer Switch Virtualization
The configuration steps that allow the virtualization of the access layer devices are the following (refer to Figure 15):
•
Creation of the Layer 2 VLAN entities
vlan 21
 name Red_VPN
!
vlan 22
 name Green_VPN
!
vlan 23
 name Blue_VPN
•
Assignment of edge (user-facing) interfaces to the newly defined VLANs. It is worth noticing that this step is required only when leveraging static VLAN assignment at the edge of the network. If a dynamic mechanism is implemented (802.1X or NAC), this step can be avoided.
interface GigabitEthernet2/1
 description Red User
 switchport
 switchport mode access
 switchport access vlan 21
 spanning-tree portfast
 spanning-tree bpduguard enable
!
interface GigabitEthernet2/2
 description Green User
 switchport
 switchport mode access
 switchport access vlan 22
 spanning-tree portfast
 spanning-tree bpduguard enable
!
interface GigabitEthernet2/3
 description Blue User
 switchport
 switchport mode access
 switchport access vlan 23
 spanning-tree portfast
 spanning-tree bpduguard enable
•
Definition of the VRFs
ip vrf Red
 rd 1:1
!
ip vrf Green
 rd 2:2
!
ip vrf Blue
 rd 3:3
Note
The latest IOS releases no longer require the Route-Distinguisher (RD) parameter to be specified in order to activate a VRF. The configuration above still uses RDs for backward compatibility. Also, the use of route targets is relevant only in conjunction with MP-BGP as the control protocol, and is discussed in more detail in the description of the MPLS VPN path isolation option.
•
Definition of the Layer 3 VLAN Interfaces (SVIs) and mapping to the proper VRF. Notice how only Layer 3 interfaces (physical or logical) can be mapped to a VRF.
interface Vlan21
 description Red_VPN_access_switch_1
 ip vrf forwarding Red
 ip address 10.137.21.1 255.255.255.0
!
interface Vlan22
 description Green_VPN_access_switch_1
 ip vrf forwarding Green
 ip address 10.137.22.3 255.255.255.0
!
interface Vlan23
 description Blue_VPN_access_switch_1
 ip vrf forwarding Blue
 ip address 10.137.23.3 255.255.255.0
Note
No FHRP protocols (HSRP, VRRP, GLBP) are required in this case, since each access layer switch performs the functionality of default gateway for all the devices connected to it.
Virtualization of Network Services
The virtualization of the network devices belonging to each campus distribution block through provisioning of VLANs and VRFs is followed by the requirement to virtualize the network services that are traditionally enabled in this area of the campus network. Some of the services are directly related to Layer 3 functionalities, some to Layer 2 functionalities; as such, the specific layer (access or distribution) where they are enabled depends on the specific campus model adopted (multi-tier or routed access).
A typical example of Layer 2 functionality is the Spanning Tree Protocol; it has already been pointed out how enabling network virtualization may result in the growth of VLANs defined in the distribution block devices. This may impact the spanning tree design, because for example there would be more instances of the protocol running (usually one per each VLAN). However, there is no requirement for adding any functionality to spanning tree, because it still works at Layer 2 the same way it has always done.
The situation is different for the Layer 3 functionalities enabled on the first Layer 3 hop in the network. Defining VRFs allows you to virtualize the network device at Layer 3, but this implies that all the Layer 3 network services need to be virtualized as well (or made VRF-aware). Therefore, it is important to highlight which functionalities are available today on Catalyst 6500 platforms, and to point out the new ones that may become available in future IOS releases.
Note
The following list is not exhaustive, but highlights only the specific services that are discussed in the campus design guides previously referenced. Be sure to verify in the release notes the VRF support for additional features that may be required in specific design cases.
First Hop Redundancy Protocol
The use of a First Hop Redundancy Protocol (FHRP) is required only in multi-tier campus designs, where a pair of distribution layer devices represent the first Layer 3 hop in the network functioning as the default gateway for all the clients deployed in the IP subnets belonging to the specific distribution block. Traditionally, a FHRP is deployed to allow the distribution layer pair of devices to function as a single virtual device from the default gateway functionality point of view. Three protocols can usually be implemented for this:
•
Hot Standby Router Protocol (HSRP)
•
Gateway Load Balancing Protocol (GLBP)
•
Virtual Router Redundancy Protocol (VRRP)
FHRP protocols perform their functionality by adding Address Resolution Protocol (ARP) entries and IP hash table entries (aliases); by default, this is done using the default routing table instance. However, because a different routing table instance is used when VRF forwarding is configured on an interface, ARP and Internet Control Message Protocol (ICMP) echo requests for the FHRP virtual IP address fail unless the protocol is made VRF-aware, and is thus capable of using the information in the VRF-specific routing table.
The following example illustrates this functionality for HSRP:
•
Distribution switch 1 (HSRP Active)
interface Vlan12
 description Users in VPN v1
 ip vrf forwarding v1
 ip address 10.137.12.3 255.255.255.0
 standby 1 ip 10.137.12.1
 standby 1 timers msec 250 msec 750
 standby 1 priority 105
 standby 1 preempt delay minimum 180
•
Distribution switch 2 (HSRP Standby)
interface Vlan12
 description Users in VPN v1
 ip vrf forwarding v1
 ip address 10.137.12.2 255.255.255.0
 standby 1 ip 10.137.12.1
 standby 1 timers msec 250 msec 750
As shown above, the configuration is essentially identical to the traditional one required on Layer 3 interfaces belonging to the global table (the default VRF). However, VRF awareness makes it possible, for example, to have two separate Layer 3 VLAN interfaces with overlapping IP addresses mapped to different VRFs (for example, Red and Green). Without VRF awareness, HSRP could not distinguish between them, whereas this capability allows the protocol to maintain a separate state for the two sets of interfaces, as follows:
cr20-6500-1#sh standby vlan 2
Vlan2 - Group 1
  Local state is Active, priority 105, may preempt
  Preemption delayed for at least 180 secs
  Hellotime 250 msec, holdtime 750 msec
  Next hello sent in 0.033
  Virtual IP address is 10.137.12.1 configured
  Active router is local
  Standby router is 10.137.12.2 expires in 0.510
  Virtual mac address is 0000.0c07.ac01
  2 state changes, last state change 00:02:37
  IP redundancy name is "hsrp-Vl2-1" (default)
cr20-6500-1#sh standby vlan 12
Vlan12 - Group 1
  Local state is Active, priority 105, may preempt
  Preemption delayed for at least 180 secs
  Hellotime 250 msec, holdtime 750 msec
  Next hello sent in 0.218
  Virtual IP address is 10.137.12.1 configured
  Active router is local
  Standby router is 10.137.12.2 expires in 0.530
  Virtual mac address is 0000.0c07.ac01
  11 state changes, last state change 2d00h
  IP redundancy name is "hsrp-Vl12-1" (default)
One additional consideration is required for HSRP tracking; deploying HSRP tracking is usually not required or recommended in a fully redundant campus topology. However, there are some designs where it is deployed, specifically when the distribution block is not connected to the core in a fully meshed fashion, as shown in Figure 16.
Figure 16 Deploy HSRP Tracking
In this case, usually the HSRP tracking is configured so that if the interface connecting to the core fails (Ten1/1), the HSRP standby becomes active, avoiding the use of the transit link between the distribution peers for all the upstream traffic.
•
Distribution switch 1 (HSRP Active)
interface Vlan12
 description Users in VPN v1
 ip vrf forwarding v1
 ip address 10.137.12.3 255.255.255.0
 standby 1 ip 10.137.12.1
 standby 1 timers msec 250 msec 750
 standby 1 priority 105
 standby 1 preempt delay minimum 180
 standby 1 authentication ese
 standby 1 track TenGigabitEthernet1/1
•
Distribution switch 2 (HSRP Standby)
interface Vlan12
 description Users in VPN v1
 ip vrf forwarding v1
 ip address 10.137.12.2 255.255.255.0
 standby 1 ip 10.137.12.1
 standby 1 timers msec 250 msec 750
 standby 1 authentication ese
 standby 1 preempt
The physical interface connecting to the core (Ten1/1 in this example) usually belongs to the global table; however, configuring HSRP tracking of that interface on an SVI mapped to a VRF (as shown above) also triggers the failover for that specific VPN subnet. The recommendation is thus to use tracking on all the SVIs defined in the distribution block (those belonging to the global table and those belonging to each defined VRF).
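As a sketch of this recommendation, the following fragment applies the same tracking statement to an SVI in the global table on distribution switch 1. The Vlan20 interface and its 10.137.20.0/24 addressing are hypothetical and are not part of the original example. It assumes the default HSRP tracking decrement of 10: when Ten1/1 fails, the priority drops from 105 to 95, which is below the default priority of 100 on the peer, so a peer configured with preempt takes over as active.

```
! Hypothetical SVI in the global table (no "ip vrf forwarding" statement)
interface Vlan20
 description Users in global table
 ip address 10.137.20.3 255.255.255.0
 standby 1 ip 10.137.20.1
 standby 1 timers msec 250 msec 750
 standby 1 priority 105
 standby 1 preempt delay minimum 180
 ! Default decrement is 10: 105 - 10 = 95 when Ten1/1 goes down
 standby 1 track TenGigabitEthernet1/1
```

If a different decrement is configured, verify that the resulting priority still falls below the priority of the standby peer; otherwise the failover is not triggered.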
The first two FHRP protocols listed above were developed by Cisco, whereas VRRP is the IETF standard based on HSRP (RFC 3768). Support for these protocols in the context of a VRF depends on the platform and software release; the list below highlights the level of support on Catalyst platforms.
•
Catalyst 6500
–
HSRP is supported on Layer 3 interfaces that are mapped to a specific VRF from release 12.2(17d)SXB.
–
VRRP is VRF aware starting from release 12.2(18)SXF
–
GLBP VRF awareness was delivered in 12.2(33)SXH
•
Catalyst 4500
–
HSRP and VRRP VRF awareness is supported in 12.2(50)SG release
–
There is no current support for GLBP
•
Catalyst 3750
–
HSRP in a VRF is supported from 12.2(40)SE release
–
There is currently no support for VRF aware VRRP or GLBP
DHCP Relay
Distribution layer devices provide DHCP relay support for the endpoints connected to the access switches. Because the DHCP infrastructure is usually deployed in a centralized location in the network (for example, in a data center), this means that the first Layer 3 hop devices need to be able to relay the initial broadcast DHCP request received from the client to the remotely located DHCP server. This is supported via the ip helper-address command, as shown in the following configuration sample:
interface Vlan12
 description Users in VPN v1
 ip vrf forwarding v1
 ip address 10.137.12.3 255.255.255.0
 ip helper-address 10.136.2.8
As shown above, the ip helper-address command is also supported on SVIs belonging to a specific VRF. This means that the switch can look up the DHCP server IP address in the correct VRF routing table and properly relay the DHCP request to it.
Note that what was described above does not actually mean that the DHCP relay functionality on Catalyst 6500 platforms is VRF-aware; to achieve VRF-awareness, the switch should be able to include VPN-specific information in the message sent to the centralized DHCP server. This would for example allow the centralized DHCP server (assuming the server is VRF-aware as well) to provide IP addresses from overlapping IP pools belonging to separate VRFs. VRF-awareness for DHCP-relay functionality is currently not supported on any Catalyst platform, but it is required only for supporting overlapping IP addresses. In designs where this overlapping IP addresses requirement is not present, it is possible to leverage the currently available DHCP-Relay functionality, which is available on all Catalyst platforms.
Multicast
The VRF awareness for multicast implies the capability of virtualizing the protocols and data structures leveraged for multicast deployments: multicast routing table, PIM process, IGMP capabilities, RP discovery mechanisms, etc. More details on integration of multicast in a virtualized network can be found in following sections of this document when discussing the specific technologies that can be used to provide path isolation across the campus infrastructure.
QoS
QoS and network virtualization are currently orthogonal problems. Enabling VRF capabilities allows the creation of a separate control and data plane for the switch. However, there are no virtualization capabilities from a queuing perspective. This means that, for example, if traffic is classified at the edge and marked as EF, it makes use of the priority queue (if defined) regardless of the VPN in which it originated. Usually the distribution layer switches require the following QoS policies:
•
DSCP trust policies—These are usually enabled on all the interfaces of the distribution layer device. Adding virtualization to the design does not change this requirement.
•
Queuing policies—As already mentioned, queuing of the traffic is based on how the packets are classified and marked at the edge of the network. This is independent of the fact that these packets belong to a VPN or to global table (no VRF awareness is supported today for the queuing mechanism).
•
Optional per-user microflow policing policies—Catalyst 6500s with PFC3 support user-based rate limiting (UBRL). UBRL is a form of Microflow policing allowing the administrator to rate limit traffic flows, but unlike a normal Microflow policer, it allows a policer to be applied to all traffic to or from a specific user. This is independent of the VPN to which the user belongs, because the policing is usually applied on the trunk interface connecting the distribution block to the access layer device. UBRL functionality is not currently VRF-aware (that is, it is not possible to differentiate traffic from users having the same IP address but belonging to different VPNs).
Routed ACLs
Standard and extended ACLs are usually applied as routed ACLs (RACLs) at the first Layer 3 hop of the network and have been made VRF-aware since release 12.2(18)SXD for Catalyst 6500 and 12.2(44)SG for Catalyst 4500. This means they can be successfully applied to Layer 3 interfaces (usually SVIs) that are part of a specific VRF.
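A minimal sketch of a RACL applied to an SVI that belongs to a VRF is shown below. The ACL name v1-users-in and its entries are hypothetical; the SVI matches the Vlan12 examples used earlier in this section. The first entry is an assumption added so that DHCP requests from clients without an address still reach the relay.

```
! Hypothetical RACL: allow DHCP requests and traffic sourced from the local subnet
ip access-list extended v1-users-in
 permit udp any eq bootpc any eq bootps
 permit ip 10.137.12.0 0.0.0.255 any
 deny   ip any any
!
interface Vlan12
 description Users in VPN v1
 ip vrf forwarding v1
 ip address 10.137.12.3 255.255.255.0
 ! The ACL is evaluated for traffic entering the SVI mapped to VRF v1
 ip access-group v1-users-in in
```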
Troubleshooting Tools
Several troubleshooting tools can be used on the virtualized devices to verify proper connectivity for each defined virtual network. More details can be found in the sections discussing the specific path isolation technologies.
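As an illustration, the following VRF-aware commands are commonly used for a first connectivity check. The VRF name v1 and the addresses are taken from the examples earlier in this section; availability of each command depends on the platform and software release.

```
! List the interfaces mapped to each VRF
show ip vrf interfaces
! Display the routing table of VPN v1
show ip route vrf v1
! Display the ARP entries learned in VPN v1
show ip arp vrf v1
! Test reachability of the HSRP virtual IP address from within VPN v1
ping vrf v1 10.137.12.1
! Trace the path toward the DHCP server from within VPN v1
traceroute vrf v1 10.136.2.8
```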
Path Isolation Deploying VRF-Lite and GRE
Connectivity Requirements
This particular solution is recommended in cases where many-to-one connectivity is required. This is most likely the scenario for applications such as guest access or NAC remediation, where the traffic originated at the edge of the network (campus buildings or branch offices) must be gathered to a centralized location (represented by the enterprise Internet edge or by the data center where a remediation server can be deployed).
In such scenarios, a hub-and-spoke topology is the recommended design. In a campus network, GRE tunnels can be used to transport the guest VLAN traffic from the first Layer 3 hop to a hub location, which is typically the Internet DMZ for an enterprise network. By placing the guest VLAN subnet (SVI) and the GRE interface into a VRF, you can separate the IP address space and routing from the rest of the enterprise network. Note that VRFs have to be defined only on the GRE tunnel endpoints (hub-and-spoke devices). One of the benefits of using GRE tunnels is that they can traverse multiple Layer 3 hops, but the VRF configuration is required only at the tunnel edges of the network.
A solution using GRE tunnels as a mechanism to segment the guest traffic has platform capability limitations. Table 2 provides a comparison of the GRE tunneling capabilities offered by the various Cisco switching platforms.
The information presented in Table 2 limits the applicability of this solution, depending on the specific Catalyst switches in place:
•
In traditional designs, where the first Layer 3 hop is represented by the distribution layer devices, this approach is recommended when deploying a Catalyst 6500 with Sup720 or Sup32, because of the hardware-switching capability offered on these platforms. An exception to this recommendation can be for applications that do not require a large amount of bandwidth (such as guest access, where you might not want to provide large bandwidth). In that case, designs implementing the Catalyst 4500 in the distribution layer might be a candidate for this network virtualization solution. However, when originating (or terminating) GRE tunnels on a Catalyst 4500, it is a good practice to rate-limit the amount of GRE traffic that is allowed, to protect the CPU. More details on the configuration required for this are provided in QoS in Hub-and-Spoke Deployments.
•
In routed access designs, where the demarcation line between Layer 2 and Layer 3 is moved down to the access layer, there are the following two scenarios:
–
The access layer contains deployed devices that support GRE (such as a Catalyst 6500 or 4500). In this case, GRE tunnels can be originated directly from the access layer devices, keeping in mind the bandwidth implications previously described when deploying platforms that do not support GRE in hardware.
–
The access layer contains deployed devices that do not support GRE (such as Catalyst 3xxx). In this scenario, GRE tunnels can be originated only from the distribution layer (assuming the platforms deployed there are GRE capable). As a result, some other mechanism should be deployed to maintain the logical separation of traffic for different user groups between the access and distribution layers. One possible way to achieve this is to use VRF-Lite End-to-End between access and distribution devices. For more information on VRF-Lite End-to-End deployments, see Path Isolation Deploying VRF-Lite End-to-End.
In addition to the considerations about GRE support, it is also important to keep in mind that the support of VRF-Lite on Catalyst switches does not currently come with the IP base software license. The list below clarifies the minimum license required on different switch models:
•
Catalyst 6500: IP Services
•
Catalyst 4K: IP Services
•
Catalyst 3K: IP Services
Figure 17 shows the definition of various VRFs on the distribution layer device, with the corresponding mapping to the VRF for the VLANs defined on the Layer 2 domain of the network and the GRE tunnels part of the Layer 3 domain.
Figure 17 VRF-Lite and GRE
The diagram in Figure 17 is valid for both traditional and routed access designs when GRE tunnels are originated on the distribution layer switches. When deploying routed access designs where GRE tunnels can be originated from the access layer devices, the only difference is the absence of the trunk connection on the left, because each switch port is mapped to a specific VLAN.
To deploy end-to-end network virtualization across the network, VLANs must be mapped to VRFs and VRFs to GRE tunnels on one side, and the GRE tunnel interfaces must be mapped to VRFs on the other side. The following sections provide a more detailed description of the configuration required to implement this form of traffic isolation.
Configuration Details
This section describes two options to build logical overlay networks using GRE and VRF. The first approach uses point-to-point GRE connections between devices, and the second one introduces the use of mGRE interfaces. The use of mGRE technology is particularly suited for applications requiring hub-and-spoke connectivity, as described in this section.
Using Point-to-Point GRE
The traditional configuration for GRE tunnels requires the creation of point-to-point tunnel interfaces on both sides of the tunnel. When building a hub-and-spoke topology, the use of point-to-point GRE tunnels requires that you create a separate logical interface on the hub switches every time a new spoke needs to be added. This is both configuration-intensive and router resource-intensive. To address the performance considerations, Cisco recommends using a Catalyst 6500 with a Supervisor 720 that has GRE support in hardware. To address the configuration challenges associated with supporting multiple GRE tunnels at the hub site, an alternative network design based on mGRE and Next Hop Resolution Protocol (NHRP) is introduced. However, in some cases, point-to-point GRE might be the only option because mGRE and NHRP are not supported on all platforms (for example, they are not supported on Catalyst 4500 switches).
The following configuration steps accompany the network diagram shown in Figure 18. Keep in mind the following considerations when reviewing the required configuration:
•
The example is valid for a guest access application, so point-to-point GRE tunnels are defined between a generic spoke device and the centralized hub in the Internet edge. Also, traffic is originated from guest subnets defined at the edge of the network (spokes).
•
The configuration sample refers to the traditional campus design, so VRF and GRE are defined on the distribution layer devices.
•
Catalyst 6500 switches are deployed as spoke and hub devices. The Catalyst 4500 is also a viable alternative for applications not requiring high throughput.
•
It is assumed that all traffic directed to the Internet is sent to an undefined next hop device. Depending on the specific application, this device might be an appliance, such as a firewall or a router.
Figure 18 Hub-and-Spoke with Point-to-Point GRE Tunnels
Note
The following configuration sections assume that basic network connectivity (for example, in the global routing table) is already in place in the network.
Hub GRE Configuration
On each hub device, a separate tunnel (and corresponding loopback) interface is required for each spoke switch. In the previous example, there are four spoke devices, representing the two pairs of distribution layer switches for two campus buildings.
Note
The configuration samples in the following sections refer specifically to a guest access deployment. However, they are also valid for all applications requiring hub-and-spoke connectivity.
ip vrf guest
 rd 100:1
!
interface Loopback0
 description src GRE p2p tunnel 1
 ip address 10.122.200.1 255.255.255.255
!
interface Loopback1
 description src GRE p2p tunnel 2
 ip address 10.122.200.2 255.255.255.255
!
interface Loopback2
 description src GRE p2p tunnel 3
 ip address 10.122.200.3 255.255.255.255
!
interface Loopback3
 description src GRE p2p tunnel 4
 ip address 10.122.200.4 255.255.255.255
!
interface Tunnel0
 description GRE p2p tunnel 1
 ip vrf forwarding guest
 ip address 172.32.1.1 255.255.255.252
 tunnel source Loopback0
 tunnel destination 10.122.210.1
!
interface Tunnel1
 description GRE p2p tunnel 2
 ip vrf forwarding guest
 ip address 172.32.1.5 255.255.255.252
 tunnel source Loopback1
 tunnel destination 10.122.210.2
!
interface Tunnel2
 description GRE p2p tunnel 3
 ip vrf forwarding guest
 ip address 172.32.1.9 255.255.255.252
 tunnel source Loopback2
 tunnel destination 10.122.210.3
!
interface Tunnel3
 description GRE p2p tunnel 4
 ip vrf forwarding guest
 ip address 172.32.1.13 255.255.255.252
 tunnel source Loopback3
 tunnel destination 10.122.210.4
Note that each tunnel interface is mapped to the guest VRF using the ip vrf forwarding command, which is the key starting point in building the overlay logical network. The use of VRF allows great flexibility when planning the IP addressing for the guest subnets. In the preceding example, the overlay logical network is using a 172.16.0.0 address space, whereas all the addresses used in the global table (loopback interfaces, and so on) are part of the 10.0.0.0/8 subnet. This means that the IP addresses assigned to each defined user group can be independently selected from the block of addresses associated to that specific building block in the global table. Overlapping IP address space is also supported on different VRFs. For example, network 10.1.1.0/24 can exist in multiple VRFs in multiple locations.
The addresses to be used for the loopback interfaces used as source and destination of the GRE traffic should be carefully selected to avoid the creation of routing black holes. See Loopback Interfaces Deployment Considerations for more information on this subject.
Spoke GRE Configuration
The configuration required on each spoke is very similar to the one described previously: two tunnel interfaces are configured to connect to the pair of redundant hub devices in the Internet edge block. Referring to Figure 18, the following configuration sample is valid for one of the two spokes in Building 1:
ip vrf guest
 rd 100:1
!
interface Loopback0
 description src GRE tunnel to hub-1
 ip address 10.122.210.1 255.255.255.255
!
interface Loopback1
 description src GRE tunnel to hub-2
 ip address 10.122.211.1 255.255.255.255
!
interface Tunnel0
 description GRE tunnel to hub-1
 ip vrf forwarding guest
 ip address 172.32.1.2 255.255.255.252
 tunnel source Loopback0
 tunnel destination 10.122.200.1
!
interface Tunnel1
 description GRE tunnel to hub-2
 ip vrf forwarding guest
 ip address 172.32.2.2 255.255.255.252
 tunnel source Loopback1
 tunnel destination 10.122.201.1
Again, the logical tunnel interfaces must be mapped to the VRF to force the Internet-bound guest traffic into the GRE tunnel that carries the traffic to the Internet edge at the hub site. All the guest traffic originates from users deployed in a dedicated guest VLAN (at least for wired users, as previously described). To maintain an end-to-end segregation of guest traffic, the corresponding VLAN interface (logical SVI) must also be mapped to the guest VRF, as shown in the following configuration sample.
Note
Typical deployments have more than one guest VLAN defined for each campus distribution block. In this case, all the corresponding VLAN interfaces must be mapped to the same VRF.
interface Vlan11
 description Wired Guest subnet
 ip vrf forwarding guest
 ip address 172.16.11.2 255.255.255.0
 ip helper-address 172.18.2.10
 standby 11 ip 172.16.11.1
 standby 11 timers msec 250 msec 800
 standby 11 priority 105
 standby 11 preempt delay minimum 180
Note
HSRP (or any other redundant gateway protocol) is relevant when deploying traditional campus designs, where the demarcation line between Layer 2 and Layer 3 is placed in the distribution layer switches. HSRP is not needed in a routed access scenario.
Virtualizing the Routing Protocol
When the VRFs identifying the same user group have been linked together by GRE tunnels (creating the logical overlay network), it is time to start entering routing information into the routing tables for each defined group. The easiest way to do this is through static routing; a default static route pointing to the hubs can be configured on each spoke device. In this way, all the traffic originating, for example, from the guest subnets and directed to the Internet is GRE-encapsulated and conveyed toward the enterprise Internet edge.
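A minimal sketch of the static option on a spoke is shown below. It assumes the tunnel addressing used in this section, where 172.32.1.1 and 172.32.2.1 are the tunnel addresses of the two hubs as seen from the first spoke in Building 1; adjust the next hops for each spoke.

```
! Spoke: default routes in the guest VRF, one per hub tunnel
ip route vrf guest 0.0.0.0 0.0.0.0 172.32.1.1
ip route vrf guest 0.0.0.0 0.0.0.0 172.32.2.1
```

With two equal static defaults, traffic is shared across both tunnels for as long as both tunnel interfaces remain up.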
The use of static routing also requires the configuration of specific static routes on the hub to allow return traffic directed to the edge subnets. Introducing a dynamic routing protocol in the overlay network brings the following two main advantages:
•
The routing updates serve as keepalives for the GRE tunnels. The devices use the GRE interfaces to send traffic only if valid routing information is received, which ensures network connectivity across the tunnel.
•
When supporting redundant GRE uplinks, load balancing of traffic and resiliency are automatically achieved by using the routing protocol characteristics.
The configuration details for how to enable a routing protocol in the context of a specific VRF differ, depending on the chosen protocol. Some routing protocols (such as EIGRP and BGP) introduce the concept of address families. The idea is to have a single routing process running on the device and to define a separate address family that is mapped to each VRF. Other routing protocols (such as OSPF) allow a different routing process for each VRF to be created.
This guide considers EIGRP and OSPF because they are the most common routing protocols found in enterprise networks. The following configuration samples refer to the same network diagram shown in Figure 18.
Note
The routing protocol enabled in the context of each VRF is totally independent from the IGP running in the other VRFs or in the global routing table.
EIGRP
To run EIGRP in the context of a VRF, the VRF-specific address family needs to be configured. The configuration is slightly different for hub-and-spoke because the hubs must also advertise a default route to the spokes. Because of this default route, all the traffic that originates from the edge subnets is forced to the hubs.
The static default route configured on the hub is pointing to the next hop device shown in Figure 18.
•
Hub
ip route vrf guest 0.0.0.0 0.0.0.0 172.18.1.30
!
router eigrp 100
 passive-interface default
 no passive-interface Tunnel0
 no passive-interface Tunnel1
 no passive-interface Tunnel2
 no passive-interface Tunnel3
 no auto-summary
 !
 address-family ipv4 vrf guest
  redistribute static metric 1000000 500 255 1 1500
  network 172.32.1.0 0.0.0.255
  no auto-summary
  autonomous-system 100
 exit-address-family
•
Spoke
router eigrp 100
 passive-interface default
 no passive-interface Tunnel0
 no passive-interface Tunnel1
 no auto-summary
 !
 address-family ipv4 vrf guest
  network 172.16.100.0 0.0.0.255
  network 172.16.200.0 0.0.0.255
  network 172.16.11.0 0.0.0.255
  no auto-summary
  autonomous-system 100
 exit-address-family
This design is resilient because each spoke receives a redundant default route to each of the hubs located in the Internet edge. Each hub learns the guest subnets from each spoke. Note how, by default, the spoke learns not only the default routes from each hub, but also the subnet information of other guests.
•
Spoke
6500-1-Bldg#sh ip route vrf guest
Routing Table: guest
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2, E - EGP
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route
Gateway of last resort is 172.32.1.1 to network 0.0.0.0
     172.17.0.0/24 is subnetted, 1 subnets
D       172.17.11.0 [90/310044672] via 172.32.1.1, 00:00:35, Tunnel0
                    [90/310044672] via 172.32.2.1, 00:00:35, Tunnel1
     172.16.0.0/24 is subnetted, 1 subnets
C       172.16.11.0 is directly connected, Vlan11
     172.32.0.0/30 is subnetted, 8 subnets
D       172.32.1.12 [90/310044416] via 172.32.1.1, 00:00:47, Tunnel0
D       172.32.2.12 [90/310044416] via 172.32.2.1, 00:00:35, Tunnel1
D       172.32.1.8 [90/310044416] via 172.32.1.1, 00:00:53, Tunnel0
D       172.32.2.8 [90/310044416] via 172.32.2.1, 00:00:44, Tunnel1
D       172.32.1.4 [90/310044416] via 172.32.1.1, 00:00:48, Tunnel0
D       172.32.2.4 [90/310044416] via 172.32.2.1, 00:00:41, Tunnel1
C       172.32.1.0 is directly connected, Tunnel0
C       172.32.2.0 is directly connected, Tunnel1
D*EX 0.0.0.0/0 [170/297372416] via 172.32.1.1, 00:00:55, Tunnel0
               [170/297372416] via 172.32.2.1, 00:00:55, Tunnel1
To get to a situation where the spokes have only the default routes in their routing tables, some additional configuration is required. For example, it is possible to apply an outbound filter on the hub to advertise only the default route toward each spoke. This is achieved by the following configuration:
ip access-list standard default-only
 permit 0.0.0.0
!
router eigrp 100
 address-family ipv4 vrf guest
  distribute-list default-only out
The result of this configuration on the spoke routing table is as follows:
6500-1-Bldg#sh ip route vrf guest
Routing Table: guest
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2, E - EGP
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route
Gateway of last resort is 172.32.2.1 to network 0.0.0.0
     172.16.0.0/24 is subnetted, 1 subnets
C       172.16.11.0 is directly connected, Vlan11
     172.32.0.0/30 is subnetted, 2 subnets
C       172.32.1.0 is directly connected, Tunnel0
C       172.32.2.0 is directly connected, Tunnel1
D*EX 0.0.0.0/0 [170/297372416] via 172.32.2.1, 00:00:32, Tunnel1
               [170/297372416] via 172.32.1.1, 00:00:32, Tunnel0
Unlike the spokes, the two hubs must hold all the guest subnets deployed in the campus in their routing tables so that they can properly route return traffic. Referring to the example in Figure 18, the routing table on each hub device looks like the following example.
6500-Int-1#sh ip route vrf guest
Routing Table: guest
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2, E - EGP
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route
Gateway of last resort is 172.18.1.30 to network 0.0.0.0
     172.17.0.0/24 is subnetted, 1 subnets
D       172.17.11.0 [90/15360256] via 172.32.10.4, 00:01:10, Tunnel2
                    [90/15360256] via 172.32.10.5, 00:01:10, Tunnel3
     172.16.0.0/24 is subnetted, 1 subnets
D       172.16.11.0 [90/15360256] via 172.32.10.2, 00:00:10, Tunnel0
                    [90/15360256] via 172.32.10.3, 00:00:10, Tunnel1
     172.18.0.0/24 is subnetted, 1 subnets
C       172.18.1.0 is directly connected, Vlan181
     172.32.0.0/30 is subnetted, 8 subnets
C       172.32.1.12 is directly connected, Tunnel3
D       172.32.2.12 [90/310044416] via 172.32.1.14, 00:02:15, Tunnel3
C       172.32.1.8 is directly connected, Tunnel2
D       172.32.2.8 [90/310044416] via 172.32.1.10, 00:02:30, Tunnel2
C       172.32.1.4 is directly connected, Tunnel1
D       172.32.2.4 [90/310044416] via 172.32.1.6, 00:02:44, Tunnel1
C       172.32.1.0 is directly connected, Tunnel0
D       172.32.2.0 [90/310044416] via 172.32.1.2, 00:02:56, Tunnel0
S*   0.0.0.0/0 [1/0] via 172.18.1.30
As shown, each hub has a redundant path to the route aggregate advertised from each building block. As a result, even if each spoke has no knowledge of the other guest subnets defined across the campus network, communication between them is still possible because the hub has the information to route these packets in its routing table.
The advantage in building this hub-and-spoke overlay network is that policy enforcement to deny communications between guest subnets defined in separate campus buildings can be centralized on the two hub devices, and it is not required to be distributed on each spoke at the edge of the network.
To limit communication between guest subnets defined in the same campus building, the policy needs to be applied on the first Layer 3 hop device, represented by the distribution layer switch (for traditional Layer 2/Layer 3 campus designs) or by the access layer switch (for the routed access campus design).
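The centralized hub policy mentioned above could be sketched as follows. This is an illustration only: the ACL name guest-isolate is hypothetical, and the fragment assumes that all guest subnets fall within 172.16.0.0/15 (which covers the 172.16.11.0/24 and 172.17.11.0/24 subnets shown in the routing tables above). Support for ACLs on tunnel interfaces should be verified for the specific platform and software release.

```
! Hypothetical hub policy: drop guest-to-guest traffic, allow everything else
ip access-list extended guest-isolate
 deny   ip 172.16.0.0 0.1.255.255 172.16.0.0 0.1.255.255
 permit ip any any
!
! Apply inbound on every tunnel interface that terminates a spoke
interface Tunnel0
 ip access-group guest-isolate in
!
interface Tunnel1
 ip access-group guest-isolate in
!
interface Tunnel2
 ip access-group guest-isolate in
!
interface Tunnel3
 ip access-group guest-isolate in
```

Routing protocol packets and DHCP requests relayed to 172.18.2.10 do not match the deny entry, because their source or destination lies outside the guest range.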
OSPF
Unlike EIGRP, OSPF has no concept of an address family. To enable OSPF in the context of a VRF, you must define a new process and bind it to the specific VRF:
•
Hub
ip route vrf guest 0.0.0.0 0.0.0.0 172.18.1.30
!
router ospf 1 vrf guest
 log-adjacency-changes
 passive-interface default
 no passive-interface Tunnel0
 no passive-interface Tunnel1
 no passive-interface Tunnel2
 no passive-interface Tunnel3
 network 172.32.1.0 0.0.0.255 area 0
 default-information originate
•
Spoke
router ospf 1 vrf guest
 log-adjacency-changes
 passive-interface default
 no passive-interface Tunnel0
 no passive-interface Tunnel1
 network 172.16.11.0 0.0.0.255 area 16
 network 172.32.1.0 0.0.0.255 area 0
 network 172.32.2.0 0.0.0.255 area 0
As described for EIGRP, the configuration causes the spoke routers to have information about all the guest subnets in their routing table, as in the following example:
6500-1-Bldg#sh ip route vrf guest
Routing Table: guest
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2, E - EGP
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route
Gateway of last resort is 172.32.2.1 to network 0.0.0.0
     172.17.0.0/24 is subnetted, 1 subnets
O       172.17.11.0 [110/22223] via 172.32.1.1, 00:00:05, Tunnel0
                    [110/22223] via 172.32.2.1, 00:00:05, Tunnel1
     172.16.0.0/24 is subnetted, 1 subnets
C       172.16.11.0 is directly connected, Vlan11
     172.32.0.0/30 is subnetted, 8 subnets
O       172.32.1.12 [110/22222] via 172.32.1.1, 00:00:05, Tunnel0
O       172.32.2.12 [110/22222] via 172.32.2.1, 00:00:05, Tunnel1
O       172.32.1.8 [110/22222] via 172.32.1.1, 00:00:06, Tunnel0
O       172.32.2.8 [110/22222] via 172.32.2.1, 00:00:06, Tunnel1
O       172.32.1.4 [110/22222] via 172.32.1.1, 00:00:06, Tunnel0
O       172.32.2.4 [110/22222] via 172.32.2.1, 00:00:07, Tunnel1
C       172.32.1.0 is directly connected, Tunnel0
C       172.32.2.0 is directly connected, Tunnel1
O*E2 0.0.0.0/0 [110/1] via 172.32.2.1, 00:00:07, Tunnel1
               [110/1] via 172.32.1.1, 00:00:07, Tunnel0
As in the EIGRP example, it is possible to apply a distribute list statement to eliminate these routes from the spoke devices and to import only a default route. In the OSPF scenario, this filter should be configured on each spoke (and not on the hub) because each router configured for OSPF must maintain a common topology database. This is achieved with the following configuration:
ip access-list standard default-only
 permit 0.0.0.0
!
router ospf 1 vrf guest
 distribute-list default-only in
As a result of this configuration, the spoke eventually learns (in the routing table) only a default route pointing to the Internet edge, as follows:
6500-1-Bldg#sh ip route vrf guest
Routing Table: guest
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2, E - EGP
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route
Gateway of last resort is 172.32.2.1 to network 0.0.0.0
     172.16.0.0/24 is subnetted, 1 subnets
C       172.16.11.0 is directly connected, Vlan11
     172.32.0.0/30 is subnetted, 2 subnets
C       172.32.1.0 is directly connected, Tunnel0
C       172.32.2.0 is directly connected, Tunnel1
O*E2 0.0.0.0/0 [110/1] via 172.32.2.1, 00:00:23, Tunnel1
               [110/1] via 172.32.1.1, 00:00:23, Tunnel0
From the point of view of the hub, the routing table appears similar to the EIGRP scenario. The hub has knowledge of all the guest subnets defined around the campus, so some centralized policy configuration might be required to prevent inter-guest communications.
6500-Int-1#sh ip route vrf guest
Routing Table: guest
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2, E - EGP
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route
Gateway of last resort is 172.18.1.30 to network 0.0.0.0
     172.17.0.0/24 is subnetted, 1 subnets
O IA    172.17.11.0/24 [110/11112] via 172.32.1.10, 00:03:08, Tunnel2
                       [110/11112] via 172.32.1.14, 00:03:08, Tunnel3
     172.16.0.0/24 is subnetted, 1 subnets
O IA    172.16.11.0/24 [110/11112] via 172.32.1.2, 00:03:08, Tunnel0
                       [110/11112] via 172.32.1.6, 00:03:08, Tunnel1
     172.18.0.0/24 is subnetted, 1 subnets
C       172.18.1.0 is directly connected, Vlan181
     172.32.0.0/30 is subnetted, 8 subnets
C       172.32.1.12 is directly connected, Tunnel3
O       172.32.2.12 [110/22222] via 172.32.1.14, 00:03:08, Tunnel3
C       172.32.1.8 is directly connected, Tunnel2
O       172.32.2.8 [110/22222] via 172.32.1.10, 00:03:08, Tunnel2
C       172.32.1.4 is directly connected, Tunnel1
O       172.32.2.4 [110/22222] via 172.32.1.6, 00:03:12, Tunnel1
C       172.32.1.0 is directly connected, Tunnel0
O       172.32.2.0 [110/22222] via 172.32.1.2, 00:03:12, Tunnel0
S*   0.0.0.0/0 [1/0] via 172.18.1.30
Note
Cisco does not recommend applying a distribute list statement on the spoke devices, because doing so causes a discrepancy between the content of the topology database and the routing table.
Using mGRE Technology
When compared to the point-to-point GRE scenario described in the previous section, the use of mGRE interfaces on the hub switches has several advantages:
•
Simplified configuration on the hub—Only one loopback and one tunnel interface are required, instead of configuring a pair for each spoke device. To connect to multiple edge devices, the tunnel interface works in mGRE mode.
•
Dynamic addition of spoke devices—New spokes can be added without requiring any configuration changes on the hub devices.
•
Simplified IP addressing—The overlay logical mGRE network is part of a single IP subnet and many distinct point-to-point subnets are not required for each GRE spoke tunnel.
At the same time, an additional mechanism, the Next Hop Resolution Protocol (NHRP), is needed to allow the hub devices to dynamically discover the spokes and establish GRE tunnels with them. NHRP, as defined in RFC 2332, is a Layer 2 address resolution protocol and cache, similar to Address Resolution Protocol (ARP) and Frame Relay inverse-ARP. When a tunnel interface operates in mGRE mode, NHRP tells the mGRE process where to tunnel a packet to reach a certain address. NHRP is a client-server protocol in which the hub is the server and the spokes are the clients. The hub maintains an NHRP database in which it registers, for each spoke, the mapping between the physical address (used as the GRE tunnel destination) and the logical address assigned to the spoke tunnel interface. Each spoke provides this information to the hub by sending an NHRP registration message at startup.
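The registration and lookup behavior can be pictured with a small model. The following Python sketch is illustrative only: the class and method names are hypothetical and do not correspond to any Cisco software interface, the addresses are borrowed from the configuration samples in this section, and the 7200-second hold time is the default quoted later in this section.

```python
from dataclasses import dataclass, field


@dataclass
class NhrpCache:
    """Toy model of the hub NHRP database (hypothetical, for illustration)."""
    hold_time: int = 7200  # seconds; default hold time quoted in this guide
    entries: dict = field(default_factory=dict)

    def register(self, tunnel_ip: str, nbma_ip: str, now: int) -> None:
        # A spoke registration creates or refreshes the mapping
        # logical tunnel address -> physical (NBMA) address.
        self.entries[tunnel_ip] = (nbma_ip, now + self.hold_time)

    def resolve(self, tunnel_ip: str, now: int):
        # Return the GRE tunnel destination for a logical next hop, or None
        # if the spoke never registered or its entry has expired.
        entry = self.entries.get(tunnel_ip)
        if entry is None or entry[1] <= now:
            return None
        return entry[0]


hub = NhrpCache()
hub.register("172.32.10.2", "10.122.210.10", now=0)
print(hub.resolve("172.32.10.2", now=60))  # 10.122.210.10
print(hub.resolve("172.32.10.3", now=60))  # None (no registration received)
```

The model shows why no per-spoke configuration is needed on the hub: an entry exists only after the spoke itself has registered.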
Note
Support for NHRP in the context of a VRF is restricted to Catalyst 6500 platforms with Sup720 and Sup32 running software release 12.2(18)SXE and later. This implies that, to deploy the solution described in this section, these devices must be deployed at both the hub and spoke locations. Also, NHRP support is limited to Catalyst 6500 switches running Advanced IP Services or higher licenses.
Following are the configuration steps required for creating the hub-and-spoke overlay network using mGRE interfaces on the hub devices (see Figure 19).
Figure 19 Hub-and-Spoke Using mGRE Technology
Similarly to the point-to-point scenario, the following considerations are valid in this instance:
•
The example is valid for a guest access application, so point-to-point GRE interfaces are defined for each spoke device, whereas mGRE is used on the centralized hub in the Internet edge. Also, traffic is originated from guest subnets defined at the edge of the network (spokes).
•
The configuration sample refers to the traditional campus design, so VRF and GRE are defined on the distribution layer devices.
•
Catalyst 6500 switches are deployed both as spoke and hub devices. Catalyst 4500s are not a viable alternative for this design because of the lack of support of NHRP in the context of the VRF.
Hub mGRE Configuration
The configuration required to create an mGRE interface on the hub and enable the NHRP functionality is as follows:
ip vrf guest
 rd 100:1
!
interface Loopback10
 description src mGRE tunnel for Guest
 ip address 10.122.200.10 255.255.255.255
!
interface Tunnel10
 description mGRE tunnel for Guest
 ip vrf forwarding guest
 ip address 172.32.10.1 255.255.255.0
 no ip redirects
 ip nhrp map multicast dynamic
 ip nhrp network-id 10
 tunnel source Loopback10
 tunnel mode gre multipoint
NHRP is enabled on the mGRE interface using the ip nhrp network-id command. The value specified must match the one configured on the spoke devices. Also, the ip nhrp map multicast dynamic command is required to enable dynamic routing protocols to work over the mGRE tunnel when IGP routing protocols use multicast packets. The dynamic keyword prevents the hub device from needing a separate configuration line for a multicast mapping for each spoke router. This is important because the goal is to avoid any reconfiguration of the hub devices when adding a new spoke component.
Spoke GRE Configuration
The configuration of the spoke devices is almost identical to the one previously described for the point-to-point scenario. The only difference is the addition of the NHRP-related commands:
ip vrf guest
 rd 100:1
!
interface Loopback10
 description src GRE tunnel for Guest to hub-1
 ip address 10.122.210.10 255.255.255.255
!
interface Loopback11
 description src GRE tunnel for Guest to hub-2
 ip address 10.122.211.10 255.255.255.255
!
interface Tunnel10
 description GRE tunnel for Guest to hub-1
 ip vrf forwarding guest
 ip address 172.32.10.2 255.255.255.0
 ip nhrp network-id 10
 ip nhrp nhs 172.32.10.1
 ip nhrp registration timeout 60
 tunnel source Loopback10
 tunnel destination 10.122.200.10
!
interface Tunnel11
 description GRE tunnel for Guest to hub-2
 ip vrf forwarding guest
 ip address 172.32.11.2 255.255.255.0
 ip nhrp network-id 11
 ip nhrp nhs 172.32.11.1
 ip nhrp registration timeout 60
 tunnel source Loopback11
 tunnel destination 10.122.201.10
Similarly to the hub case, the ip nhrp network-id command is used to enable the NHRP process on the tunnel interfaces (the values specified must match the values configured on the two hubs). In addition, the ip nhrp nhs command is required to specify the address of the NHRP server (hub). Finally, the ip nhrp registration timeout command tunes the frequency (in seconds) at which the spokes send the NHRP registration messages to the hubs. By default, registration messages are sent every 2400 seconds; lowering this value allows a spoke to re-register promptly when connectivity with the hub is interrupted and then restored.
Note
The ip nhrp map multicast command is not required on the spoke devices because the tunnel interface is point-to-point, so all multicast packets are automatically sent to the other end (hub).
As described in Using Point-to-Point GRE, a mapping from the logical VLAN interface defining the guest subnets and the guest VRF is also required:
interface Vlan11
 description Wired Guest subnet
 ip vrf forwarding guest
 ip address 172.16.11.2 255.255.255.0
 standby 11 ip 172.16.11.1
 standby 11 timers msec 250 msec 800
 standby 11 priority 105
 standby 11 preempt delay minimum 180
Verifying the NHRP Information
After configuring the tunnel interfaces on the hub and spoke devices, verify that the hub is receiving the NHRP registration messages from the spoke devices and is therefore adding dynamic entries to the NHRP cache:
6500-Int-1#sh ip nhrp
172.32.10.2/32 via 172.32.10.2, Tunnel10 created 00:01:52, expire 01:59:05
  Type: dynamic, Flags: authoritative unique registered
  NBMA address: 10.122.210.10
172.32.10.3/32 via 172.32.10.3, Tunnel10 created 00:01:03, expire 01:59:54
  Type: dynamic, Flags: authoritative unique registered used
  NBMA address: 10.122.210.11
172.32.10.4/32 via 172.32.10.4, Tunnel10 created 00:00:33, expire 01:59:26
  Type: dynamic, Flags: authoritative unique registered
  NBMA address: 10.122.210.12
172.32.10.5/32 via 172.32.10.5, Tunnel10 created 00:00:06, expire 01:59:56
  Type: dynamic, Flags: authoritative unique registered used
  NBMA address: 10.122.210.13
As shown in the previous output, the hub learns the physical non-broadcast multiaccess (NBMA) address used to tunnel GRE packets destined to each spoke. This information is refreshed by the spoke with NHRP registration messages every 60 seconds (because of the tuning done with the ip nhrp registration timeout command). The default expiration time (hold time) is 7200 seconds (two hours), as shown on the right side of each entry in this example (expire 01:59:05). Under normal circumstances, this value should never go below 01:59:00, because it is re-initialized by the receipt of NHRP registration messages every 60 seconds.
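The 01:59:00 floor follows directly from the two timers. As a minimal worked example (the function name is hypothetical; the timer values are the ones quoted in this section), the lowest expire value that should be seen while registrations arrive on schedule is the hold time minus the registration interval:

```python
def min_expire_display(hold_time_s: int, registration_interval_s: int) -> str:
    """Lowest 'expire' value expected while registrations arrive on schedule."""
    remaining = hold_time_s - registration_interval_s
    hours, rest = divmod(remaining, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


print(min_expire_display(7200, 60))    # 01:59:00 with the tuned 60-second timer
print(min_expire_display(7200, 2400))  # 01:20:00 with the 2400-second default
```

An expire value noticeably below this floor suggests that registration messages from that spoke are being lost.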
Virtualizing the Routing Protocol
From a topology perspective, the routing protocol runs only between the spoke router and one or more hub devices. The solution implementing mGRE interfaces has been tested with EIGRP and OSPF because they are the most commonly deployed choices among enterprise customers.
When the connection of the spoke to the network comes up, it is ready to begin transmitting routing protocol information because the tunnel interface is configured as point-to-point. On the other side, the hub device cannot begin sending routing protocol information until NHRP registrations arrive from each spoke device and the NHRP cache gets populated.
Consider the following when configuring routing protocols in this scenario:
•
GRE tunnel bandwidth—The default bandwidth of a GRE tunnel is 9 Kbps, which has two unwanted consequences:
–
Any routing protocol using bandwidth as a metric is being given misleading information, which can cause unpredictable results.
–
Cisco EIGRP assigns half of this bandwidth for the use of the routing protocol, which most likely is insufficient.
Cisco recommends configuring the bandwidth parameter on GRE tunnel interfaces to the actual bandwidth available on the link.
•
IP maximum transmission unit (MTU)—It is important, especially when using OSPF, to verify that the IP MTU settings match on the tunnel interfaces on both sides of the link. The MTU value recommended here is 1400 bytes, which leaves room for GRE and IPsec overhead (if needed) and avoids packet fragmentation. More information on this topic can be found in MTU Considerations.
•
OSPF interface types and priority—In the hub-and-spoke topology previously described, the mGRE tunnel interface is considered point-to-point from an OSPF standpoint. Because the same interface starts receiving hellos and OSPF packets from different spokes, this prevents the establishment of adjacencies. To fix the problem, configure the OSPF network type as broadcast on both the hubs and all the spokes. Also, set the OSPF priority to 0 on the spokes to guarantee that the hubs become the designated router (DR) and the backup designated router (BDR).
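The effect of the tunnel bandwidth setting on route metrics can be worked through numerically. The sketch below assumes the standard formulas (OSPF cost with a 100-Mbps reference bandwidth; classic EIGRP composite metric with default K values). The tunnel delay of 500,000 microseconds and the SVI delay of 10 microseconds are assumptions, chosen because they reproduce metrics that appear in the sample routing tables of this section. The function names are illustrative.

```python
def ospf_cost(bandwidth_kbps: int, reference_bw_kbps: int = 100_000) -> int:
    """OSPF interface cost: reference bandwidth / interface bandwidth, minimum 1."""
    return max(1, reference_bw_kbps // bandwidth_kbps)


def eigrp_metric(min_bandwidth_kbps: int, total_delay_usec: int) -> int:
    """Classic EIGRP composite metric with default K values (K1 = K3 = 1)."""
    return 256 * (10_000_000 // min_bandwidth_kbps + total_delay_usec // 10)


# OSPF: a tunnel left at the 9-Kbps default versus one set to "bandwidth 1000".
print(ospf_cost(9))     # 11111
print(ospf_cost(1000))  # 100

# EIGRP: one tunnel hop (assumed 500000 usec) plus one SVI (assumed 10 usec).
print(eigrp_metric(1000, 500_010))  # 15360256
```

With the default bandwidth, the cost of a single tunnel hop already exceeds that of almost any physical campus path, which is why the routing protocol receives misleading information unless the bandwidth parameter is set explicitly.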
Based on these considerations, the configuration of the generic hub-and-spoke GRE interfaces needs to be changed as follows. (This configuration sample is also valid for OSPF.)
•
Hub
interface Tunnel10
 description mGRE tunnel
 bandwidth 1000
 ip vrf forwarding guest
 ip address 172.32.10.1 255.255.255.0
 no ip redirects
 ip mtu 1400
 ip nhrp map multicast dynamic
 ip nhrp network-id 10
 ip ospf network broadcast
 tunnel source Loopback10
 tunnel mode gre multipoint
•
Spoke
interface Tunnel10
 description GRE tunnel for Guest to hub-1
 bandwidth 1000
 ip vrf forwarding guest
 ip address 172.32.10.2 255.255.255.0
 ip mtu 1400
 ip nhrp network-id 10
 ip nhrp nhs 172.32.10.1
 ip nhrp registration timeout 60
 ip ospf network broadcast
 ip ospf priority 0
 tunnel source Loopback10
 tunnel destination 10.122.200.10
!
interface Tunnel11
 description GRE tunnel for Guest to hub-2
 bandwidth 1000
 ip vrf forwarding guest
 ip address 172.32.11.2 255.255.255.0
 ip mtu 1400
 ip nhrp network-id 11
 ip nhrp nhs 172.32.11.1
 ip nhrp registration timeout 60
 ip ospf network broadcast
 ip ospf priority 0
 tunnel source Loopback11
 tunnel destination 10.122.201.10
The configuration required to enable the routing protocols in the context of the guest VRF is identical to that described in the point-to-point scenario.
The only difference in this case is the fact that the hub devices learn all the routes for the guest subnets out of the same mGRE interface. Because of the additional information contained in the NHRP cache, the hubs are able to route back the traffic to the proper spokes (see the following sample output for an EIGRP example).
6500-Int-1#sh ip route vrf guest
Routing Table: guest
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2, E - EGP
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route
Gateway of last resort is 172.18.1.30 to network 0.0.0.0
     172.17.0.0/24 is subnetted, 1 subnets
D       172.17.11.0 [90/15360256] via 172.32.10.4, 00:01:10, Tunnel10
                    [90/15360256] via 172.32.10.5, 00:01:10, Tunnel10
     172.16.0.0/24 is subnetted, 1 subnets
D       172.16.11.0 [90/15360256] via 172.32.10.2, 00:00:10, Tunnel10
                    [90/15360256] via 172.32.10.3, 00:00:10, Tunnel10
     172.18.0.0/24 is subnetted, 1 subnets
C       172.18.1.0 is directly connected, Vlan181
     172.32.0.0/16 is variably subnetted, 10 subnets, 2 masks
C       172.32.1.12/30 is directly connected, Tunnel3
D       172.32.2.12/30 [90/310044416] via 172.32.10.5, 00:00:38, Tunnel10
C       172.32.1.8/30 is directly connected, Tunnel2
C       172.32.10.0/24 is directly connected, Tunnel10
D       172.32.2.8/30 [90/310044416] via 172.32.10.4, 00:00:39, Tunnel10
D       172.32.11.0/24 [90/28160000] via 172.32.10.2, 00:00:35, Tunnel10
                       [90/28160000] via 172.32.10.5, 00:00:35, Tunnel10
                       [90/28160000] via 172.32.10.4, 00:00:35, Tunnel10
                       [90/28160000] via 172.32.10.3, 00:00:35, Tunnel10
C       172.32.1.4/30 is directly connected, Tunnel1
D       172.32.2.4/30 [90/310044416] via 172.32.10.3, 00:00:35, Tunnel10
C       172.32.1.0/30 is directly connected, Tunnel0
D       172.32.2.0/30 [90/310044416] via 172.32.10.2, 00:00:40, Tunnel10
S*   0.0.0.0/0 [1/0] via 172.18.1.30
MTU Considerations
The use of GRE tunnels to create overlay logical networks can cause MTU issues because encapsulation increases the size of the IP packets. The goal is to avoid IP fragmentation, and the issues related to it, whenever possible. For more information, see the following URL: http://www.cisco.com/en/US/tech/tk827/tk369/technologies_white_paper09186a00800d6979.shtml.
Fragmentation at the endpoints of a TCP connection is avoided by the negotiation of the TCP maximum segment size (MSS), which the endpoint stations perform themselves. However, TCP MSS negotiation cannot prevent fragmentation in the path between the endpoints. Such fragmentation can occur because a link in the path has a smaller MTU or, as in this case, because tunneling adds headers that make the IP packets larger than the originals.
To deal with this problem, two approaches are described in this guide. The first is based on the use of Path MTU Discovery (PMTUD), which allows a host to dynamically determine the lowest MTU along the path from a packet source to its destination. Hosts usually perform PMTUD by default by setting the Do Not Fragment (DF) bit in all the TCP/IP packets they originate. With the DF bit set, if a router along the path tries to forward an IP datagram to a link that has a lower MTU than the size of the packet, the router drops the packet and returns an Internet Control Message Protocol (ICMP) Destination Unreachable message to the source of this IP datagram, with the code indicating fragmentation needed and DF set (type 3, code 4). When the source station receives the ICMP message, it lowers the send maximum segment size (MSS), and when TCP retransmits the segment, it uses the smaller segment size. This process continues until the correct MSS to allow end-to-end communication is determined.
Because PMTUD is limited to TCP flows and assumes that the endpoints are always able to receive the ICMP messages and act upon them, the second approach is recommended when deploying hub-and-spoke overlay networks in a campus environment: modify the MTU value on the interfaces of the network devices so that they can handle IP packets larger than 1500 bytes.
The configuration requires two steps:
1.
Enable jumbo frame support on the physical interfaces for all the network devices deployed in campus, as shown in Figure 20.
Figure 20 Enable Jumbo Frame Support on Physical Interfaces
2.
Set the IP MTU for GRE interfaces (hubs and spokes) to 1500 (default value is 1476), as shown in Figure 21.
Figure 21 Increase MTU on GRE Interfaces
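The relationship between the two steps is simple arithmetic. The sketch below assumes basic GRE encapsulation over IPv4 with no optional GRE fields and no IPsec, which gives 24 bytes of overhead and matches the 1476-byte default quoted above; the function names are illustrative.

```python
GRE_OVERHEAD = 24  # bytes: 20-byte outer IPv4 header + 4-byte basic GRE header


def tunnel_ip_mtu(physical_ip_mtu: int) -> int:
    """Largest passenger packet that fits without fragmenting the outer packet."""
    return physical_ip_mtu - GRE_OVERHEAD


def required_physical_mtu(tunnel_mtu: int) -> int:
    """Smallest physical IP MTU that carries a full-size passenger packet."""
    return tunnel_mtu + GRE_OVERHEAD


print(tunnel_ip_mtu(1500))          # 1476, the default tunnel IP MTU
print(required_physical_mtu(1500))  # 1524, well within a 9216-byte jumbo MTU
```

Raising the tunnel IP MTU to 1500 is therefore safe only when every physical link along the tunnel path can carry at least 1524-byte IP packets, which is why jumbo frame support is enabled first.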
The corresponding required configuration steps are the following:
•
Enable Jumbo frames support on physical interfaces.
interface TenGigabitEthernet1/1
 mtu 9216
•
Increase the MTU size on the logical GRE interfaces.
interface Tunnel11
 description GRE tunnel for Guest to hub-2
 ip mtu 1500
Loopback Interfaces Deployment Considerations
Important design considerations arise when deploying the loopback interfaces that function as source and destination of the GRE tunnels. The following considerations are valid for both point-to-point GRE and mGRE tunnel scenarios:
•
The loopback interfaces usually belong to the global routing table. Traffic belonging to different VRFs is logically isolated by mapping the logical tunnel interfaces to the corresponding VRFs.
•
It must be determined from which range to take the IP addresses assigned to the loopback interfaces. The assumption here is that a proper subnet planning is in place, so that a summarized route can be used in the core from each campus building block, as shown in Figure 22. Note that the IP addresses used in this example simplify the description and are not intended to represent a best practice summarization schema.
Note
The same considerations made in this section are generically valid every time loopback interfaces are configured on the distribution layer switches, not only when they are configured as tunnel source and destination.
Figure 22 IP Addressing Assignment in a Campus Network
Two options of assigning IP addresses to the loopback interfaces are as follows:
•
Assigning IP addresses from the same pool that is summarized toward the core of the network. This is the case in the example shown in Figure 23.
Figure 23 Assigning a Loopback Address From a Campus Building Pool
In this specific scenario, sending a network summary to the core can cause the creation of a black hole if the link between the two distribution switches fails. Because the core devices receive only the summary information, it is not possible to predict the return path for the traffic originated elsewhere in the network and destined to any IP address that is part of that summary. In the example in Figure 23, it can happen that GRE traffic directed to the Loopback 0 on the right distribution switch is actually routed from the core to the left distribution device. At this point, it is essential to have connectivity between the two distribution switches to avoid the creation of a black hole.
When following this approach, Cisco recommends that you increase the reliability of the connections between the distribution layer peers by connecting these devices with redundant physical links (at least two) belonging to different line cards (to avoid the single point of failure represented by the switch line card itself). This can increase the cost of the solution, especially in the scenario where 10 G links are in place between the distribution switches, but it also provides the additional bandwidth required when this connection becomes a transit link.
Note
Depending on the specific design, the two links might be bundled in a port channel (this is recommended when the connection is a Layer 2 trunk), or kept separate (if they are Layer 3 routed links).
•
Assigning IP addresses from a pool that is not included in the summary sent toward the core of the network.
This is the recommended solution. It does not present the caveat discussed above, because each distribution switch advertises to the core the specific IP addresses used for the loopbacks. The drawback is that the specific routing information for the loopbacks defined on each campus distribution block must be contained in the routing tables of all the other campus network devices.
Additional considerations around loopbacks deployment are required when OSPF is the chosen routing protocol to provide global table connectivity in the campus network. In this case, the distribution layer switches are usually deployed as Area Border Routers (ABRs); the interfaces toward the core are placed in area 0, whereas the transit link between the distribution peers is normally configured as part of the OSPF area deployed in that specific campus distribution block (this is done to have a redundant path between the ABRs inside the specific area and avoid sub-optimal intra-area routing in specific link failure scenarios).
For loopback interfaces, the recommendation is not to configure them as part of area 0. Doing so may lead to black-holing traffic whenever the ABR loses connectivity to the core devices but assumes it still has connectivity in area 0 because of the loopback. This may be the case when the ABR device is booting up (the loopback is always up before any linecard has completed the boot-up process), or when the linecard whose interfaces connect to the core fails, isolating the ABR from area 0 (typically, interfaces on the same linecard are used to connect to the core).
In both scenarios, when an IP packet arrives at the distribution switch from the access layer and needs to be routed toward a remote subnet (because that switch is the active HSRP device, or because of the default route advertisement in a routed access deployment), the switch does not forward traffic over the transit intra-area link, because it believes it has local connectivity to the backbone (through the loopback in area 0). Because the switch cannot forward the packet directly over an area 0 link, it drops the packet, causing a temporary outage.
In summary, the following two design principles should be followed when deploying loopback interfaces at the distribution layer of the campus network:
•
Assign IP addresses from a pool that is not included in the summary sent toward the core of the network.
•
When deploying OSPF for global table connectivity, do not configure the loopback interfaces as part of Area 0.
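The first principle can be checked mechanically during address planning. The following sketch uses the Python standard library; the summary prefix and the first loopback address are hypothetical values chosen for illustration (the guide does not list the actual summary ranges), and the function name is likewise illustrative.

```python
import ipaddress


def covered_by_summary(loopback: str, summary: str) -> bool:
    """True if the loopback address falls inside the summarized range."""
    return ipaddress.ip_address(loopback) in ipaddress.ip_network(summary)


BUILDING_SUMMARY = "10.120.0.0/16"  # hypothetical summary sent toward the core

# A loopback taken from the building pool is hidden behind the summary, so the
# core may deliver tunnel traffic to the wrong distribution switch.
print(covered_by_summary("10.120.255.1", BUILDING_SUMMARY))   # True

# A loopback from a separate pool is advertised as a specific route.
print(covered_by_summary("10.122.210.10", BUILDING_SUMMARY))  # False
```

Any loopback for which the check returns True depends on the link between the distribution peers to avoid a black hole.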
High Availability Considerations
The recommended design to provide resiliency in the hub-and-spoke scenario consists of implementing redundant hub devices and creating two separate hub-and-spoke networks, connecting the spokes to each hub, as shown in Figure 24.
Figure 24 Redundant Hub-and-Spoke Overlay Networks
Each spoke device builds a separate GRE tunnel to each hub of the redundant pair. Traffic is load balanced between the two tunnels, because each spoke learns a default route with the same metric from each hub (as described in the previous section).
Note that the overall resiliency of the overlay solution is based on the resiliency of the network infrastructure. This can be achieved by following the recommended design guidelines in these documents:
•
http://www.cisco.com/application/pdf/en/us/guest/netsol/ns432/c649/cdccont_0900aecd801a8a2d.pdf
•
http://www.cisco.com/application/pdf/en/us/guest/netsol/ns432/c649/cdccont_0900aecd801a89fc.pdf
QoS in Hub-and-Spoke Deployments
Congestion inside a campus network is a rare event during normal operating conditions because of the large amount of available bandwidth. However, during an abnormal event, such as denial-of-service (DoS) or worm attacks, campus congestion can typically occur within minutes (even in 10 GE networks), as part of the collateral damage of such an attack. Therefore, classification and metering of traffic at the edge of the network is a valuable worm mitigation strategy. This strategy is even more relevant in scenarios where the enterprise does not have any control over the connected guest machines and therefore cannot enforce any security policies; for example, when providing guest access. In addition, at enterprise branch locations, classification and metering of traffic becomes a priority, to achieve proper use of the bandwidth resources available across the WAN cloud.
The scenario described in this section relates to providing QoS for applications requiring hub-and-spoke connectivity; this is very relevant for GRE and VRF deployments and for the specific business problems they aim to solve (for example, guest access or for NAC remediation designs).
Two approaches are described in this section for classifying and handling traffic in hub-and-spoke deployments. The assumption is that there is the need to somehow limit the traffic originated from the edge of the network; for example, this is valid in guest access deployments.
The first approach strictly rate-limits the traffic originated from the subnets at the edge of the network, so that traffic exceeding a predetermined threshold is dropped and not allowed into the core of the network. The exact value to be used for the thresholds can vary from design to design. The goal of this section is to provide all the required tools to rate-limit traffic for both wired and wireless deployments. The main advantage of this approach is its simplicity, and also that its functionality is independent from the deployment of end-to-end QoS across the network.
The second approach is more dynamic and classifies the traffic at the edge and then prioritizes it inside the network. Again, the recommended strategy for classifying and marking the traffic is based on the definition of a specific threshold. Traffic within the threshold is treated as good faith, best effort traffic. Traffic exceeding the given allowance is marked as scavenger traffic and is aggressively dropped in the event of congestion. For these configuration examples, a threshold of 1 Mbps is used. However, note that this is just a sample value used for the configuration samples in this guide (the value of this threshold likely varies from enterprise to enterprise). Specific to these examples, traffic up to 1 Mbps is marked as best effort traffic (DSCP 0), whereas traffic that exceeds the threshold is marked as scavenger traffic (CS1 or DSCP 8).
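The difference between the two approaches can be illustrated with a simplified single-rate token-bucket policer. This is a conceptual sketch only: the class and its parameters are hypothetical, and the policers implemented in switch hardware differ in detail. The 1-Mbps rate is the sample threshold used in this guide; the 8000-byte burst is the value used in the policer configuration samples.

```python
class Policer:
    """Simplified single-rate policer (conceptual model, not switch behavior)."""

    def __init__(self, rate_bps: int, burst_bytes: int, exceed_action: str):
        self.rate_bps = rate_bps
        self.burst_bytes = burst_bytes
        self.exceed_action = exceed_action  # "drop" (static) or "remark" (dynamic)
        self.tokens = float(burst_bytes)    # bucket starts full
        self.last_time = 0.0

    def handle(self, now: float, size_bytes: int):
        """Return the DSCP the packet is sent with, or None if it is dropped."""
        elapsed = now - self.last_time
        self.last_time = now
        # Refill at the policed rate, never beyond the burst size.
        self.tokens = min(self.burst_bytes, self.tokens + elapsed * self.rate_bps / 8)
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return 0  # in profile: best effort (DSCP 0)
        # Out of profile: drop, or mark down to scavenger (CS1, DSCP 8).
        return None if self.exceed_action == "drop" else 8


static = Policer(1_000_000, 8000, "drop")
dynamic = Policer(1_000_000, 8000, "remark")
# Six back-to-back 1500-byte packets: the burst covers only the first five.
print([static.handle(0.0, 1500) for _ in range(6)])   # [0, 0, 0, 0, 0, None]
print([dynamic.handle(0.0, 1500) for _ in range(6)])  # [0, 0, 0, 0, 0, 8]
```

In the dynamic case the excess packet still enters the network, and it is discarded only if a downstream device experiences congestion and applies its scavenger-class queuing policy.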
The scavenger class of traffic was introduced to offer a less than best-effort service. Access layer policers mark out-of-profile traffic to CS1/DSCP8 (scavenger), and then have all congestion management policies provision a less than best-effort queuing service for this type of traffic. Traffic marked as scavenger starts being aggressively dropped whenever congestion occurs on campus or WAN edge links. If no congestion is experienced, the available bandwidth is successfully used. An approach based on the use of scavenger-class QoS is much more flexible and dynamic than a strict rate-limiting of traffic at the edge of the network. Additionally, it provides a worm mitigation strategy in cases where clients connected to the enterprise network become infected by a virus and the virus starts attacking the network infrastructure. To be effective, it is assumed that all the devices in the network have been configured with the proper Differentiated Services Code Point (DSCP) trust boundaries and queuing and dropping strategies. The configuration details to achieve this prioritization are beyond the scope of this guide. For more information on how to accomplish this for both campus and WAN scenarios, see the Enterprise QoS Solution Reference Network Design Guide at the following URL:
http://www.cisco.com/en/US/docs/solutions/Enterprise/WAN_and_MAN/QoS_SRND/QoS-SRND-Book.html
The following sections describe how to configure the devices at the edge of the network, indicating the configuration steps that are required on various platforms for both of these approaches.
Wired Clients
Traffic originated from wired clients is received on the access layer switches deployed in each campus building block or at each branch location. Classification and marking should be applied to these devices, as shown in Figure 25.
Figure 25 Classifying and Marking Traffic for Wired Clients
The most granular policing can be achieved by using per-port/per-VLAN policers that are supported on the Catalyst 2970, 3560, 3750, and 4500. Using per-port/per-VLAN policing has the following advantages:
•
It defines a generic policy that is portable and that can be seamlessly applied across various access layer devices.
•
It applies the policy to a physical port, and the policy is effective only when that port is deployed in the specified VLAN. This is important in a design where the same switch port is dynamically assigned to a different VLAN, based on the identity of the connected user.
For Catalyst 6500 switches, a different approach is used, given the lack of per-port/per-VLAN policing. See Catalyst 6500 for more information.
Note
When deploying the static approach for wired clients, the recommended design consists of creating a two-tiered policy. At the access layer, traffic is rate-limited per port (per user) up to a certain threshold. At the distribution layer, an ingress policer is configured on the trunk ports connecting to the access layer devices, so that the aggregate traffic can be rate limited. The required configuration commands to create this two-tiered policy depend on the specific platforms deployed at the access and distribution layers.
Catalyst 2970, 3560, and 3750
Per-port/per-VLAN policing requires Cisco IOS Release 12.2(25)SE or later. To enable this functionality, you must use hierarchical policy maps. The required configuration steps follow. Most of the commands are common for both the static and dynamic approaches. The only difference is in the creation of the interface-level policy map.
Step 1
Enable QoS globally:
3560-Access(config)#mls qos
Step 2
Enable VLAN-based QoS on the switch port.
By default, VLAN-based QoS is disabled on all physical switch ports. The switch applies QoS, including class maps and policy maps, only on a physical port basis. In Cisco IOS Release 12.2(25)SE or later, you can enable VLAN-based QoS on a switch port. This procedure is required on physical ports that are specified in the interface level of a hierarchical policy map on an SVI (defined in the next step).
3560-Access(config)#int f0/17
3560-Access(config-if)#mls qos vlan-based
Step 3
Configure hierarchical policing.
Hierarchical policing combines VLAN-level and interface-level policy maps to create a single policy map. On an SVI, the VLAN-level policy map specifies on which traffic class to act. Actions can include trusting the class of service (CoS), DSCP, or IP precedence values, or setting a specific DSCP or IP precedence value in the traffic class. The following steps are required for marking the traffic originated in a generic edge VLAN, according to the strategy previously described.
a.
Create a VLAN-level class map. Note that the ACL is generically defined to match all the IP traffic. This is the key for the ACL portability previously mentioned.
3560-Access(config)#access-list 101 permit ip any any
3560-Access(config)#class-map match-all EDGE-VLAN
3560-Access(config-cmap)#match access-group 101
b.
Create an interface-level class map to specify the physical switch ports that are affected by the policer.
3560-Access(config)#class-map match-all EDGE-INTF
3560-Access(config-cmap)#match input-interface f0/1 - f0/48
Note
You can specify all the switch ports in the match input-interface command. The policer works on a given switch port only if it is part of the specified VLAN.
c.
Create an interface-level policy map to define the action to take on traffic received on each port.
–
Static approach
Traffic exceeding a specified threshold is dropped. Traffic below the threshold is marked as best effort and is transmitted.
3560-Access(config)#policy-map EDGE-INTF-POLICY
3560-Access(config-pmap)#class EDGE-INTF
3560-Access(config-pmap-c)#set dscp default
3560-Access(config-pmap-c)#police 1000000 8000 exceed-action drop
–
Dynamic approach
In this case, all the traffic that exceeds a specified threshold (1 Mbps in the example) is marked as scavenger traffic, but is not dropped.
3560-Access(config)#mls qos map policed-dscp 0 to 8
3560-Access(config)#policy-map EDGE-INTF-POLICY
3560-Access(config-pmap)#class EDGE-INTF
3560-Access(config-pmap-c)#police 1000000 8000 exceed-action policed-dscp-transmit
d.
Create the VLAN-level policy map:
3560-Access(config-pmap)#policy-map EDGE-VLAN-POLICY
3560-Access(config-pmap)#class EDGE-VLAN
3560-Access(config-pmap-c)#set dscp default
3560-Access(config-pmap-c)#service-policy EDGE-INTF-POLICY
e.
Apply the previously defined policy map to the SVI. This is the key step to ensure that the policer is effective on switch ports belonging to this VLAN (and only on these).
3560-Access(config)#interface vlan 21
3560-Access(config-if)#service-policy input EDGE-VLAN-POLICY
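The police 1000000 8000 statements in this section configure a single-rate policer with a 1 Mbps rate and an 8000-byte burst. The following Python sketch models that behavior as a simple token bucket, for illustration only. The Policer class and its method names are hypothetical, and the refill logic is a simplification; the hardware policers on the Catalyst platforms differ in their implementation details.

```python
# Simplified single-rate token-bucket policer; illustrative only.
# Class and method names are hypothetical, not a Cisco API.

class Policer:
    def __init__(self, rate_bps, burst_bytes, exceed_action="drop", policed_dscp=None):
        self.rate_bytes_per_s = rate_bps / 8      # committed rate in bytes/s
        self.burst = burst_bytes                  # bucket depth in bytes
        self.tokens = float(burst_bytes)          # bucket starts full
        self.last = 0.0                           # time of last refill (s)
        self.exceed_action = exceed_action        # "drop" or "policed-dscp-transmit"
        self.policed_dscp = policed_dscp or {}    # markdown map, e.g. {0: 8}

    def offer(self, now, size_bytes, dscp=0):
        """Return the DSCP the packet is sent with, or None if it is dropped."""
        # Refill tokens for the elapsed time, capped at the burst size.
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_bytes_per_s)
        self.last = now
        if size_bytes <= self.tokens:             # in-profile traffic
            self.tokens -= size_bytes
            return dscp
        if self.exceed_action == "drop":          # static approach
            return None
        return self.policed_dscp.get(dscp, dscp)  # dynamic approach: mark down


static = Policer(1_000_000, 8000, "drop")
dynamic = Policer(1_000_000, 8000, "policed-dscp-transmit", {0: 8})

# Six 1500-byte packets arrive back to back at t=0; only five fit in the burst.
static_result = [static.offer(0.0, 1500) for _ in range(6)]
dynamic_result = [dynamic.offer(0.0, 1500) for _ in range(6)]
print(static_result)   # [0, 0, 0, 0, 0, None]
print(dynamic_result)  # [0, 0, 0, 0, 0, 8]
```

In this model the two approaches differ only in what happens to the sixth packet: the static policer discards it, whereas the dynamic policer forwards it re-marked from DSCP 0 to DSCP 8 (scavenger).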
Catalyst 4500
The configuration of per-port/per-VLAN policing on Catalyst 4500 platforms is more straightforward than for the Catalyst 2970, 3560, and 3750, because it does not require the definition of hierarchical policy maps. To support this functionality, Cisco IOS Release 12.2(25)EWA or later is required (for Sup2+ to Sup V). The required configuration steps follow. Most of the commands are common for both the static and dynamic approaches. The only difference is in the creation of the policy map.
Step 1
Create a class map to identify the traffic:
4500-Access(config)#access-list 101 permit ip any any
4500-Access(config)#class-map match-all EDGE-VLAN
4500-Access(config-cmap)#match access-group 101
Step 2
Define the policy map to mark the traffic.
•
Static approach
Traffic exceeding a specified threshold should be dropped, whereas traffic below the threshold is marked as best effort and is transmitted.
4500-Access(config)#policy-map EDGE-VLAN-POLICY
4500-Access(config-pmap)#class EDGE-VLAN
4500-Access(config-pmap-c)#set ip dscp 0
4500-Access(config-pmap-c)#police 1000000 8000 exceed-action drop
•
Dynamic approach
In this case, all traffic exceeding a specified threshold (1 Mbps in the example) is marked as scavenger traffic but is not dropped.
4500-Access(config)#qos map dscp policed 0 to dscp 8
4500-Access(config)#policy-map EDGE-VLAN-POLICY
4500-Access(config-pmap)#class EDGE-VLAN
4500-Access(config-pmap-c)#set ip dscp 0
4500-Access(config-pmap-c)#police 1000000 8000 exceed-action policed-dscp-transmit
Step 3
Apply the policy map.
Note that this is done on a per-VLAN basis on each physical interface. This means that the policy is in effect only when the port is configured as part of that VLAN (or if it is a trunk carrying that VLAN).
cr24-4503-1(config)#int g2/1
cr24-4503-1(config-if)#vlan-range 11
cr24-4503-1(config-if-vlan-range)#service-policy input EDGE-VLAN-POLICY
Catalyst 6500
The Catalyst 6500 is the most powerful and flexible Cisco switching platform. As such, it can be found in all three layers of a campus network (access, distribution, and core). When configured as an access layer switch, traditionally the software running on the Supervisor is CatOS. When configured as a distribution or core layer switch, the recommended software is Cisco IOS. This distinction has changed since the introduction of the Sup32, which can run both CatOS and IOS code and is usually positioned as an access layer device. See Catalyst 6500 with Cisco IOS for the Cisco IOS configuration.
Note
In this section, only Catalyst 6500 Supervisors equipped with Policy Feature Card 2 (PFC2) or PFC3 are taken into consideration. This categorization includes Sup2 (PFC2) and Sup32/Sup720 (PFC3), but not older Supervisor models (Sup1/Sup1a).
Catalyst 6500 with CatOS
Per-VLAN policers are supported in CatOS. However, this type of policer should not be confused with the per-port/per-VLAN policers described in the previous sections for the other Catalyst platforms.
A per-VLAN policer can police all flows within a given VLAN, as an aggregate sum of the traffic of all ports belonging to a given VLAN. A per-port/per-VLAN policer can discretely police flows from a given VLAN on a per-port basis, which is much more granular than other policing methods. Because the purpose of the design described here is to classify and mark the traffic received on each switch port, the aggregate per-VLAN policer is not used in this example; a port-based QoS is configured instead.
The required configuration steps follow. Most of the commands are common for both the static and dynamic approaches. The only difference is in the definition of the aggregate policer.
Step 1
Define the aggregate policer to be used for the edge traffic.
When configuring per-port policers in CatOS, keep in mind that, by default, an ACL or aggregate policer applied to more than one port is shared by those ports rather than enforced separately on each. For example, if an aggregate policer called POLICE-EDGE is defined to rate-limit flows to 1 Mbps, and this policer is applied to two separate ports in CatOS, it rate-limits flows from both ports to a combined total of 1 Mbps, instead of the intended behavior of limiting flows to 1 Mbps on a per-port basis (as is the case if configured in Cisco IOS). To work around this default behavior, ACLs and aggregate policers have to be uniquely defined on a per-port basis.
•
Static approach
Traffic exceeding a specified threshold is dropped, whereas traffic below the threshold is marked as best effort and is transmitted.
6500-access> (enable) set qos policer aggregate EDGE-PORT-2-1 rate 1000 burst 8000 drop
•
Dynamic approach
In this case, all the traffic exceeding a specified threshold (1 Mbps in the example) is marked as scavenger traffic but is not dropped.
6500-access> (enable) set qos policed-dscp-map 0:8
6500-access> (enable) set qos policer aggregate EDGE-PORT-2-1 rate 1000 burst 8000 policed-dscp
Bind an ACL to the policer to mark in-profile traffic as Best Effort (DSCP 0).
6500-access> (enable) set qos acl ip EDGE-ACL-2-1 dscp 0 aggregate EDGE-PORT-2-1 ip 10.124.10.0 0.0.0.255 any
Note
Because the policy is applied to the physical switch ports, you need to take into consideration the fact that the same port can be used by different categories of users. For this reason, you need to define a more specific ACL to select the IP subnets from where the traffic originates. As a result, you lose the advantage of having a generic template seamlessly valid on different edge devices (which is possible when using the per-port/per-VLAN functionality, as previously described).
Step 2
Commit the ACL to PFC hardware:
6500-access> (enable) commit qos acl EDGE-ACL-2-1
Step 3
Attach the ACL to the corresponding switch port:
6500-access> (enable) set qos acl map EDGE-ACL-2-1 2/1
Catalyst 6500 with Cisco IOS
Hardware advancements in the PFC3 provide a number of new features, such as User-Based Rate Limiting (UBRL). UBRL is a form of microflow policing that provides rate-limited traffic flows and, unlike a normal microflow policer, it allows a policer to be applied to all traffic to or from a specific user.
In this section, UBRL is used to classify and mark the edge traffic. Each flow is examined by its source IP address and if a source is transmitting out-of-profile, the excess traffic can be dropped or marked as scavenger traffic (CS1 or DSCP 8), depending on the adopted approach.
The flow mask defines what constitutes a flow: it identifies the fields in the packet header that are used to perform a lookup in the NetFlow table. In this case, use the source-only flow mask. The PFC maintains one entry for each source IP address, so that all flows from the given source IP address use this entry.
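To illustrate how the choice of flow mask changes what the policer counts, the following Python sketch builds the lookup key from different header fields. It is a conceptual model only: the packet dictionaries, the helper functions, and the addresses are hypothetical, and the real NetFlow table on the PFC is a hardware structure.

```python
# Illustrative model of how a flow mask selects the lookup key.
# Helper names and sample packets are hypothetical.
from collections import defaultdict

def flow_key(pkt, mask):
    """Build the table key from the header fields selected by the mask."""
    if mask == "src-only":
        return (pkt["src"],)
    if mask == "dest-only":
        return (pkt["dst"],)
    if mask == "full-flow":
        return (pkt["src"], pkt["dst"], pkt["proto"], pkt["sport"], pkt["dport"])
    raise ValueError(f"unknown flow mask: {mask}")

def bytes_per_entry(packets, mask):
    """Accumulate byte counts per table entry, as a per-entry policer would see them."""
    table = defaultdict(int)
    for pkt in packets:
        table[flow_key(pkt, mask)] += pkt["size"]
    return dict(table)

packets = [
    {"src": "172.16.11.10", "dst": "10.1.1.1", "proto": 6, "sport": 1025, "dport": 80, "size": 1500},
    {"src": "172.16.11.10", "dst": "10.2.2.2", "proto": 6, "sport": 1026, "dport": 443, "size": 1500},
    {"src": "172.16.11.20", "dst": "10.1.1.1", "proto": 17, "sport": 5000, "dport": 53, "size": 200},
]

per_user = bytes_per_entry(packets, "src-only")
per_flow = bytes_per_entry(packets, "full-flow")
print(per_user)       # two entries, one per source address
print(len(per_flow))  # three entries, one per individual flow
```

With the source-only mask, both sessions from 172.16.11.10 fall into one entry and are metered together, which is what makes the policer behave as a per-user rate limiter.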
The configuration steps follow. Most of the commands are common for both the static and dynamic approaches. The only difference is in the definition of the policy map.
Step 1
Define the class-map to identify the edge traffic:
6500-access(config)#access-list 101 permit ip 172.16.11.0 0.0.0.255 any
6500-access(config)#class-map match-all EDGE
6500-access(config-cmap)#match access-group 101
Step 2
Define the policy map. It is important to specify mask src-only in the police flow command to police all the traffic sent by each specific user. To do that, configure a null flow mask for NDE (NetFlow) using the no mls flow ip command (this is the default value for Sup720/Sup32).
•
Static approach
Traffic exceeding a specified threshold is dropped, whereas traffic below the threshold is marked as best effort and is transmitted.
6500-access(config-cmap)#policy-map EDGE-POLICING
6500-access(config-pmap-c)#class EDGE
6500-access(config-pmap-c)#set dscp default
6500-access(config-pmap-c)#police flow mask src-only 1000000 8000 conform-action transmit exceed-action drop
•
Dynamic approach
In this case, all the traffic exceeding a specified threshold (1 Mbps in the example) is marked as scavenger traffic but is not dropped.
6500-access(config)#mls qos map policed-dscp normal 0 to 8
6500-access(config-cmap)#policy-map EDGE-POLICING
6500-access(config-pmap-c)#class EDGE
6500-access(config-pmap-c)#set dscp default
6500-access(config-pmap-c)#police flow mask src-only 1000000 8000 conform-action transmit exceed-action policed-dscp-transmit
Step 3
Attach the policy map to the physical interfaces:
6500-access(config)#interface GigabitEthernet1/1
6500-access(config-if-range)#service-policy input EDGE-POLICING
Note
In cases where the policy map is attached to a VLAN interface instead of to a physical port, you must also use the mls qos vlan-based command on the switch port (belonging to that specific VLAN) where the traffic is received, as shown in the following example.
6500-access(config)#interface GigabitEthernet1/14
6500-access(config-if)#sw acc vlan 100
6500-access(config-if)#mls qos vlan-based
6500-access(config-if)#interface vlan 100
6500-access(config-if)#service-policy input EDGE-POLICING
Whenever an inbound policy map is applied to a physical or logical interface of a Catalyst 6500 with PFC3, the DSCP is set on the ASIC of the egress line card before sending out the packet. This has an important consequence when the traffic needs to be sent on a tunnel interface (see Figure 26).
Figure 26 Applying a Policy Map Before Tunneling Traffic
Because of this hardware functionality, the DSCP field is set correctly in the outer IP header but not in the original IP header. This needs to be taken into consideration when the traffic is decapsulated on the switch terminating the GRE tunnel because, at that point, the marking information is no longer available.
Note
This problem does not exist when traffic is not encapsulated because, in that case, only one IP header is present.
Wireless Clients
Marking strategies for traffic originating from wireless clients vary with the specific wireless deployment and with the network location (campus or branch). The same marking strategies described in the previous sections can also be applied for wireless deployments. The main difference is that now marking cannot be done on a user basis (as is done in the wired case using the per-port/per-VLAN functionality), but is done more on an aggregate basis, as described in the following sections.
As previously described for a wired scenario, a static and a dynamic QoS approach also applies for wireless deployments.
Traditional Aironet
When deploying standalone access points at the edge of the network, the traffic originating from wireless clients is locally bridged to a VLAN defined on the access layer and distribution layer network devices. This situation is identical to the wired case previously described, so the classification and marking strategies described in the previous sections can be implemented on the access layer port where the access points are connected. This is valid for both campus and branch deployments, as shown in Figure 27.
Figure 27 Classifying and Marking Traffic in a Traditional Wireless Deployment
WLSM
In a wireless deployment using WLSM, the traffic is GRE-encapsulated on the access points distributed at the edge of the network and is then conveyed to a central location where the WLSM is located (in this example, this is in the enterprise data center). As a result, there are two kinds of traffic to consider: GRE traffic originated on the edge access points and directed to the Catalyst 6500 equipped with WLSM, and decapsulated traffic entering the wired portion of the network at the same Catalyst 6500 switch.
Static Approach
As previously mentioned, when deploying the static approach, the idea is to strictly rate-limit the traffic at the edge of the network. Traffic exceeding the predefined threshold is dropped and is not allowed further into the network. As a result, even for WLSM deployments, Cisco recommends performing ingress policing on the access layer switches, as shown in Figure 28.
Figure 28 Rate-Limiting Traffic on the Access Layer Device
Note that the per-port/per-VLAN functionality does not help much in this case because all the GRE traffic is sent out on the same VLAN (access point management VLAN) regardless of the SSID (user group) to which the clients belong. To statically rate-limit the traffic for a specific user group, you must configure an ACL matching the destination address of the GRE tunnel that originated on the AP and associate it with the corresponding SSID. This still allows for the creation of a generic ACL that can be applied across different access layer devices.
Following is a sample configuration that is valid for a Catalyst 3560, and easily extendable to other Catalyst platforms:
Step 1
Define the class map to identify the edge traffic:
3560-access(config)#access-list 110 permit gre any host 10.121.253.254
3560-access(config)#class-map match-all EDGE-GRE
3560-access(config-cmap)#match access-group 110
Step 2
Define the policer:
3560-access(config)#policy-map EDGE-GRE-POLICY
3560-access(config-pmap)#class EDGE-GRE
3560-access(config-pmap-c)#police 1000000 8000 exceed-action drop
Apply the policer on the switch interfaces:
3560-access(config)#interface FastEthernet0/34
3560-access(config-if)#service-policy input EDGE-GRE-POLICY
Dynamic Approach
Marking of the decapsulated traffic at the centralized location is the recommended choice when deploying a dynamic approach. Because this is done on a Catalyst 6500 equipped with Sup720, UBRL is the logical choice. The policer can be applied on the mGRE interface receiving all the edge traffic, to apply the marking before sending it into the core, as shown in Figure 29.
Figure 29 Policing Applied on the mGRE Interface at the Central Location
The required configuration steps are as follows.
Step 1
Define the class map to identify the edge traffic:
6500-DC(config)#access-list 101 permit ip any any
6500-DC(config)#class-map match-all EDGE
6500-DC(config-cmap)#match access-group 101
Step 2
Define the policer to be applied on the mGRE interface. Mark all traffic that exceeds the specified threshold (1 Mbps in the example) as scavenger traffic (not dropped).
6500-DC(config)#mls qos map policed-dscp normal 0 to 8
6500-DC(config)#policy-map EDGE-POLICING
6500-DC(config-pmap)#class EDGE
6500-DC(config-pmap-c)#police flow mask src-only 1000000 8000 conform-action set-dscp-transmit 0 exceed-action policed-dscp-transmit
Step 3
Apply the policer:
6500-DC(config)#interface Tunnel 10
6500-DC(config-if-range)#service-policy input EDGE-POLICING
When applying an inbound policy map on the mGRE logical interface, the same considerations proposed in Catalyst 6500 with Cisco IOS are still valid. If the traffic is eventually GRE-encapsulated before being sent out, only the outer IP header has the DSCP field marked correctly.
Because the GRE traffic originates on the distributed access points, it must be sent across the campus core to be aggregated on the Catalyst 6500 equipped with WLSM. Optionally, you can mark it on the access layer device where the access point is connected.
As mentioned in the section covering the static approach, you cannot use the per-port/per-VLAN functionality, so you must configure an ACL matching the destination address of the GRE tunnel originated on the access point and associate it to the user SSID. Following is a sample configuration that is valid for a Catalyst 3560, and easily extendable to other Catalyst platforms:
Step 4
Define the class map to identify the edge traffic:
3560-access(config)#access-list 110 permit gre any host 10.121.253.254
3560-access(config)#class-map match-all EDGE-GRE
3560-access(config-cmap)#match access-group 110
Step 5
Define the policer. Mark all traffic that exceeds the specified threshold (1 Mbps in the example) as scavenger traffic (not dropped):
3560-access(config)#mls qos map policed-dscp normal 0 to 8
3560-access(config)#policy-map EDGE-GRE-POLICY
3560-access(config-pmap)#class EDGE-GRE
3560-access(config-pmap-c)#set dscp default
3560-access(config-pmap-c)#police 1000000 8000 exceed-action policed-dscp-transmit
Step 6
Apply the policer on the switch interfaces:
3560-access(config)#interface FastEthernet0/34
3560-access(config-if)#service-policy input EDGE-GRE-POLICY
WLAN Controller
Deploying WLAN controllers in the campus network implies that all traffic is tunneled from the edge access points to the controllers that can be deployed, for example, in a centralized location such as the campus data center. This behavior is very similar to the WLSM-based scenario described previously. The main differences are that traffic is tunneled using Lightweight Access Point Protocol (LWAPP) (and not GRE), and that the configuration of all the access points is performed centrally from the controller.
Static Approach
Unlike WLSM deployments, in this case the same LWAPP tunnel carries data traffic for users belonging to different groups (usually associated using different SSIDs). As a result, it is not possible to classify the traffic for a specific user group on the access layer switch where the access point is connected. The only option is then to classify and rate-limit it when it is bridged on the corresponding VLAN at the WLAN controller location. The platform where this is accomplished can vary, but is most likely a Catalyst 6500 when deploying the WLAN controllers in a centralized location (such as a data center), or when using the WLSM.
Following is a sample configuration that is valid for a Catalyst 6500:
Step 1
Define the class map to identify the edge traffic:
6500-DC(config)#access-list 101 permit ip any 10.124.150.0 0.0.0.255
6500-DC(config)#class-map match-all EDGE-TRAFFIC
6500-DC(config-cmap)#match access-group 101
Step 2
Define the policer:
6500-DC(config)#policy-map EDGE-TRAFFIC-POLICING
6500-DC(config-pmap)#class EDGE-TRAFFIC
6500-DC(config-pmap-c)#police flow mask dest-only 1000000 8000 conform-action set-dscp-transmit 0 exceed-action drop
Step 3
Apply the policer on the switch VLAN interface:
6500-DC(config)#interface Vlan 150
6500-DC(config-if-range)#service-policy input EDGE-TRAFFIC-POLICING
Dynamic Approach
Once again, the dynamic approach consists of marking out-of-profile traffic as scavenger traffic. Following decapsulation, the traffic is bridged to a unique VLAN that is associated with the WLAN, so Cisco recommends that you mark the traffic on the switch to which the controller is connected, as shown in Figure 30.
Figure 30 Marking LWAPP-Decapsulated Traffic
Depending on the specific platform to which the controller is connected, you can perform the same type of marking strategy that is described in Figure 30.
Additionally, when a WLAN is created on the controller, it is possible to associate a QoS level to it (see Figure 31).
Figure 31 Selecting a QoS Level for a WLAN
Depending on the level selected, the access point marks the DSCP for upstream traffic. The DSCP is set in the external IP header (traffic is LWAPP-encapsulated), as shown in Table 3.
With the adoption of the Cisco QoS Baseline (starting in 2002), Cisco does not recommend using terms such as platinum, gold, silver, and bronze to describe QoS classes, because such terms do not accurately convey the service level requirements of the applications within the classes. Furthermore, such terms seem to convey an oversimplified and often inaccurate strict application hierarchy. The following is per the QoS baseline:
•
DSCP 46 is the default marking for a voice class.
•
DSCP 26 (also referred to as AF31, as defined in RFC 2597) is the default marking value for a locally defined mission-critical data class.
•
DSCP 0 is the default marking for the best effort class (per RFC 2474).
•
DSCP 10 (also referred to as AF11, as defined in RFC 2597) is the default marking value for a bulk data class.
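The decimal values quoted in this list follow from the code point layouts in RFC 2474 (class selectors) and RFC 2597 (assured forwarding). As a worked check, the following Python sketch derives them; the helper names are illustrative and not part of any Cisco tool.

```python
# DSCP arithmetic for the code points named in the text.
# Helper names are illustrative; the formulas follow RFC 2474 and RFC 2597.

def cs(cls):
    """Class selector CSx: the class occupies the top three DSCP bits."""
    if not 0 <= cls <= 7:
        raise ValueError("class selector must be 0..7")
    return cls << 3

def af(cls, drop):
    """Assured forwarding AFxy: class x (1..4), drop precedence y (1..3)."""
    if not (1 <= cls <= 4 and 1 <= drop <= 3):
        raise ValueError("AF class must be 1..4 and drop precedence 1..3")
    return (cls << 3) | (drop << 1)

EF = 46  # expedited forwarding, the default marking for the voice class

marks = {
    "voice (EF)": EF,
    "mission-critical (AF31)": af(3, 1),
    "best effort (CS0)": cs(0),
    "bulk (AF11)": af(1, 1),
    "scavenger (CS1)": cs(1),
}
for name, dscp in marks.items():
    # The top three DSCP bits correspond to the legacy IP precedence value.
    print(f"{name}: DSCP {dscp}, IP precedence {dscp >> 3}")
```

Note that bulk (DSCP 10) and scavenger (DSCP 8) share IP precedence 1, so a device that classifies on IP precedence alone cannot distinguish the two.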
As shown in Table 3, the bronze setting does not correspond to a scavenger value (CS1 or 8), but to bulk (10). As a result, there are the following two options:
•
Leave the default marking for LWAPP-encapsulated traffic and configure the queuing strategy on all the devices between the access point and the controller so that this type of traffic is handled in a similar manner as the scavenger class.
•
Mark the LWAPP-encapsulated traffic on the first access layer switch where the access point is connected, similar to what is suggested for GRE traffic in the WLSM scenario. If using this approach, the selection of the QoS level for the WLAN becomes meaningless because the traffic is marked anyway.
Note
As described in WLSM, the marking of traffic is optional. Cisco recommends marking the LWAPP-decapsulated traffic that is bridged by the WLAN controller on the corresponding VLAN. This should be always done, considering that any previous marking that applied to LWAPP traffic is lost when the traffic is decapsulated on the controller.
For branch deployments, there are the following two options when deploying WLAN controllers:
•
In the first option, a local WLAN controller is deployed at the branch location. In this case, the same considerations given for campus deployments can be followed.
•
In the second option, remote edge access points are deployed at the branch location to locally bridge the user traffic. In this case, the classification and marking of traffic can be accomplished in the same manner as the wired case.
Both these options are shown in Figure 32.
Figure 32 Classifying and Marking Traffic at the Branch for WLAN Controller Deployments
Challenges and Limitations Using VRF and GRE
As described in previous sections, the use of VRF and GRE to build VPNs inside the campus network provides many advantages when compared with the distributed ACLs approach. These advantages include support for overlapping address spaces between VPNs, the path differentiation capabilities offered by the use of a separate routing table per VPN, and the perception of a more secure solution.
However, the VRF and GRE solution should be implemented only in applications for which it is well-suited, because of the following limitations:
•
Operational complexity—As previously mentioned, building a VPN using VRF and GRE is well-suited for applications requiring hub-and-spoke connectivity. In scenarios where any-to-any connectivity must be achieved, the task of configuring GRE tunnels to connect all the various sites of the network can quickly become unmanageable. The use of mGRE helps simplify the configuration, but this benefit is reduced by the limited level of support on platforms normally deployed in campus networks.
•
Limited scalability and performance—As discussed in Connectivity Requirements, GRE is supported in hardware only on Catalyst 6500 switches equipped with Supervisor 32 or 720. As a result, the scalability and performance that can be achieved with this solution are tightly linked to the specific devices deployed in the network. Also, for designs that deploy platforms supporting GRE only in software (such as Catalyst 4500 switches), additional precautions must be taken to protect the CPU of these devices from being overloaded. The recommended way to achieve this is by rate limiting the traffic.
Path Isolation Deploying MPLS VPN
Multiprotocol Label Switching (MPLS) has traditionally been viewed as a service provider (SP) routing technology: SPs have commonly used MPLS VPN to create tunnels across their backbone networks for multiple customers. In that way, individual customer traffic is carried on a common service provider network infrastructure. Using the same principle, MPLS VPN can be deployed inside the enterprise network to logically isolate traffic between users belonging to separate groups (for example, guests, contractors, and employees) and to provide a technical answer to the business problems discussed at the beginning of this guide.
The main advantage of MPLS VPN when compared to other path isolation technologies is the capability of dynamically providing any-to-any connectivity without facing the challenges of managing many point-to-point connections (as is the case, for example, when using GRE tunnels). MPLS VPN facilitates a full mesh of connectivity inside each provided segment (or logical partition) with a speed of provisioning and scalability found in no other protocol. In this way, MPLS VPN allows the consolidation of separate logical partitions into a common network infrastructure.
The following sections of this guide describe the steps required to enable MPLS VPN end-to-end across the enterprise network. The initial section presents a quick overview of the MPLS VPN technology; the assumption here is that the reader is already familiar with the technology, so the purpose of this specific section is simply to review how the technology works and which technical components are involved. After that, the focus shifts to deploying MPLS VPN in an enterprise campus environment: the goal here is to provide design guidance for applying MPLS VPN to the enterprise campus and to analyze the impact that this has on a campus network configured following the recommended and consolidated design. The design considerations are provided based on some initial assumptions that are discussed in Path Isolation Initial Design Considerations and that are reviewed in this section.
MPLS VPN Technology Overview
MPLS Rehearsal
As already mentioned in the previous section, MPLS was originally deployed for the service provider environment. This heritage becomes more evident when describing the various roles that the network devices perform in an MPLS-enabled network.
Figure 33 shows the three roles a device can play when deploying MPLS.
Figure 33 Device Roles in an MPLS Network
1. Customer edge (CE) router—This is traditionally the network device at the customer location that interfaces with the service provider. In Figure 33, CE1 and CE2 represent the routers at the customer remote locations that need to be interconnected via the MPLS service provider network.
2. Provider edge (PE) router—This is the device at the edge of the service provider network that interfaces with the customer devices. The PE devices are often also called label switching routers edge (LSR-Edge), because they sit at the edge of the MPLS-enabled network.
3. Provider (P) router—These are the devices building the core of the MPLS-enabled network. Their main function is to label-switch traffic based on the outermost MPLS tag imposed on each packet; for this reason, they are often referred to as label switching routers (LSRs).
From a control plane point of view, an MPLS-enabled network uses two separate protocols: an IGP running in the core of the network and providing connectivity between the various network devices, and a Label Distribution Protocol (LDP) providing a standard dynamic methodology for hop-by-hop label distribution in the MPLS network. LDP works by assigning labels to routes that have been chosen by the underlying IGP routing protocol. The resulting labeled paths, shown in Figure 34 and called label switched paths (LSPs), forward labeled traffic across an MPLS backbone to particular destinations.
Figure 34 MPLS Control Plane
From the point of view of data forwarding, traffic that needs to be sent between remote customer sites is label-switched along the LSP, as shown in Figure 35.
Figure 35 MPLS Data Plane
Each device along the LSP switches the traffic based on the incoming MPLS label; a new tag is imposed before the packet is sent to the next device. Notice that the behavior shown in Figure 35 may in reality be slightly different because of a functionality called Penultimate Hop Popping (PHP). By default, the egress PE device explicitly informs the neighbor P not to tag packets directed to it, so that the PE can switch the packet based only on IP information without having to do a double lookup (first one for the MPLS tag, second one for the IP information). Figure 36 shows the same network when using PHP.
Figure 36 Penultimate Hop Popping
The MPLS tag shown in Figure 36 is a 32-bit header that is structured as shown in Figure 37.
Figure 37 MPLS Label
The structure is as follows:
•
MPLS Label—20-bit field used for label switching the packet and is replaced at every hop in the MPLS network
•
EXP—3-bit field that is used to indicate the class of service (CoS) of the MPLS packet (similarly to the CoS field in Ethernet frames)
•
S—Bit used to indicate the bottom of the stack when more than one MPLS label is imposed on the packet (as seen later in the MPLS VPN scenario)
•
TTL—8-bit time-to-live value (providing the same loop-detection function as the IP field of the same name)
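The four fields listed above add up to the 32 bits of the header (20 + 3 + 1 + 8). The following Python sketch packs and unpacks one such entry, assuming the field order of the list above (label in the most significant bits, TTL in the least significant), which matches the label stack encoding in RFC 3032. The function names and sample values are illustrative.

```python
# Pack and unpack a single 32-bit MPLS label stack entry.
# Layout (most significant bit first): label (20) | EXP (3) | S (1) | TTL (8).
# Function names and sample values are illustrative.
import struct

def pack_entry(label, exp, s, ttl):
    """Encode one label stack entry as 4 bytes in network byte order."""
    if not (0 <= label < 2**20 and 0 <= exp < 8 and s in (0, 1) and 0 <= ttl < 256):
        raise ValueError("field out of range")
    word = (label << 12) | (exp << 9) | (s << 8) | ttl
    return struct.pack("!I", word)

def unpack_entry(data):
    """Decode 4 bytes into the four header fields."""
    (word,) = struct.unpack("!I", data)
    return {
        "label": word >> 12,
        "exp": (word >> 9) & 0x7,
        "s": (word >> 8) & 0x1,
        "ttl": word & 0xFF,
    }

# Label 16, EXP 0, bottom of stack, TTL 255.
entry = pack_entry(16, 0, 1, 255)
print(entry.hex())          # 000101ff
print(unpack_entry(entry))  # {'label': 16, 'exp': 0, 's': 1, 'ttl': 255}
```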
The MPLS label is placed after the Layer 2 headers for a packet. Notice that a packet can have multiple MPLS labels appended to it; this is referred to as the label stack. Each MPLS label has a specific meaning for the node that pushed the label onto the packet, and the node that pops that label from the stack. The LSR routers in the network forward packets based only on the outermost label. The lower labels are taken into account only when they become the outermost label after the previous outermost label has been popped. MPLS labels are pushed onto packets starting with the original frame, and additional labels are added on top of the outermost label. MPLS labels are popped starting with the outermost label, the last one pushed onto the label stack. (See Figure 38.)
Figure 38 MPLS Label Stack
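The push and pop ordering of the label stack can be modeled with an ordinary last-in, first-out list. The Python sketch below is a conceptual illustration; the LabelStack class and the label values are hypothetical.

```python
# Minimal model of an MPLS label stack; names and label values are hypothetical.

class LabelStack:
    def __init__(self):
        self._labels = []          # the last element is the outermost label

    def push(self, label):
        """Impose a new label on top of the current outermost label."""
        self._labels.append(label)

    def pop(self):
        """Remove and return the outermost label."""
        return self._labels.pop()

    def swap(self, new_label):
        """Replace the outermost label, as an LSR does at each hop."""
        self._labels[-1] = new_label

    @property
    def outermost(self):
        return self._labels[-1] if self._labels else None

    def entries(self):
        """List (label, S bit) pairs, outermost first; S=1 marks the bottom of the stack."""
        n = len(self._labels)
        return [(label, 1 if i == n - 1 else 0)
                for i, label in enumerate(reversed(self._labels))]


stack = LabelStack()
stack.push(30)                 # first label pushed: ends up at the bottom
stack.push(17)                 # second label pushed: becomes the outermost
initial = stack.entries()
print(initial)                 # [(17, 0), (30, 1)]

stack.swap(22)                 # a core LSR acts only on the outermost label
popped = stack.pop()           # the outermost label is removed first
print(popped, stack.outermost) # 22 30
```

Only the outermost label changes as the packet crosses the core; the lower label is untouched until it becomes the outermost one.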
Another important concept widely used when discussing MPLS is the forwarding equivalence class (FEC). An FEC is a set of packets that all meet some defined criteria, and are forwarded in the same way by a router. The packets can differ from each other in the information carried in the network layer header (source and destination addresses, and ToS) but are forwarded using the same rule. An example of an FEC is all unicast packets destined to a particular prefix. They can have different destination addresses but the destination addresses all fall under the same prefix. The forwarding entry that a router maintains for a packet contains the classification criteria (normally destination address) and the next hop address. Packets that fall into an FEC associated with a particular forwarding entry are forwarded to the next hop router specified by the entry. Note that an FEC in the world of IPv4 routing is nothing more than a prefix in the routing database; this essentially implies that a separate LSP is built for each individual routing database entry.
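As a small illustration of the prefix-based FEC just described, the following Python sketch maps destination addresses to the most specific matching prefix in a toy forwarding table. The prefixes, next-hop names, and the fec_for helper are hypothetical, and longest-prefix matching is assumed as the classification rule.

```python
# Toy classification of packets into prefix-based FECs.
# Prefixes, next-hop names, and the helper are hypothetical.
import ipaddress

fib = {
    ipaddress.ip_network("10.1.0.0/16"): "P1",
    ipaddress.ip_network("10.1.2.0/24"): "P2",
}

def fec_for(dst):
    """Return the most specific prefix covering dst, or None if no entry matches."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in fib if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda net: net.prefixlen)

for dst in ("10.1.2.5", "10.1.2.200", "10.1.9.9"):
    fec = fec_for(dst)
    print(dst, "->", fec, "via", fib[fec])
```

The first two destinations differ but belong to the same FEC (10.1.2.0/24) and therefore share a next hop, whereas the third falls into the 10.1.0.0/16 FEC.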
MPLS VPN Rehearsal
The discussion above applies to a scenario where the MPLS network is used to connect remote sites belonging to the same customer organization. For the SP to use the same MPLS core to provide connectivity services to different customers, as shown in Figure 39, something more than MPLS is needed, which is MPLS VPN.
Figure 39 MPLS VPN
The key technology that simplifies the deployment of MPLS VPN is VRF, which is discussed in Control Plane-Based Path Isolation. As shown in Figure 39, defining distinct VRF instances on each PE device allows separating the traffic belonging to different customers, allowing for logical isolation and independent transport across the common MPLS core of the network. Notice that the VRF definition is required only on the PE devices, whereas the P routers in the core of the network have no knowledge of VRFs; they simply label-switch traffic based on the most external MPLS label.
From a control plane perspective, an additional component now needs to be added to the IGP and LDP protocols previously discussed: Multi-Protocol BGP (MP-BGP), which is used as the mechanism to exchange VPN routes between PE devices. As shown in Figure 40, for this to work, an MP-iBGP session needs to be established between all the PE devices (in a fully meshed fashion).
Figure 40 Control Plane for MPLS VPN
From a control plane perspective, the following two important elements need to be defined to perform the exchange of VPN routes through MP-BGP:
•
Route distinguisher (RD)—Represents a 64-bit field (unique for each defined VRF) added to each 32-bit IPv4 address to come up with a unique 96-bit VPN IPv4 prefix. This ensures the uniqueness of address prefixes across different VPNs, allowing support for overlapping IPv4 addresses.
•
Route target—Represents an extended attribute exchanged through MP-BGP and allows the PE devices to know which routes need to be inserted into which VRF. Every VPN route is tagged with one or more route targets when it is exported from a VRF (to be offered to other VRFs). It is also possible to associate a set of route targets with a VRF, so that all the routes tagged with at least one of those route targets are inserted into the VRF.
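The two elements above can be sketched in Python. The sketch assumes the type 0 route distinguisher encoding (2-byte type, 2-byte AS number, 4-byte assigned number); the AS number, prefixes, route target strings, and function names are hypothetical.

```python
import ipaddress

def route_distinguisher(asn, assigned):
    """Build a 64-bit type 0 RD: 2-byte type (0), 2-byte AS number, 4-byte assigned number."""
    if not (0 <= asn < 2**16 and 0 <= assigned < 2**32):
        raise ValueError("type 0 RD fields out of range")
    return (0 << 48) | (asn << 32) | assigned

def vpnv4_address(rd, ipv4):
    """Prepend the 64-bit RD to the 32-bit IPv4 address, giving a 96-bit value."""
    return (rd << 32) | int(ipaddress.IPv4Address(ipv4))

def routes_to_import(import_targets, vpn_routes):
    """Keep the routes tagged with at least one route target that the VRF imports."""
    return [route["prefix"] for route in vpn_routes
            if set(route["targets"]) & set(import_targets)]

# Two VPNs reuse 10.1.1.0; distinct RDs keep the VPN-IPv4 values apart.
red = vpnv4_address(route_distinguisher(65000, 1), "10.1.1.0")
blue = vpnv4_address(route_distinguisher(65000, 2), "10.1.1.0")

vpn_routes = [
    {"prefix": "10.1.1.0/24", "targets": ["65000:1"]},
    {"prefix": "10.2.2.0/24", "targets": ["65000:2"]},
    {"prefix": "10.9.9.0/24", "targets": ["65000:1", "65000:2"]},
]
red_routes = routes_to_import(["65000:1"], vpn_routes)
```

The RD only makes overlapping prefixes unique; it is the route target match that decides which VRF a route is inserted into.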
From a data plane perspective, the packets belonging to each VPN are labeled with two tags: the internal tag uniquely identifies the specific VPN the packets belong to, whereas the external tag is used to label-switch the traffic along the LSP connecting the ingress PE toward the egress PE. This concept is highlighted in Figure 41.
Figure 41 Data Plane for MPLS VPN
As shown in Figure 41, when the IP packet is received at the ingress PE, a first VPN label is imposed on it. The information on what VPN label to apply has been received from the egress PE via MP-iBGP. Before sending the packet to the MPLS core, the ingress PE must also impose a second tag (the outermost one), which is used to label switch the packet along the LSP connecting the ingress PE to the egress PE. When the egress PE receives the packet, it looks at the VPN label and, based on that specific label, sends the traffic into the proper VPN.
Finally, the last element that needs to be considered for an MPLS VPN deployment is the route reflector (RR). Because MP-iBGP sessions need to be established between the different PEs defined at the edge of the MPLS network, Cisco usually recommends not deploying a full mesh of iBGP connections but instead using several devices as route reflector routers.
Figure 42 Deployment of Route Reflectors
Each route reflector peers with every PE device (in a hub-and-spoke fashion), contributing to the overall stability of the design. Deploying route reflectors also eases the addition of new sites, because only a new peering with the route reflector needs to be established, without modifying the configuration of the remaining PE devices. The following paragraph highlights the advantages of deploying route reflectors in both campus and WAN environments.
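The scaling benefit of route reflectors can be made concrete with a short Python sketch. It counts only PE-to-PE and PE-to-RR sessions and ignores any sessions between the route reflectors themselves; the PE and RR counts are hypothetical.

```python
def full_mesh_sessions(pe_count):
    """iBGP sessions needed when every PE peers with every other PE."""
    return pe_count * (pe_count - 1) // 2

def route_reflector_sessions(pe_count, rr_count):
    """iBGP sessions needed when every PE peers only with each route reflector."""
    return pe_count * rr_count

def sessions_added_by_new_pe(pe_count, rr_count=None):
    """Sessions to configure when one more PE joins an existing set of pe_count PEs."""
    return pe_count if rr_count is None else rr_count

mesh = full_mesh_sessions(20)            # 190 sessions
with_rr = route_reflector_sessions(20, 2)  # 40 sessions
```

With 20 PEs, adding a new PE in a full mesh touches all 20 existing devices, whereas with two route reflectors it requires only two new peerings.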
MPLS VPN in Campus
High Level Design Principles
Current campus networks must address a new set of customer requirements, such as the desire for mobility, the drive for heightened security, and the need to accurately identify and segment users, devices, and networks. All these drivers are leading enterprises to revisit their campus design requirements.
The Cisco-recommended design for the campus network is architected in a hierarchical model comprised of core, distribution, and access that provide distinct features and functionalities. Multi-tier designs using Layer 2 in distribution and access enable the design of modular topologies using scalable "building blocks" that allow the network to meet evolving business needs. The multi-tier model based on modular design is easy to scale, understand, and troubleshoot because it follows a deterministic traffic pattern.
An in-depth discussion of Cisco-recommended campus network design is out of the scope of this guide. For more information on this topic, see the following URL: http://www.cisco.com/en/US/netsol/ns815/networking_solutions_program_home.html.
When deploying MPLS VPN in a campus environment, keep in mind the following two key points:
•
The campus network should always be deployed following the recommended design principles highlighted in the documents referenced above.
•
It is important to understand what modifications (or simplifications) need to be applied to an SP-based technology to fit within the enterprise, while keeping the campus MPLS deployment as simple and straightforward as possible. This means that deploying network virtualization should not impact "what is already working" in the network. In addition, even inside each logical partition, the user should experience the same characteristics of scalability, hierarchy, stability, and so on, as if the user were part of a dedicated physical infrastructure.
The general considerations made in Path Isolation Initial Design Considerations are also valid when deploying MPLS VPN as a path isolation option; it is thus recommended to read that section to properly frame the solution. In addition, some important design principles and differences between an enterprise MPLS VPN and an SP deployment need to be kept in mind when deploying MPLS VPN in a campus environment. The following assumptions distinguish these deployments from traditional service provider ones:
•
New design principles related to the IGP deployment now need to be kept in mind:
–
The IGP used in the global table runs edge-to-edge across the enterprise network, differently from an SP-like MPLS VPN deployment, where usually it is confined in the core.
–
There are no longer customer IGPs running at the edge of the network whose routes are tunneled across the backbone. This is always true if the PE devices are deployed at the campus distribution layer and the access layer provides only Layer 2 functions (multi-tier design). In routed access designs, the access switches may play the multi-VRF CE role, so an IGP may be required in the context of each defined VRF to exchange routes between the access and distribution devices (these IGP instances can be considered the equivalent of "customer IGPs" found in SP deployments).
•
The IGP used in the global table has a dual function. On one side, it allows MP-iBGP sessions to be established between the PE devices deployed at the edge of the MPLS domain and MPLS labels to be exchanged through a label distribution protocol (LDP). At the same time, it is also used to provide network connectivity to the entities that remain in the global table. As already mentioned, the current recommendation is to use virtual networks only for specific purposes. This means that most of the internal enterprise traffic still remains switched in the global table. This represents a first differentiation from the SP-like MPLS VPN deployment, because in that case the global table is usually used to provide only PE-PE connectivity and does not extend to the edge of the network but remains in the core.
•
The solution discussed here constitutes an evolutionary or overlay design. The goal of this design is to use MPLS VPN to provide additional services within an existing network to complement rather than replace the existing campus network.
•
The MP-BGP process represents the control plane that allows the establishment of forwarding paths for VPN traffic and is used in addition to the IGP that performs the same function for IPv4 global traffic. As a consequence, a single AS scenario is discussed in this phase of the project: this implies that the routing protocol in the global table (IGP) extends end-to-end in the enterprise network (campuses, data centers, and remote offices). MP-BGP is thus overlaid on top of the IGP running in the global table.
•
Enterprise design requires end-to-end operational support processes. The division between PE and CE devices now exists technically but not operationally, because both are part of the same enterprise network and as such are most likely administered by the same group. It is also worth noting that in many cases there is no CE device or role in the design, because when deploying MPLS VPN in multi-tier campus networks, all the edge VPN subnets are directly connected to the PE devices. As previously mentioned, multi-VRF CEs may be deployed in routed access scenarios.
Network Topologies
One of the main goals of this guide is to determine the impact of turning on MPLS VPN in a working campus network environment deployed based on the hierarchical design recommendations. Figure 43 shows an example of a hierarchical campus network. The various campus distribution blocks are connected by a high speed core. The assumption here is that these connections are point-to-point links and that the enterprise has control of all the devices building the high speed core.
Figure 43 Hierarchical Campus Network
Starting from the general campus model shown in Figure 43, three main topologies are analyzed in the following sections, as represented in Figure 44:
•
Fully-meshed topologies
•
Partially-meshed topologies
•
Ring topologies
Figure 44 Campus Network Topologies
These represent three common topologies that are often deployed. Although it is always recommended that the core network design implement a full mesh topology whenever possible, it is relevant to note that there is often not the possibility of connecting the core devices in a fully-meshed fashion because of cost or geographical location issues. In such hybrid scenarios, each building block would be fully meshed to the core devices, and the core devices would be linked in a ring fashion. The campus fully-meshed design is traditionally the recommended one for its characteristics of convergence, reliability, and traffic load balancing. This recommendation holds true when deploying MPLS VPN in the campus environment. However, because following this guideline may not always be possible in real network deployments, the following sections highlight possible issues to keep in mind when deviating from the ideal fully-meshed scenario.
Network Device Roles
As discussed in MPLS VPN Technology Overview, when deploying MPLS VPN, there are essentially four roles that the device can play in the design: CE, PE, P, and route reflectors (RRs).
In a traditional multi-tier campus design, the access layer devices are Layer 2 capable and the first Layer 3 hop in the network is at the distribution layer. Core nodes are Layer 3 routed devices interconnecting various campus distribution blocks.
When deploying MPLS VPN as an overlay model in such campus environment, the recommended roles and positioning for the network devices involved in the deployment are shown in Figure 45.
Figure 45 Device Roles in an MPLS Network
As shown in Figure 45, the PE devices are positioned at the first Layer 3 hop in the network, which is the distribution layer. VRFs must in fact be defined at the first Layer 3 hop device, to extend at Layer 3 the logical isolation provided by VLANs at Layer 2. As a consequence, the recommendation is to deploy there a platform supporting VRF capabilities and capable of performing MPLS label-switching functionalities.
Note
In designs where the platforms deployed at the distribution layer are not MPLS capable, the use of some other technique (such as VRF-lite) is required to extend the VRF isolation to a PE device deployed in the core. Discussing this model is out of the scope of this guide, so the assumption here is that MPLS-capable devices are deployed in the distribution layer of the campus network.
Deploying PE functionalities at the distribution layer implies that all the other devices constituting the high speed core of the network play the P role. Note how in the specific design shown in Figure 45, there are actually no true CE devices, because the only entities connecting to the PE (except for the P switches) are access layer switches that perform only Layer 2 functionalities. Finally, Cisco recommends using two additional routers as RRs, connecting them to the core devices.
Note
RR deployment is further discussed in MP-iBGP Deployment Considerations.
VRF and MPLS on Catalyst 6500 Platforms
The only switching platform commonly deployed in campus networks currently supporting MPLS is the Catalyst 6500 equipped with Sup720 or Sup32 PFC3B or DFC3B (and higher). Understanding how label switching operates on this device helps in understanding the design and in troubleshooting potential issues, as discussed subsequently in MPLS-Specific Troubleshooting Tools.
Note
MPLS is supported only on 6500 platforms running Cisco IOS (Native) and not in a Hybrid (CatOS + IOS) system.
For a basic understanding of packet forwarding in the Catalyst 6500 architecture and for more information on terms such as PFC, DFC, and CEF, see the following URL: http://www.cisco.com/en/US/products/hw/switches/ps708/index.html.
Hardware Components Involved in MPLS Switching
To understand the various platform components involved in MPLS switching, it is necessary to distinguish between control and data planes, as shown in Figure 46.
Figure 46 High Level View of Control and Data Planes on Catalyst 6500
The routing protocols (usually OSPF and EIGRP in a campus environment) running in global table learn routes from the routing peers and install those routes into the routing database. After the routing database has been populated, the CEF process takes the information in the database and populates the forwarding table. This table is then programmed and pushed down to the DFC (if DFC-enabled line cards are present in the system) and the PFCs on the supervisors.
In addition to this, after MPLS is enabled on the device, there is an additional control plane represented by a label distribution protocol that can be thought of as a routing protocol for MPLS, because it provides neighbor devices with information about MPLS labels. The label information received from the neighbors is loaded into the label database. Once again, the CEF process running on the SP takes that information and builds a second label database. Notice that this data structure contains v4 routes, v6 routes, and MPLS forwarding entries, and those MPLS forwarding entries basically form part of it.
The commands to view the contents of these databases on the SP and DFC3s are the same as the ones used on any Cisco IOS-based distributed forwarding platform. These commands, with the relative output, are as follows:
•
show mpls forwarding-table
cr20-6500-1#sh mpls forwarding-table
Local  Outgoing    Prefix             Bytes tag  Outgoing   Next Hop
tag    tag or VC   or Tunnel Id       switched   interface
16     Pop tag     192.168.100.19/32  0          Te1/1      10.122.5.30
17     Pop tag     10.122.5.10/31     0          Te1/2      10.122.5.26
       Pop tag     10.122.5.10/31     0          Te1/1      10.122.5.30
18     Pop tag     10.122.5.6/31      0          Te1/2      10.122.5.26
<SNIP>
•
show ip cef
cr20-6500-1#sh ip cef
Prefix              Next Hop             Interface
0.0.0.0/0           10.122.5.26          TenGigabitEthernet1/2
0.0.0.0/32          receive
2.2.2.2/32          receive
10.122.5.2/31       10.122.5.26          TenGigabitEthernet1/2
                    10.122.5.30          TenGigabitEthernet1/1
<SNIP>
To show the platform-specific hardware databases programming, use the following commands:
•
show mpls platform forwarding-table (issued on PFC3 / DFC3 modules)
•
show mls cef mpls
cr20-6500-1#sh mls cef mpls
Codes: + - Push label, - - Pop Label * - Swap Label
Index  Local   Label        Out i/f
       Label   Op
576    0       (EOS) (-)    recirc
608    100     (-)          Vl355 , 0009.e845.4fff
609    101     (-)          Vl355 , 0009.e845.4fff
610    97      (-)          Te1/3 , 0009.448e.0e00
611    98      (-)          Vl305 , 0009.e845.4fff
<SNIP>
From a data plane perspective, the information in the label database is used to make that forwarding decision for outgoing MPLS packets.
Figure 47 shows the Catalyst 6500 hardware components.
Figure 47 Sup720 Architecture
The MSFC3 hosts the RP and SP processors. The DRAM on the RP holds the routing and label databases. As previously discussed, the SP takes the information contained in these tables and programs the unified routing database on the PFC3. The PFC3 can be divided into two main components: the Layer 3 Engine and the Layer 2 Engine. The Layer 3 Engine hosts the routing database and the adjacency table that holds rewrite information for each prefix contained in the database. The Layer 3 Engine also has two additional special pieces of memory, a VLAN RAM and an MPLS VPN RAM. Describing how the various label operations (PUSH, SWAP, and POP) are performed clarifies what role each of these components plays.
The Layer 2 Engine hosts a VPN Lookup table, which actually maps each MPLS label to an index that is used as a lookup key into the routing database. This is a key element when describing the POP operation for aggregate labels.
Note
The hardware architecture described here is valid for both Sup32 and Sup720 (the PFC on the Sup720 is identical to the one on the Sup32). However, note that the MPLS functionality is supported on supervisors equipped with PFC3B and higher.
LSR and LER Defined
Depending on the specific role that the Catalyst 6500 devices play in the MPLS network, there is a distinction between a label edge router (LER) and a label switch router (LSR). (See Figure 48.)
Figure 48 LER and LSR
Typically, the LER sits at the edge of the MPLS cloud at the boundary between the MPLS cloud and a non-MPLS network. Its functions are to add MPLS labels to the packet as it goes into an MPLS cloud (PUSH operation), or to strip those labels off when the packet leaves the MPLS cloud and goes into the non-MPLS network (POP operation). In Figure 48, the LER receives a packet destined to the subnet 172.168.1.0/24, performs the lookup in the routing database, and pushes a specific MPLS label (label 5) to the packet before sending it toward the neighbor LSR.
The LSR is responsible for making a forwarding decision based on the outer MPLS label contained in the packets received. Referring again to Figure 48, the LSR performs a lookup in the label database and determines that a packet received on the specific interface 1 with label 5 should be switched out interface 2 with a new label 7 (SWAP operation).
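The PUSH and SWAP operations described for Figure 48 can be sketched in Python as two table lookups. The table contents reuse the values quoted above (subnet 172.168.1.0/24, label 5, interface 1 to interface 2, label 7); the data structures, the host address, and the function names are hypothetical simplifications.

```python
import ipaddress

# Ingress LER: destination prefix -> label to push (values from the Figure 48 description).
LER_TABLE = {ipaddress.ip_network("172.168.1.0/24"): 5}

# LSR: (incoming interface, incoming label) -> (outgoing interface, outgoing label).
LSR_TABLE = {(1, 5): (2, 7)}

def ler_push(destination):
    """PUSH: look up the destination and impose the bound label, if any."""
    address = ipaddress.ip_address(destination)
    for prefix, label in LER_TABLE.items():
        if address in prefix:
            return {"dst": destination, "labels": [label]}
    return {"dst": destination, "labels": []}  # no binding: forwarded as plain IP

def lsr_swap(packet, in_interface):
    """SWAP: replace the top label (first in the list) based on the label lookup."""
    top = packet["labels"][0]
    out_interface, out_label = LSR_TABLE[(in_interface, top)]
    swapped = dict(packet, labels=[out_label] + packet["labels"][1:])
    return out_interface, swapped

labeled = ler_push("172.168.1.10")
out_interface, forwarded = lsr_swap(labeled, in_interface=1)
```

Note that the LSR never inspects the IP destination address; its decision depends only on the incoming interface and the top label.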
Note
Depending on the specific application enabled in the MPLS network (FRR, CsC, Traffic Engineering, and so on), LSRs may also add labels, effectively creating tiers of a network hierarchy. These functions are usually unnecessary for solving the design problems in a campus MPLS VPN deployment and are not discussed further. For more information, see the following URL: http://www.cisco.com/en/US/products/ps6557/products_ios_technology_home.html.
Note
With MPLS terminology, in addition to LER and LSR, there is often reference to three additional acronyms: P, PE, and CE (see Figure 49). They are typically used when starting to deploy VPN services over the MPLS network, and are inherited from the service provider world.
Figure 49 CE, PE, and P Devices in MPLS VPN
Customer edge (CE) refers to a device that sits outside of an MPLS network (traditionally at the customer site). The provider edge (PE) device is akin to the LER, whereas the provider (P) device sits inside the MPLS cloud. MPLS VPN binds the VRF-lite technology with MPLS to provide virtualization capability, using MPLS labels to make the forwarding decisions. This basically means that now LERs have to "push" two MPLS labels on each IP packet entering the MPLS cloud: one internal label (called VPN label), and one external label (called IGP label). As previously mentioned, the deployment of MPLS VPN in multilayer campus networks is characterized by the absence of CE devices, and the PEs (LERs) sitting at the distribution layer impose two MPLS labels for traffic originated from directly connected networks belonging to specific VPNs.
The following sections discuss in more detail the specific operations the Catalyst 6500 hardware needs to perform in each of the phases described above, both for simple MPLS and MPLS VPN scenarios.
LER IPv4 Routing
IPv4 packets are forwarded across an MPLS network by the LER that is imposing labels. After the LER imposes the label, all nodes in the MPLS network forward the packet based on the top label. The label imposed on the IPv4 packet is based on IPv4 prefix. Figure 50 illustrates an LER receiving the packet and doing a lookup in the hardware tables (routing database and adjacency), and determining that label 40 is to be used to forward the packet. The LER transmits the packet with label 40 and the relevant Layer 2 headers for the media. The VPN ID in the CEF table is zero to indicate the global routing table.
Figure 50 LER IPv4 Routing
When acting as the ingress LER, the IPv4 packet is looked up like a regular IPv4 lookup. Because the ingress LER needs to start tagging the IP packets before sending them to the MPLS-enabled network, the adjacency entry for the IPv4 prefix needs to specify the label(s) to be imposed on the packet, as shown in Figure 51.
Figure 51 MPLS Adjacency Entry
Note
Only IPv4 unicast packets have MPLS labels imposed upon them; IPv4 multicast packets are sent unlabeled.
The LER device sitting at the egress edge of the MPLS cloud must remove all labels and perform an IPv4 forwarding decision on the packet (assuming it is not performing other functionalities not applicable in this design context, such as inter-AS or CsC function, in which case the behavior could involve leaving one or more labels on the packet). In most instances, the LSR device preceding the LER has popped the outermost label (PHP), and the LER receives the packet unlabeled. This is also the default behavior for Catalyst 6500 platforms, so the assumption is that the egress LER simply has to perform the forwarding decision based on the exposed IPv4 packet information.
LER IP VPN
RFC 2547 describes the implementation of Layer 3 VPNs using BGP to distribute the VPN information between LERs (PEs). The LERs are responsible for maintaining a separate routing table for each VPN. Packets are forwarded by looking up the prefix in the VPN forwarding table, and pushing the VPN label to identify the particular VPN and the IGP label that corresponds to the BGP next hop address for the destination LER.
RFC 2547 defines an any-to-any connectivity model inside each defined VPN. Each VPN has a unique CEF table on a PE device; this potentially allows VPNs to have overlapping addresses. As shown in Figure 52, PE-1 determines that the packet is destined to PE-2 by looking up the VPN table, and pushes two labels onto the packet.
Figure 52 LER and LSR Operation
The first label pushed identifies the specific VPN (VPN RED) on PE-2. This label was learned across the MP-iBGP session between PE-1 and PE-2. The second label pushed onto the packet is the IGP label, which is used to forward traffic to PE-2 along a dynamically built LSP. By default, the last LSR connecting to PE-2 performs PHP, so PE-2 receives the packet with only the VPN label remaining. PE-2 pops this label and performs an IP lookup to forward the packet to the destination (belonging to the proper VPN RED).
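The life of the two-label stack described above can be traced with a small Python sketch. The label values (30 for the VPN label, 20 for the IGP label) and all function names are assumptions chosen for illustration; they are not read from Figure 52.

```python
def ingress_pe(ip_packet, vpn_label, igp_label):
    """PE-1: push the VPN label first, then the IGP label on top (first in the list)."""
    return {"labels": [igp_label, vpn_label], "payload": ip_packet}

def penultimate_hop(frame):
    """Last LSR before PE-2: penultimate hop popping removes the outer IGP label."""
    return {"labels": frame["labels"][1:], "payload": frame["payload"]}

def egress_pe(frame, vpn_by_label):
    """PE-2: pop the remaining VPN label and select the VPN it identifies."""
    (vpn_label,) = frame["labels"]  # exactly one label is expected after PHP
    return vpn_by_label[vpn_label], frame["payload"]

frame = ingress_pe("ip-packet", vpn_label=30, igp_label=20)
after_php = penultimate_hop(frame)
vpn, payload = egress_pe(after_php, {30: "RED"})
```

The IGP label only steers the packet to PE-2; the VPN label is the only information PE-2 needs to pick the right VPN.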
LER functionalities are performed on Catalyst 6500 platforms that are capable of hardware MPLS VPN traffic forwarding in two ways: ingress LER and egress LER.
Ingress LER
Figure 53 illustrates how the PFC3/DFC3 performs the forwarding decision for packets entering into a specific VPN. The packet is received on the interface and the headers are sent to the PFC3 to make the forwarding decision.
Figure 53 Ingress LER Operation
The following takes place:
•
The Catalyst 6500 Layer 3 Engine contains a table that maps VLANs to VPNs, called VLAN RAM. The packet ingresses a specific interface (Gig 1/1 in this example) that maps to an internally allocated VLAN 1101. Every Layer 3 interface in the system has a VLAN associated with it, either by configuration ("interface VLAN"), or by internal allocation ("interface Gigabit 1/1"). Sub-interfaces also have internal VLANs allocated.
By default, internal VLANs are assigned starting from the value 1006, as shown in the following example:
cr20-6500-1#sh vlan internal usage
VLAN Usage
---- --------------------
392  GigabitEthernet2/8.392
402  GigabitEthernet2/8.402
1006 online diag vlan0
1007 online diag vlan1
1008 online diag vlan2
<SNIP>
This implies that when trying to define a new Layer 2 user VLAN, a message can be displayed to indicate that the specific VLAN is not available because it has already been internally allocated, as shown in the following example:
cr20-6500-1(config)#vlan 1006
cr20-6500-1(config-vlan)#name user_defined_VLAN
cr20-6500-1(config-vlan)#exit
% Failed to create VLANs 1006
VLAN(s) not available in Port Manager.
To minimize this occurrence, the default behavior of the Catalyst 6500 can be changed with the command vlan internal allocation policy descending. This instructs the switch to allocate VLANs for internal usage starting from the highest value (4094) rather than from the lowest (1006), as in the following example:
cr20-6500-1#sh vlan internal usage
VLAN Usage
---- --------------------
392  GigabitEthernet2/8.392
402  GigabitEthernet2/8.402
<SNIP>
4092 online diag vlan2
4093 online diag vlan1
4094 online diag vlan0
Note
After entering the command above, a reload of the box is required for the new VLAN allocation to become effective.
•
The IP destination address is looked up in the CEF table but only against prefixes that are in the specific VPN; in the example, this is VPN number 5. The CEF table entry points to a specific set of adjacencies. One is chosen as part of the load balancing decision if multiple parallel paths exist (see Redundancy and Traffic Load Balancing for more details on multi-path scenarios).
•
The adjacency table contains the information on the Layer 2 header the packet needs, and the specific MPLS labels to be pushed onto the frame; in the example, these are labels 20 and 30. The adjacency table can push up to three labels without the need for re-circulation (two labels are required for the MPLS VPN deployment discussed in this guide). The information to rewrite the packet is sent back to the ingress line card, where it is rewritten by the port/fabric ASICs and forwarded to the egress line interface; in this example, g1/2.
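The ingress lookup chain described in the steps above can be approximated in Python as a series of dictionary lookups. The interface, internal VLAN, VPN ID, labels, and egress interface reuse the Figure 53 values quoted in the text; the destination prefix, the adjacency name, and the modulo-based path selection are assumptions standing in for the hardware tables and hash.

```python
import ipaddress

INTERNAL_VLAN = {"Gig1/1": 1101}   # Layer 3 interface -> internally allocated VLAN
VLAN_RAM = {1101: 5}               # internal VLAN -> VPN number
# Per-VPN CEF table: prefix -> list of adjacencies (the prefix is hypothetical).
CEF = {5: {ipaddress.ip_network("10.136.12.0/24"): ["adj-1"]}}
ADJACENCY = {"adj-1": {"labels": (20, 30), "out_interface": "Gig1/2"}}

def ingress_lookup(in_interface, destination):
    """Follow interface -> VLAN -> VPN -> CEF prefix -> adjacency, as in the steps above."""
    vpn_id = VLAN_RAM[INTERNAL_VLAN[in_interface]]
    address = ipaddress.ip_address(destination)
    # Only prefixes that belong to this VPN are considered.
    matches = [prefix for prefix in CEF[vpn_id] if address in prefix]
    if not matches:
        return None
    best = max(matches, key=lambda prefix: prefix.prefixlen)
    adjacencies = CEF[vpn_id][best]
    # Stand-in for the load balancing decision when parallel paths exist.
    chosen = adjacencies[int(address) % len(adjacencies)]
    return ADJACENCY[chosen]

rewrite = ingress_lookup("Gig1/1", "10.136.12.9")
```

The result carries everything the line card needs to rewrite the packet: the labels to push and the egress interface.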
All the information shown in Figure 53 can be accessed via the CLI of the Catalyst 6500. In the following example, the packet is received on an interface mapped to VRF "v1" and is destined to a remote VPN subnet 10.136.12.0. It is possible to immediately get the information on which interface the packet will be sent out and with which labels by using the following command that accesses the content of the hardware routing table:
cr20-6500-1#sh mls cef vrf v1 10.136.12.0
Codes: decap - Decapsulation, + - Push Label
Index  Prefix          Adjacency
3466   10.136.12.0/24  Te1/1 313(+),57(+)
The output above reveals that the packet is going to be sent out interface Te1/1 with two MPLS labels: an internal VPN label (313) that is used by the receiving PE to route the traffic to the right VRF, and the external label (57) that is used to label switch the traffic along the LSP connecting the ingress LER to the egress LER (this is also shown in Figure 53).
Note
The symbol "+" associated to the MPLS tag in the output above indicates that these labels are going to be pushed to the packet.
You can retrieve detailed hardware information for the same VPN destination prefix by using the following command:
cr20-6500-1#sh mls cef vrf v1 10.136.12.0 detail
Codes: M - mask entry, V - value entry, A - adjacency index, P - priority bit
       D - full don't switch, m - load balancing modnumber, B - BGP Bucket sel
       V0 - Vlan 0,C0 - don't comp bit 0,V1 - Vlan 1,C1 - don't comp bit 1
       RVTEN - RPF Vlan table enable, RVTSEL - RPF Vlan table select
Format: IPV4_DA - (8 | xtag vpn pi cr recirc tos prefix)
Format: IPV4_SA - (9 | xtag vpn pi cr recirc prefix)
M(3466 ): E | 1 FFF 0 0 0 0 255.255.255.0
V(3466 ): 8 | 1 256 0 0 0 0 10.136.12.0 (A:278534 ,P:1,D:0,m:0 ,B:0)
Two important pieces of information can be retrieved from the output above:
•
The pointer to the adjacency table containing the rewriting information (A:278534)
•
The number of equal cost paths available to reach the destination prefix (m:0, the load balancing mod number, which means there is only one path in this example)
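When many prefixes need to be checked, the fields of the value entry can be extracted programmatically. The following Python sketch is an assumption about how such a helper could look; the regular expression is written against the single value line shown in the output above and may need adjusting for other software releases.

```python
import re

# Matches a value entry such as:
# V(3466 ): 8 | 1 256 0 0 0 0 10.136.12.0 (A:278534 ,P:1,D:0,m:0 ,B:0)
VALUE_ENTRY = re.compile(
    r"V\((?P<index>\d+)\s*\):.*?(?P<prefix>\d+\.\d+\.\d+\.\d+)\s+"
    r"\(A:(?P<A>\d+)\s*,P:(?P<P>\d+),D:(?P<D>\d+),m:(?P<m>\d+)\s*,B:(?P<B>\d+)\)"
)

def parse_value_entry(line):
    """Return the prefix and the numeric fields of a value entry, or None if no match."""
    match = VALUE_ENTRY.search(line)
    if match is None:
        return None
    fields = {name: int(match.group(name)) for name in ("index", "A", "P", "D", "m", "B")}
    fields["prefix"] = match.group("prefix")
    return fields

sample = "V(3466 ): 8 | 1 256 0 0 0 0 10.136.12.0 (A:278534 ,P:1,D:0,m:0 ,B:0)"
entry = parse_value_entry(sample)
```

The field names follow the legend printed by the command itself: A is the adjacency index, P the priority bit, and m the load balancing mod number.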
Using the information above, you can then access the corresponding entry in the adjacency table, as follows:
cr20-6500-1#sh mls cef adjacency entry 278534 detail
Index: 278534 smac: 0012.da7c.c680, dmac: 0004.de1f.b000
mtu: 1526, vlan: 1035, dindex: 0x0, l3rw_vld: 1
format: MPLS, flags: 0x8418
label0: 0, exp: 0, ovr: 0
label1: 313, exp: 0, ovr: 0
label2: 57, exp: 0, ovr: 0
op: PUSH_LABEL2_LABEL1
packets: 0, bytes: 0
The output shows the rewrite information for the packet: source MAC, destination MAC, and the MPLS labels that are pushed to the packet (57 and 313). Also, the internal VLAN is reported (VLAN 1035), which maps directly to the interface that is used to forward the packet. It is already known that the interface used is Te1/1, and this is confirmed by displaying the mapping between internal VLANs and interfaces:
cr20-6500-1#sh vlan internal usage
VLAN Usage
---- --------------------
1006 online diag vlan0
1007 online diag vlan1
......................
1035 TenGigabitEthernet1/1
1036 TenGigabitEthernet1/3
..........................
Egress LER
The way the PFC3/DFC3 handles VPN traffic on egress from the PE varies depending on whether per-prefix labels or aggregate labels are used. When per-prefix labels are used, each VPN prefix has a unique label association, which allows the PE to forward the packet to the final destination based on a label lookup in the routing database. If aggregate labels are used, the PFC3/DFC3 must perform an IP lookup to determine the final destination because many prefixes that can be on multiple interfaces are associated with the same label. Note that aggregate labels are assigned to each directly connected subnet, or every time a device performs route summarization.
It is important to note that when deploying MPLS VPN in a multilayer campus environment, positioning the PE at the distribution layer implies that all the VPN subnets are directly connected to the PE device. The PE then assigns a unique aggregate label to each defined VRF, which allows it to perform the lookup in the right routing table for all the VPN traffic received from the core of the network. In the example shown later in this section, a specific PE assigns a unique aggregate label to each locally defined VRF (there are 25 VRFs in this case).
The implication of using aggregate labels is subsequently discussed in more detail.
Figure 54 illustrates the egress processing by PFC3/DFC3 when per-prefix labels are used.
Figure 54 Egress LER Operation with Per-Prefix Labels
The sequence of events that happen for performing the popping of a per-prefix label is the following:
1.
The packet enters the switch on a given interface (for which the switch assigns an internal VLAN number, 816 in this example). The MPLS label 30 present on the packet represents the VPN label, because by default the previous node in the network has performed PHP to remove the external IGP label.
2.
The packet headers are sent from the line card to the PFC3/DFC3 complex to perform the forwarding decision. The VPN label (30) does not match an entry in the VPN lookup table hosted in the Layer 2 Engine ("MISS" event). This is because, as discussed further below, the VPN lookup table is used only to store aggregate labels.
3.
As a consequence, the packet headers are sent to the Layer 3 Engine and a lookup is performed in the VLAN RAM table using the internal VLAN index associated to the port of the switch that received the packet (816 in this example). The lookup in the VLAN RAM determines that the packet belongs to the VRF identified by the VPN ID 822.
4.
This information is used to look up the MPLS label in the routing table (associated to the specific VPN ID). The appropriate adjacency is then chosen after performing the load balancing hash if multiple parallel paths exist. The adjacency contains the outbound interface (Gig 2/2) and Layer 2 headers and tells the system to POP the last label and to forward the packet to the next hop/ destination as an IP packet.
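The per-prefix pop sequence above can be summarized in a short Python sketch. It reuses the Figure 54 values quoted in the steps (internal VLAN 816, VPN label 30, VPN ID 822, outbound interface Gig 2/2); the dictionaries and the function name are hypothetical stand-ins for the hardware tables.

```python
# Layer 2 Engine: aggregate label -> VPN ID. Empty here, because label 30 is per-prefix.
VPN_LOOKUP_TABLE = {}
# Layer 3 Engine: internal VLAN of the receiving port -> VPN ID.
VLAN_RAM = {816: 822}
# Routing database entries keyed by (VPN ID, label).
LABEL_ENTRIES = {(822, 30): {"op": "POP", "out_interface": "Gig2/2"}}

def egress_per_prefix(internal_vlan, vpn_label):
    """Steps 2 to 4: miss in the VPN lookup table, VLAN RAM lookup, then label lookup."""
    if vpn_label in VPN_LOOKUP_TABLE:
        raise ValueError("aggregate label: handled by the aggregate pop path instead")
    vpn_id = VLAN_RAM[internal_vlan]
    return LABEL_ENTRIES[(vpn_id, vpn_label)]

result = egress_per_prefix(internal_vlan=816, vpn_label=30)
```

No IP lookup appears in this path: with a per-prefix label, the label itself identifies the adjacency used to forward the packet.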
Figure 55 shows how the Catalyst 6500 performs the pop operation when the packet contains an aggregate MPLS label. As mentioned before, unique aggregate labels are assigned to each VRF defined on the PE device; aggregate labels are stored in the VPN lookup table, which is a table hosted on the Layer 2 Engine of the PFC3.
Figure 55 POP Operation with Aggregate Label
In Figure 55, Label 5, Label 8, and Label 22 are aggregate labels and are stored in the VPN lookup table. The other information in the table associated to each aggregate label is the VPN ID that is used as part of the lookup key into the routing database. The important thing to consider here is that the VPN lookup table can host at most 512 entries. Allocating more than 512 aggregate labels on the PE device results in recirculation, thus reducing switching performance. Because a unique aggregate VPN label is associated to each VRF defined on the egress PE device, the number 512 represents the maximum number of VRFs that should be defined on a given PE to achieve optimal performance. This is rarely an issue in campus MPLS VPN deployments.
Note
One entry in the VPN lookup table is always reserved for the Explicit NULL label; therefore, the optimal performance is actually achieved with a maximum of 511 aggregate labels.
Information about the current usage of the VPN lookup table can be retrieved with the following CLI command:
cr20-6500-1#sh platform hardware capacity pfc
L2 Forwarding Resources
MAC Table usage: Module Collisions Total Used %Used
1 0 65536 94 1%
2 0 65536 105 1%
5 0 65536 94 1%
VPN CAM usage: Total Used %Used
512 25 5%
The example above refers to a PE that has allocated 25 aggregate labels, one for each distinct locally defined VRF, as follows:
cr20-6500-1#sh mpls forwarding-table | i Aggregate
44 Aggregate vrf:v01 60
65 Aggregate vrf:v02 60
66 Aggregate vrf:v03 0
67 Aggregate vrf:v04 0
68 Aggregate vrf:v05 60
69 Aggregate vrf:v06 0
70 Aggregate vrf:v07 0
71 Aggregate vrf:v08 0
72 Aggregate vrf:v09 0
73 Aggregate vrf:v10 0
74 Aggregate vrf:v11 0
75 Aggregate vrf:v12 0
76 Aggregate vrf:v13 0
77 Aggregate vrf:v14 0
78 Aggregate vrf:v15 0
79 Aggregate vrf:v16 0
80 Aggregate vrf:v17 0
81 Aggregate vrf:v18 0
82 Aggregate vrf:v19 0
83 Aggregate vrf:v20 0
84 Aggregate vrf:v21 0
85 Aggregate vrf:v22 0
86 Aggregate vrf:v23 60
87 Aggregate vrf:v24 60
88 Aggregate vrf:v25 0
The pop operation happens in a different way depending on whether the number of aggregate labels is more or less than 512. Figure 56 shows the scenario where the number of aggregate labels is less than 512.
Figure 56 POP Operation with Less than 512 Aggregate Labels
In Figure 56, the following sequence takes place:
1.
The packet is received on the egress LER with only the VPN label (the previous node in the network performed PHP to remove the IGP label).
2.
The packet headers are sent from the line card to the PFC3/DFC3 complex to perform the forwarding decision. The VPN label (313) matches an entry in the VPN lookup table and this allows for the Layer 2 Engine to determine the VPN ID (112) for the specific packet and to pop the VPN label. This allows the Layer 2 Engine to process the packet as an IP packet in a single pass without having to first pop the MPLS label and then re-circulate the packet to process it in the second pass as an IP packet.
3.
The result from the VPN lookup table is sent with the packet IP headers to the Layer 3 Engine. Note that the VLAN RAM table is not used to determine the VPN ID when a hit occurs in the VPN lookup table.
4.
The IP destination address (10.4.2.0) is looked up in the routing database against the routes for VPN 112. The appropriate entry in the adjacency table is then chosen after performing the load balancing hash if multiple parallel paths exist. The adjacency contains the outbound interface (Gig 1/2) and Layer 2 headers to forward the packet to the next hop/destination.
Note
In the procedure described above, the packet is processed in a single pass without the need for any hardware recirculation. This explains why optimal system performance is achieved in this case.
Figure 57 shows a different scenario where the VPN lookup table is full because more than 512 aggregate labels were allocated on this given PE.
Figure 57 POP Operation with More Than 512 Aggregate Labels
Figure 57 illustrates the egress processing by PFC3/DFC3 when the number of VPNs is greater than 512 and an aggregate label is being used. The following sequence takes place:
1.
The packet enters the switch with the VPN label.
2.
The packet headers are sent from the line card to the PFC3/DFC3 to perform the forwarding decision. The VPN label (30) does not match an entry in the VPN lookup table, because the table is full and in this example, label 30 is not part of it. This causes the Layer 2 Engine to send the packet to the Layer 3 Engine as an MPLS packet; this is because the MPLS label information is required to perform the routing database lookup at the following step.
3.
The Layer 3 Engine receives the packet and performs the VLAN-to-VPN mapping, which results in VPN 0 being selected. The label (30 in this example) is then looked up in the CEF table and the correct adjacency is selected. The adjacency indicates that the MPLS label is to be popped and the packet then re-circulated on internal VLAN 1200.
4.
The packet is sent back to the rewrite engine associated with the particular port and rewritten. The packet then arrives in the Layer 2 Engine the second time and hits a "MISS" in the VPN lookup table (this time because it is an IP packet with no MPLS label information).
5.
The IP packet is passed to the Layer 3 Engine and the VLAN RAM table determines that the packet belongs to VPN 5 (using the internal VLAN 1200 information applied to the packet before recirculation).
6.
The destination address is then looked up in the CEF table against the routes for VPN 5. The appropriate adjacency is then chosen after performing the load balancing hash if multiple parallel paths exist. The adjacency contains the outbound interface (Gig 2/2) and Layer 2 headers to forward the packet to the next hop/destination.
Therefore, in situations where there are more than 512 VPNs, packet recirculation is required. This means two passes through the PFC, so the forwarding performance for the affected traffic in that MPLS VPN drops.
In summary, when performing egress PE functions on a Catalyst 6500, optimal performance is achieved only when the number of VRFs defined on the specific PE device is less than 512. This is not a big issue for campus deployments, where the number of required VPNs is rarely higher than 50. In addition, even when deploying more than 512 VRFs, performance is reduced only for traffic belonging to the VRFs defined from 513 and above.
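The single-pass and two-pass cases described above can be summarized with a small Python model. This is only an illustrative sketch, not a representation of the actual PFC3 hardware logic. The function names and the label values are hypothetical, and the sketch assumes that aggregate labels are installed in the VPN lookup table in allocation order until the table is full, with one entry reserved for the Explicit NULL label.

```python
VPN_CAM_SIZE = 512       # total entries in the VPN lookup table (from the text)
RESERVED_ENTRIES = 1     # one entry is reserved for Explicit NULL (from the Note)
USABLE_ENTRIES = VPN_CAM_SIZE - RESERVED_ENTRIES


def build_vpn_cam(aggregate_labels):
    """Build a model of the VPN lookup table.

    aggregate_labels is a list of (label, vpn_id) tuples in allocation order.
    Assumption: only the first USABLE_ENTRIES labels fit in the table.
    """
    return dict(aggregate_labels[:USABLE_ENTRIES])


def egress_passes(vpn_cam, label):
    """Return the number of forwarding passes needed to pop an aggregate label.

    A hit in the VPN lookup table yields the VPN ID directly (single pass).
    A miss forces a label pop followed by recirculation (two passes).
    """
    return 1 if label in vpn_cam else 2


# Hypothetical PE with 600 VRFs, one aggregate label per VRF.
labels = [(1000 + i, i) for i in range(600)]
cam = build_vpn_cam(labels)

print("entries installed:", len(cam))
print("passes for first VRF:", egress_passes(cam, 1000))
print("passes for last VRF:", egress_passes(cam, 1599))
```

In this model, only the VRFs whose aggregate label did not fit in the table pay the recirculation penalty, which matches the behavior described in the summary above.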
LSR Functionality
LSRs receive labeled packets and, depending on their position in the MPLS network, can perform a swap or pop operation. A swap operation is required when the packet comes in with a label and needs to be forwarded to another LSR; in this case, the original label is exchanged with a new label that represents the label this node uses to reach the ultimate destination.
As shown in Figure 58, to perform label swapping, the LSR uses the incoming packet label to execute the lookup into the hardware label database and to determine the new label that should be pushed to the packet before sending it to the neighbor LSR.
Figure 58 LSR Functionality
Note
VPN traffic is characterized by having two MPLS labels added to the packet. However, the label switching is performed by the LSR, always based on the outer label.
The pop operation occurs if this node is performing PHP. If the LSR is adjacent to the LER, it is standard behavior to remove the outermost label before forwarding the packet to the LER. This makes the forwarding decision on the LER simpler. For example, in the case of IPv4 unicast, the LER has to perform only an IP forwarding decision instead of a label and IP lookup.
As shown in Figure 59, the information to perform the POP operation is again contained in the hardware label database.
Figure 59 POP Operation
Note
The example in Figure 59 refers to normal MPLS traffic. As discussed in the previous sections, in case of VPN traffic, the packet sent from the penultimate hop device toward the egress LER also contains the VPN label.
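The swap and pop behavior of an LSR can be sketched in Python as follows. This is a simplified model under stated assumptions: the label database is a plain dictionary, the label values and interface names are hypothetical, and PHP is signaled by the reserved implicit-null label value 3 defined in RFC 3032. The lookup always uses the outer label, so an inner VPN label passes through untouched.

```python
IMPLICIT_NULL = 3  # reserved label value that requests PHP (RFC 3032)


def lsr_forward(label_stack, label_db):
    """Forward a labeled packet through one LSR.

    label_stack: list of labels, outermost first.
    label_db: dict mapping incoming outer label -> (outgoing label, interface).
    Returns the new label stack and the outgoing interface.
    """
    outer, inner = label_stack[0], label_stack[1:]
    out_label, out_if = label_db[outer]
    if out_label == IMPLICIT_NULL:
        # Penultimate hop popping: remove the outer label only.
        return inner, out_if
    # Swap: replace the outer label, leave inner labels unchanged.
    return [out_label] + inner, out_if


# Hypothetical label databases for two LSRs along the path.
core_lsr = {17: (21, "Te1/1")}
penultimate_lsr = {21: (IMPLICIT_NULL, "Te1/2")}

# VPN packet: outer IGP label 17, inner VPN label 30.
stack, _ = lsr_forward([17, 30], core_lsr)
print(stack)  # outer label swapped, VPN label preserved
stack, _ = lsr_forward(stack, penultimate_lsr)
print(stack)  # only the VPN label reaches the egress LER
```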
Enabling MPLS in the Campus Distribution Block
Virtualizing the Campus Distribution Block described how to virtualize the network devices belonging to each specific campus building block, for both multi-tier and routed access designs. Independently from the specific campus model, the PE functionality is usually performed by the distribution layer devices when deploying MPLS VPN as a path isolation strategy across the campus network. (See Figure 60.)
Figure 60 PE Functionality
Note
The 802.1q trunk shown on the right side of Figure 60 is deployed regardless of whether the access layer switch operates at Layer 2 (multi-tier) or at Layer 3 (routed access). The rest of the discussion below assumes the first scenario.
The Catalyst 6500 platform deployed in the distribution layer needs to have the VRFs defined and must be able to communicate through IP switching on one side (toward the access layer devices) and translate that to MPLS switching on the other side (toward the campus core switches). To perform that function, the device needs to be able to push VPN labels onto the IP packet. This is different from the simple VRF-lite support that was required, for example, when deploying GRE tunnels as the path isolation mechanism (see Deploying Path Isolation Using VRF-Lite and GRE).
Figure 61 shows an example of enabling MPLS switching.
Figure 61 Enabling MPLS Switching
The configuration required to enable MPLS switching on the interface facing the campus core is as simple as follows:
interface TenGigabitEthernet1/1
 description 10GE to core 3
 ip address 10.122.5.31 255.255.255.254
 tag-switching ip
Note
It is important to note that the actual configuration (retrieved through the show running-config command) may show the word "tag-switching" in place of "mpls" on 6500 platforms for software releases previous to 12.2(33)SXH. This is just a heritage from the past (tag-switching was the pre-standard label switching mechanism supported on Cisco platforms before MPLS was introduced).
LDP Deployment Considerations
After enabling label switching on all the interfaces facing the core, it is also required to enable LDP. LDP is the IETF-prescribed way to discover neighboring MPLS devices and transmit label information between them. LDP is largely based on the pre-standard Tag Distribution Protocol (TDP), which Cisco developed for tag switching, the technology that was later standardized as MPLS.
When an interface is enabled for label switching (as shown in the previous section), the LDP process starts and tries to discover other MPLS-enabled neighbors (either PE or P devices) by sending LDP hello packets. When a neighbor has been discovered, an LDP session is established with it by setting up a TCP session on the well-known port 646. As a consequence, IP connectivity is required between neighbors to successfully establish the LDP session. After the LDP session has been established, keepalive messages are exchanged between the neighbor devices (by default every 60 seconds), as highlighted in the following output:
cr20-6500-1#sh mpls ldp parameters
Protocol version: 1
Downstream label generic region: min label: 16; max label: 524286
Session hold time: 180 sec; keep alive interval: 60 sec
Discovery hello: holdtime: 15 sec; interval: 5 sec
Discovery targeted hello: holdtime: 90 sec; interval: 10 sec
Downstream on Demand max hop count: 255
TDP for targeted sessions
LDP initial/maximum backoff: 15/120 sec
LDP loop detection: off
There are several best practice recommendations for deploying LDP in a campus environment, and these are discussed in the following bullet points:
•
Configure LDP as label distribution protocol
As previously mentioned, Cisco originally deployed its own label distribution protocol called Tag Distribution Protocol (TDP). As a consequence of this heritage, Catalyst 6500 platforms use TDP by default on all the MPLS-enabled interfaces, as follows:
cr20-6500-1(config)#mpls label protocol ?
ldp Use LDP
tdp Use TDP (default)
Explicit configuration is then required to change the default behavior and enable the use of LDP:
cr20-6500-1(config)#mpls label protocol ldp
cr20-6500-1(config)#do sh mpls ldp parameters
Protocol version: 1
Downstream label generic region: min label: 16; max label: 524286
Session hold time: 180 sec; keep alive interval: 60 sec
Discovery hello: holdtime: 15 sec; interval: 5 sec
Discovery targeted hello: holdtime: 90 sec; interval: 10 sec
Downstream on Demand max hop count: 255
LDP for targeted sessions
LDP initial/maximum backoff: 15/120 sec
LDP loop detection: off
•
Use loopback interfaces to establish LDP sessions
Each LDP session between MPLS-enabled neighbors is characterized by an LDP identifier that is used similarly to the OSPF or BGP identifiers. By default, the highest IP address of all defined loopback interfaces is used; if there are no loopbacks, the highest IP address of any other interface is adopted as the LDP identifier. The recommendation is to define a specific loopback interface to be used for establishing the LDP session. The first reason for doing so is operational control of the LDP identifier; a second important reason is discussed in the next bullet point. The required configuration is shown as follows:
interface Loopback10
 description LDP identifier
 ip address 192.168.100.19 255.255.255.255
end
!
mpls ldp router-id Loopback10 force
Note
As discussed before, IP connectivity is required between MPLS-enabled neighbors to establish an LDP session. When defining loopback interfaces to be used as LDP identifiers, it is then critical that the loopback is reachable by adjacent devices. This usually implies that the loopback addresses must be advertised by the IGP running in the network and being thus part of the default global routing table. For additional considerations about loopbacks deployment in campus, see Loopback Interfaces Deployment Considerations.
•
Establish targeted sessions between LDP neighbors
LDP plays a critical role when discussing convergence in an MPLS-enabled network. As shown in Figure 62, a link failure event between adjacent MPLS-enabled devices causes the failure of the LDP session between them.
Figure 62 Failure of Regular LDP Session
As highlighted in Figure 62, this means that all the labels that were previously exchanged between the neighbors are now discarded and deleted from the label database. Convergence is usually not an issue for this link failure scenario, because in this case the LDP convergence is almost immediate, and the main factor determining the length of the outage is the time needed by the IGP to converge around the failure.
Different considerations must be made for the reestablishment of the link. Under such a circumstance, the main problem is that IP usually converges much faster than LDP. As a consequence, there may be a temporary inability to forward labeled packets until new labels are exchanged and the label database is populated. This does not affect global table traffic (packets can also flow as unlabeled IP data), but it does cause VPN traffic to be dropped. The P device connected to the PE switches traffic based on the external label. This is usually the IGP label, so if it is missing because of LDP convergence, the switching decision is made based on the actual VPN label, causing the traffic to be dropped or delivered to the wrong destination. A possible workaround for this issue calls for the establishment of targeted sessions between LDP neighbors (see Figure 63).
Figure 63 Use of LDP Targeted Hellos
As shown in Figure 63, when using targeted hellos between LDP neighbors (for example R1 and R2), the LDP session between these devices is maintained even when the direct link connecting them fails, as long as there is an alternate path for maintaining the TCP session active; in the example, this happens through R3. This means that the MPLS labels that were originally exchanged between the neighbors are kept in the software label database and not discarded; the advantage in doing so is that once the direct link is reestablished, these labels do not need to be learned again, so the IP convergence is the only factor affecting the overall traffic recovery on that link (together with the programming of the hardware label database).
To use this capability, Cisco recommends following three main design recommendations:
–
Build a high degree of redundancy when deploying the campus network, so that there is always at least one redundant path connecting each pair of network devices.
–
Configure loopback interfaces as LDP identifiers, as previously discussed. In fact, if the LDP session is established by using the IP address of the physical interfaces connecting the neighbor devices, the targeted hellos feature cannot provide any benefit (the TCP session is broken as soon as the physical link fails). Note that it is also required to inject the loopback interface IP addresses into the IGP in use to successfully establish the TCP sessions between neighbors.
–
Specify that the LDP session established with the neighbor devices must be a targeted session, as shown in the following configuration sample:
cr20-6500-1(config)#mpls ldp neighbor 192.168.100.19 targeted ?
ldp Use LDP
tdp Use TDP
<cr>
Note
Optionally, you can specify whether LDP or TDP should be used between the LDP neighbors. Cisco recommends configuring the specific label protocol globally, as previously discussed.
From a verification standpoint, as shown in the following example, the LDP session with the neighbor of this example (192.168.100.19) is maintained via the directly connected link (interface Ten1/1):
cr20-6500-1#sh ip route 192.168.100.19
Routing entry for 192.168.100.19/32
Known via "ospf 100", distance 110, metric 2, type intra area
Last update from 10.122.5.30 on TenGigabitEthernet1/1, 00:00:04 ago
Routing Descriptor Blocks:
* 10.122.5.30, from 10.122.5.103, 00:00:04 ago, via TenGigabitEthernet1/1
Route metric is 2, traffic share count is 1
If the physical link fails, the LDP session is maintained in active state via the alternate path (via Ten1/3 and the distribution layer peer):
cr20-6500-1(config)#int t1/1
cr20-6500-1(config-if)#shut
cr20-6500-1(config-if)#end
cr20-6500-1#sh ip route 192.168.100.19
Routing entry for 192.168.100.19/32
Known via "ospf 100", distance 110, metric 4, type inter area
Last update from 10.137.0.3 on TenGigabitEthernet1/3, 00:00:07 ago
Routing Descriptor Blocks:
* 10.137.0.3, from 10.122.5.114, 00:00:07 ago, via TenGigabitEthernet1/3
Route metric is 4, traffic share count is 1
cr20-6500-1#sh mpls ldp neighbor 192.168.100.19
Peer LDP Ident: 192.168.100.19:0; Local LDP Ident 192.168.100.5:0
TCP connection: 192.168.100.19.11094 - 192.168.100.5.646
State: Oper; Msgs sent/rcvd: 106/85; Downstream
Up time: 00:15:24
LDP discovery sources:
Targeted Hello 192.168.100.5 -> 192.168.100.19, active, passive
Addresses bound to peer LDP Ident:
192.168.100.19 172.26.159.146 10.122.5.11 10.122.5.12
10.122.5.34
Duplicate Addresses advertised by peer:
2.2.2.2
In addition, note that all the labels learned from the LDP neighbor are still kept in the software label database. For the prefix 10.122.5.2/31, the following example shows that tag 23 is still associated to it in the label database. This tag was originally learned via interface Ten1/1 before its failure.
cr20-6500-1#sh mpls ldp bind neighbor 192.168.100.19
tib entry: 0.0.0.0/0, rev 154
remote binding: tsr: 192.168.100.19:0, tag: imp-null
tib entry: 2.2.2.2/32, rev 2
remote binding: tsr: 192.168.100.19:0, tag: imp-null
tib entry: 10.122.5.2/31, rev 72
remote binding: tsr: 192.168.100.19:0, tag: 23
tib entry: 10.122.5.6/31, rev 66
remote binding: tsr: 192.168.100.19:0, tag: 20
(output suppressed)
The hardware label database is instead programmed to use a different label (the LSP is built via the alternate path now that Ten1/1 has failed); this can be verified by looking at the specific label that is used to reach one of the prefixes shown above (10.122.5.2 in this example):
cr20-6500-1#sh mpls forwarding-table 10.122.5.2
Local Outgoing Prefix Bytes tag Outgoing Next Hop
tag tag or VC or Tunnel Id switched interface
52 21 10.122.5.2/31 0 Te1/3 10.137.0.3
As expected, the outgoing label currently in use is 21 out of interface Ten1/3 (and not the tag 23 that was originally learned via Ten1/1). However, as soon as the link is recovered, the hardware is reprogrammed with the updated information without requiring a new learning of that label:
cr20-6500-1(config)#int t1/1
cr20-6500-1(config-if)#no shut
cr20-6500-1#sh mpls forwarding-table 10.122.5.2
Local Outgoing Prefix Bytes tag Outgoing Next Hop
tag tag or VC or Tunnel Id switched interface
52 23 10.122.5.2/31 0 Te1/1 10.122.5.30
Note
Starting from software release 12.2(33)SXH, Catalyst 6500 platforms support another feature that can achieve the same results discussed above via LDP targeted hellos. This is the LDP session protection functionality; more information can be found at: http://www.cisco.com/en/US/docs/ios/12_0s/feature/guide/fssespro.html.
In summary, the use of loopback interfaces for establishing LDP-targeted sessions between neighbor network devices provides for fast hardware recovery for failed links and thus represents the recommended best practice. In addition, the use of loopback interfaces addressed from a specific and well identifiable IP pool provides a further advantage that is discussed in Tagging or Non-Tagging Global Table Traffic.
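The default LDP identifier selection rule described earlier in this section (highest loopback address, otherwise highest address of any other interface) can be sketched in Python. This is only a model of the rule as stated in the text, not of the IOS implementation; the function name, the `forced` parameter, and the sample interface tables are hypothetical. The `forced` parameter stands in for the effect of the mpls ldp router-id Loopback10 force command.

```python
import ipaddress


def ldp_router_id(interfaces, forced=None):
    """Pick the LDP identifier address from a name -> IPv4 address mapping.

    If `forced` names an interface, its address is used (explicit override).
    Otherwise the highest loopback address wins; with no loopbacks, the
    highest address of any interface is used.
    """
    if forced is not None:
        return interfaces[forced]

    def highest(addresses):
        return str(max(ipaddress.IPv4Address(a) for a in addresses))

    loopbacks = [
        ip for name, ip in interfaces.items()
        if name.lower().startswith("loopback")
    ]
    return highest(loopbacks) if loopbacks else highest(interfaces.values())


# Hypothetical interface tables.
with_loopbacks = {
    "Loopback0": "2.2.2.2",
    "Loopback10": "192.168.100.19",
    "TenGigabitEthernet1/1": "10.122.5.31",
}
without_loopbacks = {
    "TenGigabitEthernet1/1": "10.122.5.31",
    "TenGigabitEthernet1/3": "10.137.0.2",
}

print(ldp_router_id(with_loopbacks))
print(ldp_router_id(without_loopbacks))
print(ldp_router_id(with_loopbacks, forced="Loopback0"))
```

The last call illustrates why explicit configuration gives operational control: the identifier no longer changes when another loopback with a higher address is added.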
MP-iBGP Deployment Considerations
In an MPLS VPN design, the exchange of VPN routes is achieved by using an additional control plane element called Multi-Protocol BGP (MP-BGP), which is an extension of the existing BGP-4 protocol. In the context of this guide, MP-BGP is introduced only as an overlay protocol to provide the capabilities for exchanging VPN routes. Very large networks can be deployed as separate autonomous systems (AS), and in such scenarios, the use of BGP may be required also to connect these separate AS and exchange global table routes. The recommended design discussed here is instead constituted by a single AS and an IGP deployed end-to-end, so that there is no requirement for BGP in global table.
As a consequence, MP-BGP needs to be configured only between the PE devices, because they are the only ones containing VPN routes in the various VRF routing tables. Because the main strength of MPLS VPN is to provide any-to-any connectivity inside each defined VPN, the PE devices must establish MP-iBGP connections between them in a fully meshed fashion. By deploying route reflectors, it is possible to relax this requirement, thus improving the scalability of the overall solution.
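The scalability benefit of route reflectors comes down to simple arithmetic, sketched here in Python. The function names are hypothetical, and the model assumes that each PE peers with every route reflector and that sessions between the route reflectors themselves are not counted. The default of two route reflectors is an assumption chosen for illustration.

```python
def full_mesh_sessions(num_pe):
    """Number of iBGP sessions when every PE peers with every other PE."""
    return num_pe * (num_pe - 1) // 2


def route_reflector_sessions(num_pe, num_rr=2):
    """Number of iBGP sessions when each PE peers only with the RRs.

    Assumption: sessions between the RRs themselves are not counted.
    """
    return num_pe * num_rr


for n in (4, 10, 50):
    print(n, full_mesh_sessions(n), route_reflector_sessions(n))
```

The full mesh grows quadratically with the number of PE devices, while the route reflector design grows linearly, and each PE always needs only as many neighbor statements as there are route reflectors.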
MP-iBGP is required within the MPLS VPN architecture because the BGP updates exchanged between PE devices need to carry more information than just an IPv4 address. At a high level, the following three pieces of information are critical to the MPLS VPN functionality and are exchanged through MP-iBGP:
•
VPNv4 addresses—Address prefixes defined in the context of each VPN that need to be communicated between the various PE devices to provide connectivity inside each VPN. A VPNv4 address is formed by prepending a 64-bit entity called a route distinguisher (RD) to the IPv4 prefix. A unique RD needs to be used for each VRF defined on the PE device. The RD uniqueness makes each VPNv4 prefix unique, which allows overlapping IPv4 prefixes to be supported between separate VPNs.
•
MPLS VPN label information—Each PE allocates a specific MPLS label for each defined VPN prefix. This is the more internal label that is pushed in each MPLS packet before sending it to the MPLS core, and is used by the receiving PE to determine in which VPN to route the packet.
•
Extended BGP communities—The most important of these extended communities is called the route target and represents a 64-bit value that is attached to each BGP route. The value of the route target determines how the VPN routes are exported and imported into each VPN. Basically, every VPNv4 route received by a PE may have one or more route targets associated to it; depending on the route targets configured locally on the receiving PE for each VRF, the route is either imported or ignored for that specific VRF. Using route targets provides great flexibility to provision many different VPN topologies. This guide discusses how to provide any-to-any connectivity inside each VPN. For an explanation of how to deploy a hub-and-spoke topology as opposed to an any-to-any topology, see the Network Virtualization—Services Edge Design Guide at the following URL: http://www.cisco.com/en/US/docs/solutions/Enterprise/Network_Virtualization/ServEdge.html.
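The following Python sketch illustrates two of the ideas above: how a route distinguisher makes overlapping IPv4 prefixes unique, and how route targets drive the import decision. It is a simplified illustration rather than a BGP implementation. The function names and the RD and route target values (built on AS 64000) are hypothetical, and the RD uses the type 0 layout (2-byte type, 2-byte AS number, 4-byte assigned number) described in RFC 4364.

```python
import ipaddress
import struct


def rd_type0(asn, number):
    """Encode a type 0 route distinguisher as 8 bytes (64 bits)."""
    return struct.pack("!HHI", 0, asn, number)


def vpnv4_address(rd, ipv4):
    """Build a 12-byte VPNv4 address: 8-byte RD followed by 4-byte IPv4."""
    return rd + ipaddress.IPv4Address(ipv4).packed


def should_import(route_targets, vrf_import_targets):
    """A route is imported when it carries at least one RT the VRF imports."""
    return bool(set(route_targets) & set(vrf_import_targets))


# The same IPv4 prefix used in two different VPNs (hypothetical RD values).
red = vpnv4_address(rd_type0(64000, 1), "10.137.12.0")
blue = vpnv4_address(rd_type0(64000, 2), "10.137.12.0")
print(len(red), red != blue)

# Hypothetical route targets attached to a received route.
print(should_import({"64000:1"}, {"64000:1"}))
print(should_import({"64000:1"}, {"64000:2"}))
```

Note that the RD only makes the prefixes distinct; it is the route target, not the RD, that decides which VRF a received route is imported into.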
When discussing the deployment of MPLS VPN in a campus environment, the following specific recommendations should be followed:
•
Unlike a traditional service provider environment, the first thing to consider when deploying MPLS VPN in a campus distribution block is the absence of a traditional CE device, because all the VPN subnets are directly connected to the PE devices deployed at the distribution layer (given that the access layer switches are functioning as Layer 2 devices). This means that there is no need for a CE-PE control protocol. All the VPN subnets appear in each defined VRF as directly connected, which allows injecting them into MP-BGP by simply configuring the redistribute connected option, as follows:
router bgp 64000
 no bgp default ipv4-unicast
 bgp log-neighbor-changes
 !
 address-family ipv4 vrf v1
 redistribute connected
 no auto-summary
 no synchronization
 exit-address-family
•
MP-iBGP sessions should be established by using loopback interfaces. This brings the obvious advantage of allowing the iBGP session to remain active as long as there is an available path connecting to the loopback IP address. In addition, there are also some operational advantages in assigning the loopback interfaces an IP address taken from a unique and easily identifiable subnet. This point is discussed in LDP Deployment Considerations; it is in fact recommended to use the same loopback interface as the LDP identifier and for establishing MP-iBGP sessions. Another reason for doing this is discussed in Tagging or Non-Tagging Global Table Traffic. When using loopback interfaces, the configuration looks like the following sample:
interface Loopback10
 description mBGP anchor point
 ip address 192.168.100.5 255.255.255.255
!
router bgp 64000
 no bgp default ipv4-unicast
 bgp log-neighbor-changes
 neighbor 192.168.100.1 remote-as 64000
 neighbor 192.168.100.1 update-source Loopback10
Special considerations need to be made about the loopback interfaces when using OSPF as the IGP in the global table (see Loopback Interfaces Deployment Considerations).
•
Given the fact that MP-iBGP sessions need to be established between all the PE devices defined in the network, Cisco recommends using route reflectors for a better scalability and manageability of the solution. The route reflector should be deployed on standalone devices connected, for example, to the P core devices, as shown in Figure 64.
Figure 64 Positioning of Route Reflectors
One of the main advantages in using standalone devices is stability. Code upgrades on P or PE devices can be performed without touching the RR, which can continue performing its function. Also, the MP-BGP configuration required on each PE device becomes identical; all the PEs have to peer with the two route reflectors, as shown in the following example. This design recommendation considerably reduces maintenance time and simplifies troubleshooting.
router bgp 64000
 no bgp default ipv4-unicast
 bgp log-neighbor-changes
 neighbor 192.168.100.1 remote-as 64000
 neighbor 192.168.100.1 update-source Loopback10
 neighbor 192.168.100.2 remote-as 64000
 neighbor 192.168.100.2 update-source Loopback10
 !
 address-family vpnv4
 neighbor 192.168.100.1 activate
 neighbor 192.168.100.1 send-community extended
 neighbor 192.168.100.2 activate
 neighbor 192.168.100.2 send-community extended
 exit-address-family
On the RR side, the configuration is straightforward:
router bgp 64000
 no bgp default ipv4-unicast
 neighbor RR-clients peer-group
 neighbor RR-clients remote-as 64000
 neighbor RR-clients update-source Loopback10
 neighbor 192.168.100.3 peer-group RR-clients
 neighbor 192.168.100.4 peer-group RR-clients
 neighbor 192.168.100.5 peer-group RR-clients
 neighbor 192.168.100.6 peer-group RR-clients
 !
 address-family vpnv4
 neighbor RR-clients activate
 neighbor RR-clients send-community extended
 neighbor RR-clients route-reflector-client
 neighbor 192.168.100.3 peer-group RR-clients
 neighbor 192.168.100.4 peer-group RR-clients
 neighbor 192.168.100.5 peer-group RR-clients
 neighbor 192.168.100.6 peer-group RR-clients
 exit-address-family
Note
When positioning the RRs as separate network devices (as in the recommended model displayed in Figure 64), no MPLS or VRF definitions are required on these devices.
•
Aggregation of VPN subnets—Summarization of VPN routes from each campus distribution block toward the core is not a recommended best practice because it may lead to a black hole situation under a specific failure scenario. As shown in Figure 65, assume that both PEs belonging to the distribution block are aggregating VPN routes toward the core; for example, advertising a /16 super-net.
Figure 65 Summarizing VPN Routes
A look in the VRF routing table of each PE shows the VPN subnet directly connected and the summary pointing to Null0:
cr20-6500-1#show ip route vrf v1
Routing Table: v1
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
E1 - OSPF external type 1, E2 - OSPF external type 2, E - EGP
i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
ia - IS-IS inter area, * - candidate default, U - per-user static route
o - ODR, P - periodic downloaded static route
Gateway of last resort is not set
10.0.0.0/8 is variably subnetted, 15 subnets, 3 masks
B 10.137.0.0/16 [200/0] via 0.0.0.0, 00:43:59, Null0
C 10.137.13.0/24 is directly connected, Vlan13
C 10.137.12.0/24 is directly connected, Vlan12
Now assume one of the uplinks from the access layer to the distribution switch fails, as shown in Figure 66.
Figure 66 Link Failure when Summarizing VPN Routes
Without VPN route aggregation, the PE on the left directly connected to the failed link learns (via BGP) the path toward the subnet 10.137.12.0 via the peer PE device. When summarizing instead, the PE ignores the summary learned from the peer because it already has a summary route pointing to Null0, as shown in the following example:
cr20-6500-1#show ip route vrf v1
Routing Table: v1
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
E1 - OSPF external type 1, E2 - OSPF external type 2, E - EGP
i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
ia - IS-IS inter area, * - candidate default, U - per-user static route
o - ODR, P - periodic downloaded static route
Gateway of last resort is not set
10.0.0.0/8 is variably subnetted, 15 subnets, 3 masks
B 10.137.0.0/16 [200/0] via 0.0.0.0, 00:43:59, Null0
C 10.137.13.0/24 is directly connected, Vlan13
As a consequence, the PE starts dropping all the traffic delivered to it from the core of the network and destined to the specific 10.137.12.0 subnet. This is the reason why summarization of VPN routes from each distribution block is not a recommended best practice.
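The black hole can be reproduced with a small longest-prefix-match model in Python. This is a simplified sketch of the VRF routing table shown above, not of the actual CEF lookup; the function name, the destination host address, and the "peer PE (via BGP)" next-hop string are hypothetical.

```python
import ipaddress


def lookup(routes, destination):
    """Longest-prefix match over a dict of prefix string -> next hop."""
    dest = ipaddress.ip_address(destination)
    matches = [
        ipaddress.ip_network(prefix)
        for prefix in routes
        if dest in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[str(best)]


# VRF v1 on the PE before the access uplink failure (with summarization).
before_failure = {
    "10.137.0.0/16": "Null0",
    "10.137.12.0/24": "Vlan12",
    "10.137.13.0/24": "Vlan13",
}

# After the failure the connected /24 disappears; the local summary stays.
after_failure = {
    prefix: hop for prefix, hop in before_failure.items()
    if prefix != "10.137.12.0/24"
}

# Without summarization the PE would learn the /24 from its peer instead.
after_failure_no_summary = {
    "10.137.12.0/24": "peer PE (via BGP)",
    "10.137.13.0/24": "Vlan13",
}

print(lookup(before_failure, "10.137.12.10"))
print(lookup(after_failure, "10.137.12.10"))
print(lookup(after_failure_no_summary, "10.137.12.10"))
```

With the summary in place, the only remaining match for the failed subnet is the locally generated /16 pointing to Null0, so the traffic is discarded instead of being forwarded to the peer PE.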
Configuring the Core Devices (P Routers)
The configuration of the devices building the core of the MPLS network (P devices) is much simpler than the one discussed in the previous section for PE switches, for the following two main reasons:
•
P devices do not generally require any VRF configuration or network services virtualization. These functions are deployed only on the PE switches sitting at the edge of the MPLS network. The main task of the P switches is to label switch the received packets, allowing LSPs to be established across the network infrastructure (as already discussed, these LSPs can be used to switch both global table and VPN traffic).
•
As a direct consequence of the previous point, there is no requirement to deploy the additional control plane protocol MP-BGP on P devices. The only routing protocol in use is the IGP traditionally deployed to establish global table connectivity.
Note
The requirements for deploying P devices in the core of the network are the same as for PE switches. Therefore, only Catalyst 6500 switches with Supervisors equipped with PFC3B or higher are currently available for this role.
Given the considerations above, the following are the basic configuration steps required to deploy P (core) switches. As previously mentioned, the assumption is that the global table configuration (routing, IP addressing, and so on) is already in place before starting the virtualization of the network infrastructure.
Step 1
Enable MPLS switching on all the physical interfaces connecting the P devices to other P or PE switches, as shown in Figure 67.
Figure 67 Enabling MPLS on P Devices
interface TenGigabitEthernet1/1
 description 10GE to PE3
 ip address 10.122.5.37 255.255.255.254
 tag-switching ip
Step 2
Configure LDP parameters similarly to PE:
•
Explicitly enable standard LDP
•
Define a loopback interface to be used as the LDP identifier
•
Inject the loopback /32 address into IGP
•
Establish targeted sessions between LDP neighbors
Redundancy and Traffic Load Balancing
Because of the business-critical functions usually supported by campus networks, the design has evolved to one supporting a high degree of redundancy to achieve the required high availability. This leads to the deployment of redundant devices in the core and distribution layers, redundant supervisors in the access layer, and redundant links connecting the various layers of the hierarchical network. The application traffic in the VPNs is also considered mission-critical and needs to be protected in a similar fashion as the global table traffic. Therefore, it is important to understand how to use the infrastructure redundancy also for that purpose.
To achieve this, several configuration steps need to be implemented. To understand this point better, see the network diagram in Figure 68.
Figure 68 Achieving Redundancy and Traffic Load Balancing
In Figure 68, PE1 and PE2 are connected to a subnet (10.137.12.0/24) mapped to VRF v1 (thus part of a specific VPN). Because all the devices in the example are connected in a fully meshed fashion, it is desirable for the VPN traffic flowing between the two distribution blocks to also benefit from this link redundancy.
For this to happen, the first design recommendation is to configure a different RD value for the two PE devices belonging to the same distribution block. To understand the reasons for this choice, a brief review of how the PE devices on the bottom receive the VPN routes from the upper PEs is useful.
As shown in Figure 69, when deploying RRs, all the PE devices must establish an MP-iBGP session with the RRs (for simplicity's sake, only one RR is discussed in this example).
Figure 69 Establishing MP-iBGP Sessions with Route Reflector
PE1 and PE2 must advertise the same IPv4 subnet (10.137.12.0/24) to the RR via MP-iBGP. By default, the RR chooses one of the two VPNv4 updates received and "reflects" the best one to the other RR clients; in this example, the bottom PE3 and PE4. As a consequence, if the RD value configured on PE1 and PE2 is the same, they both advertise the same VPNv4 route to the RR, and the RR reflects only the better one to the bottom PEs. Configuring a distinct RD value on each PE instead makes the VPNv4 updates sent by PE1 and PE2 for the same IPv4 prefix 10.137.12.0 unique. The RR then "reflects" both VPNv4 prefixes to the bottom PEs.
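The effect of the RD on what the RR reflects can be illustrated with a small sketch. It is a deliberately simplified model: the `reflect` function is a hypothetical name, the next-hop addresses are illustrative, and real BGP best-path selection is replaced by a placeholder rule that keeps the first update received for each VPNv4 prefix.

```python
def reflect(updates):
    """Keep one update per VPNv4 prefix, where a VPNv4 prefix is RD + IPv4 prefix.

    Placeholder for BGP best-path selection: the first update received
    for a given VPNv4 prefix wins (an assumption made for brevity).
    """
    best = {}
    for rd, prefix, next_hop in updates:
        best.setdefault((rd, prefix), next_hop)
    return [(rd, prefix, hop) for (rd, prefix), hop in best.items()]

# Both PEs use the same RD: the two updates collapse into one VPNv4 prefix
same_rd = [
    ("64001:1", "10.137.12.0/24", "192.168.100.5"),
    ("64001:1", "10.137.12.0/24", "192.168.100.6"),
]

# Each PE uses its own RD: the RR sees two different VPNv4 prefixes
distinct_rd = [
    ("64001:1", "10.137.12.0/24", "192.168.100.5"),
    ("64002:1", "10.137.12.0/24", "192.168.100.6"),
]

print(len(reflect(same_rd)))      # 1
print(len(reflect(distinct_rd)))  # 2
```

With distinct RDs the RR has no reason to choose between the two updates, because from its point of view they describe two different destinations.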
The configuration required for achieving load balancing and redundancy is therefore the following:
•
PE1
ip vrf v1
 rd 64001:1
 route-target export 64000:1
 route-target import 64000:1
•
PE2
ip vrf v1
 rd 64002:1
 route-target export 64000:1
 route-target import 64000:1
Notice how the route-target values need to remain the same on both PEs, because both must import the same updates received from remote PEs into the specific VPN routing table. This is required on all the PEs when the goal is to achieve any-to-any connectivity inside each VPN.
At this point, the bottom PEs receive two separate VPNv4 updates for the same IPv4 prefix 10.137.12.0/24. However, an additional configuration step is still required for them to import both routes into the VPN routing table. This is because, by default, the BGP process on the receiving PE devices installs only the best route in the routing table. To change this behavior, the following additional configuration step is required:
•
PE3/PE4
router bgp 64000
 !
 address-family ipv4 vrf v1
  maximum-paths ibgp 2 import 2
After configuring the above command, the BGP process on the bottom PEs installs both routes received from the upper PEs in the routing table, and these routes are consequently imported into the control plane relative to VRF v1, as follows:
PE3#sh ip route vrf v1 10.137.12.0
Routing entry for 10.137.12.0/24
  Known via "bgp 64000", distance 200, metric 0, type internal
  Last update from 192.168.100.6 2w3d ago
  Routing Descriptor Blocks:
  * 192.168.100.6 (Default-IP-Routing-Table), from 192.168.100.2, 2w3d ago
      Route metric is 0, traffic share count is 1
      AS Hops 0
    192.168.100.5 (Default-IP-Routing-Table), from 192.168.100.1, 2w3d ago
      Route metric is 0, traffic share count is 1
      AS Hops 0
Note
This happens only with equal cost routes. It is possible also to import unequal cost routes with the command maximum-paths ibgp unequal-cost.
Now that load balancing is achieved from the point of view of the control plane, the discussion needs to focus on how the traffic is actually sent over the physical link; that is, how load balancing is obtained from a data plane point of view on Catalyst 6500 platforms.
As shown in the network diagram above, in a fully meshed design each PE has a redundant equal cost path that can be used to reach the loopback interfaces of the remote PEs. Because each VPN route is then learned from both PEs, the consequence is that each PE is able to send VPN traffic over four distinct Label Switched Paths (LSPs), two on each physical link connecting the PE device to the core. This can be verified as follows:
PE3#sh mls cef vrf v1 10.137.12.0
Codes: decap - Decapsulation, + - Push Label
Index  Prefix              Adjacency
3219   10.137.12.0/24      Gi1/3     16(+),56(+) (Hash: 0001)
                           Gi1/2     16(+),39(+) (Hash: 0002)
                           Gi1/3     16(+),55(+) (Hash: 0004)
                           Gi1/2     16(+),37(+) (Hash: 0008)
As shown in Figure 70, the same inner MPLS VPN label 16 is used to send traffic toward the destination subnet, whereas a different outer label is inserted to label switch traffic to the remote PEs.
Figure 70 Establishment of Redundant LSPs
Imposing these labels allows each PE to build the four distinct LSPs to reach the remote PE loopback interfaces (192.168.100.5 and 192.168.100.6). This can be verified as follows:
Bottom_PE_Left#sh mls cef 192.168.100.5
Codes: decap - Decapsulation, + - Push Label
Index  Prefix              Adjacency
82     192.168.100.5/32    Gi1/3     55(+) (Hash: 0001)
                           Gi1/2     37(+) (Hash: 0002)
Bottom_PE_Left#sh mls cef 192.168.100.6
Codes: decap - Decapsulation, + - Push Label
Index  Prefix              Adjacency
84     192.168.100.6/32    Gi1/3     56(+) (Hash: 0001)
                           Gi1/2     39(+) (Hash: 0002)
Note that two LSPs are formed to reach the loopback interfaces of each remote PE. These LSPs are built out of the two physical interfaces connecting the PE devices to the core.
Now the question is how the PE decides which LSP to use for each specific packet. To answer this, keep in mind how the Catalyst 6500 platforms behave for MPLS traffic in the presence of redundant equal cost paths. Figure 71 describes the various possible scenarios.
Figure 71 MPLS Load Balancing on Catalyst 6500
Because only two labels are imposed on each packet when switching MPLS VPN traffic, the first option shown in Figure 71 applies. This means that packets are assigned to each LSP based on the source and destination IP address pair; therefore, per-flow LSP assignment is performed. This can be easily verified with the following commands:
cr23-6500-1#sh mls cef exact-route vrf v1 10.138.12.11 10.137.12.11
Interface: Gi1/3, Next Hop: 224.0.6.84, Vlan: 1020, Destination Mac: 0009.448f.8200
cr23-6500-1#sh mls cef exact-route vrf v1 10.138.12.11 10.137.12.12
Interface: Gi1/3, Next Hop: 224.0.6.86, Vlan: 1020, Destination Mac: 0009.448f.8200
cr23-6500-1#sh mls cef exact-route vrf v1 10.138.12.11 10.137.12.13
Interface: Gi1/2, Next Hop: 224.0.6.87, Vlan: 1019, Destination Mac: 0005.3142.c400
cr23-6500-1#sh mls cef exact-route vrf v1 10.138.12.11 10.137.12.15
Interface: Gi1/2, Next Hop: 224.0.6.85, Vlan: 1019, Destination Mac: 0005.3142.c400
When the destination IP address (and thus the flow) changes, a different physical interface and corresponding next-hop value are used. The combination of physical interface and next-hop MAC address identifies a different LSP in each case.
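Per-flow LSP assignment can be pictured with the following sketch. It is not the Catalyst 6500 hardware algorithm: CRC32 is used only as a stand-in hash, and the `pick_lsp` function and the `LSPS` list (built from the label pairs in the earlier `sh mls cef vrf v1` output) are hypothetical names for this illustration.

```python
import ipaddress
import zlib

# (outgoing interface, inner VPN label, outer IGP label) for the four LSPs
LSPS = [
    ("Gi1/3", 16, 56),
    ("Gi1/2", 16, 39),
    ("Gi1/3", 16, 55),
    ("Gi1/2", 16, 37),
]

def pick_lsp(src, dst, lsps=LSPS):
    """Choose an LSP from a hash of the source/destination IP address pair.

    CRC32 is an assumption standing in for the platform hash; the point is
    only that the choice depends on the address pair and nothing else.
    """
    key = ipaddress.ip_address(src).packed + ipaddress.ip_address(dst).packed
    return lsps[zlib.crc32(key) % len(lsps)]

# Every packet of one flow takes the same LSP
flow = ("10.138.12.11", "10.137.12.11")
print(pick_lsp(*flow) == pick_lsp(*flow))

# Different destinations may land on different LSPs
for host in (11, 12, 13, 15):
    print(host, pick_lsp("10.138.12.11", f"10.137.12.{host}"))
```

Because the selection is a pure function of the address pair, packets within a flow are never reordered across LSPs, while distinct flows are spread statistically over the available paths.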
Note
Based on what is shown in Figure 71, global table traffic (using a single MPLS label) will be load-balanced based on the source and destination IP information.
It is important to note that using distinct RDs on the two PE devices belonging to the same distribution block causes a larger utilization of memory resources on the PE itself. To understand the reason, it is necessary to analyze the logic behind the use of RDs on the PE devices. Every time a PE receives a new VPNv4 route (from the route reflector in this specific design), it does the following:
•
If the RD of the received route is equal to the RD locally defined on the PE for that specific VRF, the route is imported in the BGP table (assuming also that the route target is configured to allow this).
•
If the RD of the received route is different from the local RD, the PE imports the route in the BGP table (under the "section" corresponding to the locally defined RD), and it also keeps a copy in a different section of the BGP table corresponding to the received RD value.
Note
This logic was essentially designed to keep track of which PE device sent each route, under the assumption that each PE defines a unique RD for the same VRF (this is typical, for example, in a service provider environment).
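The two import rules can be modeled with a short sketch that counts the resulting BGP table entries. The `receive_vpnv4` and `total_entries` functions are hypothetical names, the addresses are illustrative, a single route reflector is assumed, and the route target is assumed to permit the import in every case.

```python
from collections import defaultdict

def receive_vpnv4(bgp_table, local_rd, route_rd, prefix, next_hop):
    """Store a received VPNv4 route according to the two rules listed above."""
    # The route is always imported under the section of the locally defined RD
    bgp_table[local_rd].append((prefix, next_hop))
    # A route carrying a different RD is also kept under its own RD section
    if route_rd != local_rd:
        bgp_table[route_rd].append((prefix, next_hop))

def total_entries(bgp_table):
    """Count all entries across every RD section of the table."""
    return sum(len(section) for section in bgp_table.values())

# Remote PEs configured with distinct RDs (local RD is 64001:1)
distinct = defaultdict(list)
receive_vpnv4(distinct, "64001:1", "64001:1", "10.138.12.0/24", "192.168.100.9")
receive_vpnv4(distinct, "64001:1", "64002:1", "10.138.12.0/24", "192.168.100.10")

# Same two updates if every PE used the same RD
shared = defaultdict(list)
receive_vpnv4(shared, "64001:1", "64001:1", "10.138.12.0/24", "192.168.100.9")
receive_vpnv4(shared, "64001:1", "64001:1", "10.138.12.0/24", "192.168.100.10")

print(total_entries(distinct))  # 3
print(total_entries(shared))    # 2
```

Each additional RD value seen by a PE adds another section holding copies of the routes that carry it, which is the source of the extra memory consumption.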
Still referring to the example discussed above, because the RD values used for each pair of PEs are common across all the distribution blocks (but unique between the PEs deployed in the same block), the VPNv4 routes received, for example, by the PEs in the upper distribution block from the PEs in the lower distribution block are characterized by two distinct RD values. Looking at the BGP table on each of these PEs, the increase in memory required to store this information is evident, as follows:
cr20-6500-1#sh ip bgp vpnv4 all
BGP table version is 11740, local router ID is 192.168.100.5
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 64001:1 (default for vrf v1)
* i10.138.12.0/24   192.168.100.10           0    100      0 ?
* i                 192.168.100.10           0    100      0 ?
* i                 192.168.100.9            0    100      0 ?
*>i                 192.168.100.9            0    100      0 ?
<SNIP>
Route Distinguisher: 64002:1
* i10.138.12.0/24   192.168.100.10           0    100      0 ?
*>i                 192.168.100.10           0    100      0 ?
As shown above, the route 10.138.12.0 in the section for the default RD of VRF v1 is imported as learned from both remote PEs (192.168.100.9 and 192.168.100.10). However, in the section for RD 64002:1, it is imported as learned from only one of the two remote PEs (the one configured with that specific RD in VRF v1). If all the PEs used the same RD, the second part of the information would not be present, thus saving memory. At the same time, the load balancing and redundancy discussed in this section would not be achieved. It is also worth considering that using a separate RD value on every PE defined in the campus network still allows load balancing, but causes excessive memory use, because each PE must store all the routes received from the other PEs under their unique RDs.
As a consequence, the recommended best practice is to use unique RDs on the two PEs belonging to the same campus distribution block, but to reuse these values for all the PE pairs deployed in the other distribution blocks.
Dealing with MTU Size Issues
Every time a tunneling technology is deployed, concerns about MTU size usually arise. Configuring MPLS VPN causes two additional labels to be imposed on each IP packet. Each label is 4 bytes long, so this increases the overall size of the packet by up to 8 bytes. Assuming that the endpoints are generating IP packets with full 1500-byte sizes, it is logical to expect some problems to arise. The issues generally arise when the 1500-byte packets reach the PE devices that are responsible for MPLS label imposition.
If the DF bit in the IP packet is set to 1 (this is generally the case because the endpoint sets the bit to perform path MTU discovery), the PE is not able to add the 8 bytes and then send the packet out of the interfaces connected to the core, assuming they are configured with the default 1500-byte MTU size. At the same time, the PE is not able to fragment the packet because of the DF bit setting, so it drops the packet and returns a Destination Unreachable ICMP message to the source of the IP datagram, with the code indicating fragmentation needed and DF set (type 3, code 4). When the source station receives the ICMP message, it lowers the send maximum segment size (MSS), and when TCP retransmits the segment, it uses the smaller segment size.
This process works assuming that the end station actually receives the ICMP message and is able to properly lower the MTU size of the generated IP packets. If either of these conditions is not met (for example, because the endpoint is not able to properly process the ICMP message and consequently lower the MTU of the packets it generates), the end stations continue to send full-size IP packets and the PE device keeps dropping them, effectively blackholing all the VPN traffic. For this reason, a different mechanism should be deployed to ensure that the VPN traffic is never dropped because of MTU-related issues. This also helps with UDP traffic characterized by large frame sizes (for example, the traffic generated by video applications), for which the path MTU discovery mechanism cannot be applied anyway.
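The decision taken by the PE at label imposition can be summarized in a few lines. This is a simplified model under stated assumptions: the `handle_at_pe` function is a hypothetical name, each MPLS label adds 4 bytes, and the labeled packet size is compared directly against the MTU of the outgoing interface. The 1508-byte value is used here only to show the arithmetic.

```python
MPLS_LABEL_BYTES = 4  # size of one MPLS label stack entry

def handle_at_pe(ip_packet_bytes, df_bit, labels=2, mtu=1500):
    """Describe what the PE does with a packet once labels are imposed."""
    labeled = ip_packet_bytes + labels * MPLS_LABEL_BYTES
    if labeled <= mtu:
        return "forward"
    if df_bit:
        # Cannot fragment: drop and notify the source (fragmentation needed, DF set)
        return "drop, send ICMP type 3 code 4"
    return "fragment"

print(handle_at_pe(1500, df_bit=True))            # drop, send ICMP type 3 code 4
print(handle_at_pe(1500, df_bit=False))           # fragment
print(handle_at_pe(1492, df_bit=True))            # forward
print(handle_at_pe(1500, df_bit=True, mtu=1508))  # forward
```

A full-size 1500-byte packet grows to 1508 bytes with two labels, so it only fits if the source sends at most 1492 bytes or the core-facing MTU accommodates the extra 8 bytes.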
The following two solutions to this problem can be deployed:
•
Configure jumbo-frame support on all the MPLS enabled interfaces
The first method consists of increasing the MTU of the physical interfaces enabled for label switching. Because the default MTU size supported on Ethernet interfaces is 1500 bytes, increasing that value to at least 1508 allows the successful transmission of the MPLS labeled packets, both for the global table and VPN traffic. The required configuration is as follows:
interface TenGigabitEthernet1/1
 mtu 1508
 tag-switching ip
Note
Jumbo frame support must be configured on all the MPLS-enabled interfaces of P and PE devices.
The Catalyst 6500 platform can support jumbo frame sizes as of release 12.1(1)E for Native IOS. However, this support depends on the type of line cards in use. There are generally no restrictions on enabling the jumbo frame feature; it can be used with trunking or non-trunking and channeling or non-channeling interfaces. As shown in the configuration sample above, a value of 1508 is enough to account for the two MPLS labels added for VPN traffic. The maximum jumbo frame size supported on an individual port is 9216 bytes; however, an application-specific integrated circuit (ASIC) limitation limits the MTU size to 8092 bytes on the following 10/100-based line cards:
–
WS-X6248-RJ-45
–
WS-X6248A-RJ-45
–
WS-X6248-TEL
–
WS-X6248A-TEL
–
WS-X6348-RJ-45
–
WS-X6348-RJ-45V
–
WS-X6348-RJ-21
Note
The WS-X6516-GE-TX is also affected at 100 Mbps; whereas at 10/1000 Mbps, up to 9216 bytes can be supported.
One specific issue may arise when modifying the MTU size of the physical interface, which is related to the fact that OSPF does not allow the establishment of adjacencies between devices that have configured a different MTU size on their connecting interfaces. For example, this can be the case when connecting the WAN edge devices to the campus core, as shown in Figure 72.
Figure 72 MTU Mismatch
It may well happen that the network device deployed in the WAN edge (often a Cisco 7200 Series router) is connected to the core via interfaces that do not support jumbo frames. It is also usually valid to assume that frames received from the remote locations across the WAN are not full 1500-byte frames. For example, typical deployments use IPsec + GRE over the WAN, so the frames are usually already reduced in size to be carried over the tunnels. Thus, the fact that the MTU size cannot be increased on the WAN edge device interfaces connecting to the core may not be a problem. However, this is not the case for traffic coming from the core of the campus and directed toward the WAN edge, so configuring jumbo frame support on the core devices' interfaces may still be required (as shown in Figure 72). The different MTU size settings on the two sides of the link prevent the creation of the OSPF adjacency. To work around this issue, the following command needs to be issued on the interfaces of the WAN devices:
interface FastEthernet1/0
 description Link to campus core
 ip address 10.122.5.101 255.255.255.254
 ip ospf dead-interval minimal hello-multiplier 4
 ip ospf mtu-ignore
Doing so instructs the OSPF process running on the WAN edge device to not consider the MTU value as a criterion for the establishment of OSPF adjacency with the core routers.
•
Use the mpls mtu interface command
Configuring a value of 1508 on all the MPLS-enabled interfaces allows for transmission of full 1500-byte sized IP packets, because the two additional labels are not considered when comparing the size of the frame to the MTU of the physical interface. The following configuration is enabled on all the MPLS-enabled interfaces of the network (both on the PE and P devices):
interface TenGigabitEthernet1/1
 mpls mtu 1508
 mpls ip
Note
Note that the "mpls" part of the command is automatically changed to "tag-switching" on Catalyst 6500 platforms in software releases pre-12.2(33)SXH.
The main advantage of this approach as compared to the one discussed in the previous bullet is that the MPLS MTU setting does not affect the establishment of routing adjacencies when deploying OSPF. Therefore, this is the recommended approach.
Tagging or Non-Tagging Global Table Traffic
The use of network virtualization in the context of this guide is positioned as an evolutionary overlay design that results in much of the traffic remaining in the global table; users or devices are selectively removed from the global routing table to be part of the defined VPNs to solve specific problems (guest/partner access, NAC remediation, and so on).
When MPLS is enabled on the physical link connecting each PE device (in the distribution layer of each campus distribution block) to the high speed core, all the traffic flowing in the network starts to be tagged. The global table traffic uses a single MPLS tag, whereas all the packets related to VPN traffic are characterized by an internal VPN tag and an outer IGP label.
One possible option is then to modify this default behavior and to start tagging only the VPN traffic, leaving all the communications in global table untagged. There are several advantages in doing this:
•
MTU—Traffic in global table does not have any of the MTU issues previously discussed because no tags are added to the original packet.
•
Troubleshooting—Because global traffic is IP switched and not label switched, all the typical troubleshooting tools can be used to verify the operation of global table traffic. There is no requirement to understand the MPLS-specific tools that are discussed in MPLS-Specific Troubleshooting Tools.
•
QoS—As previously discussed, after traffic is tagged with an MPLS label, there are three bits in the MPLS header (the EXP bits) that can be used for carrying QoS information. This allows supporting up to eight classes of traffic, so in the specific situations where the enterprise has already implemented a QoS strategy based on more than eight classes, not tagging the global traffic avoids disrupting that strategy. Traffic in the global table continues to be classified and marked in the same way it was before MPLS VPN was turned on, and no changes in the queuing strategy need to be put in place in the overall network.
•
Traffic load-balancing—Global table packets containing a single MPLS label are load-balanced across the existing equal cost paths based on the source and destination IP address values. If global table traffic is sent untagged, the Layer 4 port information can also be included to calculate the hash value for load-balancing, allowing for a better statistical distribution of flows across the equal cost links.
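The QoS point in the list above can be made concrete with a small sketch showing why a strategy with more than eight classes does not survive tagging. It assumes that the EXP value is derived from the three most significant bits of the DSCP (the IP precedence bits); that mapping is an assumption for illustration, and `exp_from_dscp` is a hypothetical helper name.

```python
def exp_from_dscp(dscp):
    """Map a 6-bit DSCP value to a 3-bit EXP value by keeping the top three bits.

    Assumption: EXP mirrors the IP precedence bits of the original packet.
    """
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    return dscp >> 3

# Two distinct DSCP markings collapse onto one EXP value
print(exp_from_dscp(46))  # 5 (DSCP EF)
print(exp_from_dscp(40))  # 5 (DSCP CS5)

# All 64 DSCP values fit into only eight EXP values
print(len({exp_from_dscp(d) for d in range(64)}))  # 8
```

Under this mapping, classes that differ only in the lower three DSCP bits become indistinguishable inside the MPLS core, whereas untagged global table traffic keeps its full DSCP marking.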
From a convergence perspective, there is actually not much difference between the scenarios where global traffic is tagged or not. The main factor contributing to the convergence time (in a box/link failure scenario) is IGP convergence; the LDP component is negligible.
In summary, the main advantage of not tagging global table traffic is that the creation of the virtual network becomes a process that is not disruptive to the functions already in place in the enterprise network. This fits well with the initial design principle that virtualization should be used to address specific problems and should not affect the majority of the "normal" enterprise communications.
The question now becomes how best to implement untagged global table traffic. The recommended option is to influence the way LDP exchanges tags between the various network devices (P and PE) in the MPLS network. Each device locally assigns a label to each prefix contained in the global routing table. This functionality is triggered as soon as an interface is configured for label switching and cannot be stopped. What is possible to do instead is to control which lab









































































