Today, Netgate® announces the availability of TNSR® software release 20.08. Our last release was 20.02 in February. While we try to push a new release every four months, this one took a bit longer. Beyond the many feature additions and user experience improvements, a significant part of this release is “under the hood” and not directly visible to users. Sometimes the speed of addressing customer-requested features needs to take a back seat to underlying architectural improvements, even if they are not readily apparent to users, and TNSR 20.08 is one of these releases.
Routers can be thought of as a data plane and a control plane. The data plane is the set of functions and processes responsible for transforming and forwarding traffic, while the control plane consists of the functions and processes that determine how packets should be forwarded. Protocols such as BGP, OSPF, LDP, and spanning tree are all examples of control plane functions. TNSR uses VPP for its data plane and FRR for its routing protocol functionality.
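To make the split concrete, here is a minimal sketch in Python. It is a toy model, not how VPP or FRR are implemented: the `Router` class and its method names are invented for illustration. The control-plane methods decide what goes into the forwarding table, and the data-plane method only reads that table, using a longest-prefix match.

```python
import ipaddress


class Router:
    """Toy router (hypothetical): the control plane fills the forwarding
    table, the data plane only reads it."""

    def __init__(self):
        self.fib = {}  # ip_network -> next-hop address (string)

    # Control plane: determines how packets should be forwarded.
    def install_route(self, prefix, next_hop):
        self.fib[ipaddress.ip_network(prefix)] = next_hop

    def withdraw_route(self, prefix):
        self.fib.pop(ipaddress.ip_network(prefix), None)

    # Data plane: forwards each packet using longest-prefix match.
    def forward(self, dst):
        addr = ipaddress.ip_address(dst)
        matches = [p for p in self.fib if addr in p]
        if not matches:
            return None  # no route: the packet would be dropped
        best = max(matches, key=lambda p: p.prefixlen)
        return self.fib[best]


router = Router()
router.install_route("10.0.0.0/8", "192.0.2.1")
router.install_route("10.1.0.0/16", "192.0.2.2")

print(router.forward("10.1.2.3"))    # most specific route wins: 192.0.2.2
print(router.forward("10.9.9.9"))    # falls back to the /8: 192.0.2.1
print(router.forward("172.16.0.1"))  # no matching route: None
```

In a real system the two halves are separate processes, which is why the communication path between them matters.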
This requires that FRR be able to communicate with VPP. For the past four years, the shortcut for this problem has been to implement a router “plugin” based on the example code supplied with VPP. The problem with this code is that it is architecturally flawed, and the more you try to fix it, the more difficult the problems become. As an example, the router plugin uses a set of ‘tap’ interfaces, but it did not mirror the VPP interface state to the associated tap interface. Traditional Linux routing implementations such as BIRD or FRR depend on this interface state to function. Since the router plugin’s view was VPP-centric, part of the solution was to copy the VPP interface state to the associated tap interface. This in turn caused a cascade of other issues, as the Linux side could end up in a different state than VPP. Another issue is that the router plugin and its associated tap interfaces perform poorly. On low-end hardware (an 8-core Intel C2000) we measured throughput of around 300 Mbps inbound and 100 Mbps outbound. While this only affects network flows that terminate on the router (such as a BGP UPDATE message), it was still far too slow in our view. The rabbit hole of these issues in the existing router plugin code goes very deep, and it also required us to maintain a custom variant of FRR to function correctly with VPP.
All of this has been addressed now that the original TNSR router plugin has been replaced with the linux-cp plugin. We started with code written by Neale Ranns at Cisco, and then spent nearly six months advancing that code and fitting it into TNSR. The linux-cp plugin has a Linux-centric view: the Linux interfaces are configured, and this configuration is copied to the associated VPP interfaces. Netlink messages which signal that a route has been added, changed, or removed by any process, be it FRR or a simple “ip route” command, are correctly tracked and reflected to VPP.
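The route-tracking idea can be sketched as a small event mirror. The Python below is illustrative only: it does not use the real netlink API or any linux-cp code, and `RouteEvent` and `mirror` are hypothetical names. It simply shows a data-plane table being kept in step with add, change, and delete events, regardless of which process produced them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteEvent:
    """Hypothetical stand-in for a kernel route notification."""
    action: str          # "add", "change", or "del"
    prefix: str
    next_hop: str = ""


def mirror(events, dataplane_fib):
    """Apply each route event to the data-plane table, in order."""
    for ev in events:
        if ev.action in ("add", "change"):
            # Adds and changes both end with the prefix pointing
            # at the next hop carried in the event.
            dataplane_fib[ev.prefix] = ev.next_hop
        elif ev.action == "del":
            dataplane_fib.pop(ev.prefix, None)
        else:
            raise ValueError(f"unknown route action: {ev.action!r}")
    return dataplane_fib


events = [
    RouteEvent("add", "203.0.113.0/24", "192.0.2.1"),
    RouteEvent("add", "198.51.100.0/24", "192.0.2.1"),
    RouteEvent("change", "203.0.113.0/24", "192.0.2.254"),
    RouteEvent("del", "198.51.100.0/24"),
]

fib = mirror(events, {})
print(fib)  # {'203.0.113.0/24': '192.0.2.254'}
```

Because the mirror reacts to events rather than to a particular routing daemon, it does not matter whether a route came from FRR or from a manual command.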
This adds significant product stability and eases feature insertion, testing, and integration with other system-level functions. It also enables control/data plane separation via network namespaces and Virtual Routing and Forwarding. Performance has also increased. On that same C2000-based system, flows through linux-cp to and from the host stack now run 30x to 40x faster than with the router plugin. This performance increase can directly lead to associated effects, including shorter convergence times for protocols such as BGP. All of this work also required a significant update to our test harness and associated test environments.
Finally, all of the linux-cp changes have been upstreamed to the FD.io VPP project, a testament to our ongoing commitment to open source software contribution.
In TNSR 20.08, host stack services have been moved to a non-default network namespace. This provides isolation between the environments for the host OS and the VPP data plane. The separation prevents the host OS from intentionally or unintentionally using TNSR network connectivity, and vice versa, by ensuring services are only accessible in conjunction with the relevant interfaces. For example, the RESTCONF API or ssh service may be intentionally configured to be accessible only from host interfaces, such as a management network, and not from TNSR interfaces carrying untrusted traffic.
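The isolation rule can be modeled in a few lines of Python. This is a conceptual sketch, not TNSR configuration: the namespace labels, interface names, and service names are all invented. It shows only the rule itself, that a service can be reached through an interface only when both live in the same namespace.

```python
class Namespace:
    """Hypothetical model of a network namespace."""

    def __init__(self, name):
        self.name = name
        self.interfaces = set()
        self.services = set()


def reachable(service, interface, namespaces):
    """True only if the service and the interface share a namespace."""
    return any(
        service in ns.services and interface in ns.interfaces
        for ns in namespaces
    )


# Assumed layout: management services sit with the management interface,
# while the routed interfaces sit in a separate namespace.
host = Namespace("host")
host.interfaces.add("mgmt0")
host.services.update({"sshd", "restconf"})

dataplane = Namespace("dataplane")
dataplane.interfaces.update({"wan0", "lan0"})

namespaces = [host, dataplane]

print(reachable("sshd", "mgmt0", namespaces))     # True
print(reachable("sshd", "wan0", namespaces))      # False
print(reachable("restconf", "lan0", namespaces))  # False
```

Exposing a service on a routed interface then becomes a deliberate act of placing it in that namespace, rather than something that happens by default.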
Virtual Routing and Forwarding
TNSR 20.08 also adds Virtual Routing and Forwarding (VRF) support. Previously, TNSR supported multiple routing tables used to direct traffic on various interfaces. That functionality is now replaced with VRF, which provides more capabilities, including:
- multiple instances of routing tables and forwarding tables with overlapping address spaces
- destination-based policy routing on a per-interface basis
- dynamic routing daemons that manage routes per interface or set of interfaces rather than globally
- multiple instances of dynamic routing daemons (BGP, OSPF, etc.) with different options per VRF
- VRF isolation: unless traffic is directed into another VRF via specific route destinations, each VRF is isolated from the others, allowing sets of interfaces to be treated as fully separate routers
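The first and last items in this list can be illustrated with a short Python sketch. It is a toy model with hypothetical names (`Vrf`, `interface_vrf`, the interface names, and the VRF names are all assumptions), not TNSR's implementation. Each VRF holds its own routing table, the ingress interface selects the VRF, and the same prefix can therefore resolve to different next hops.

```python
import ipaddress


class Vrf:
    """Hypothetical VRF: an independent routing table."""

    def __init__(self, name):
        self.name = name
        self.routes = {}  # ip_network -> next hop

    def add_route(self, prefix, next_hop):
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def lookup(self, dst):
        addr = ipaddress.ip_address(dst)
        matches = [p for p in self.routes if addr in p]
        if not matches:
            return None
        return self.routes[max(matches, key=lambda p: p.prefixlen)]


vrfs = {"red": Vrf("red"), "blue": Vrf("blue")}
interface_vrf = {"eth1": "red", "eth2": "blue"}  # assumed interface binding

# Overlapping address space: the same prefix exists in both VRFs.
vrfs["red"].add_route("10.0.0.0/24", "192.0.2.1")
vrfs["blue"].add_route("10.0.0.0/24", "198.51.100.1")
# A route that exists only in the blue VRF.
vrfs["blue"].add_route("172.16.0.0/16", "198.51.100.2")


def forward(ingress_interface, dst):
    """Look up the destination in the VRF bound to the ingress interface."""
    return vrfs[interface_vrf[ingress_interface]].lookup(dst)


print(forward("eth1", "10.0.0.5"))    # 192.0.2.1
print(forward("eth2", "10.0.0.5"))    # 198.51.100.1
print(forward("eth1", "172.16.1.1"))  # None: red cannot see blue's routes
```

Crossing between VRFs would require an explicit route in one table that points into the other, which this sketch leaves out.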
When an existing TNSR installation is upgraded to TNSR 20.08, its non-default routing tables are automatically converted to VRF entries.
TNSR 20.08 adds NAT Traversal (NAT-T), a standards-based (RFC 3715, RFC 3947) approach for encapsulating IPsec in User Datagram Protocol (UDP). Data protected by IPsec can now pass through NAT, enabling IPsec VPN connections across paths where NAT is present, which is especially useful for service providers. Without NAT-T, IPsec VPN connections through NAT can fail or drop packets.
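As a rough illustration of the encapsulation, the Python sketch below wraps a stub ESP packet in a UDP header on port 4500. The layout follows RFC 3948, which defines UDP encapsulation of ESP and is not cited above. The function names and the dummy payload are hypothetical; this is not strongSwan or TNSR code, and it performs no encryption.

```python
import struct

NAT_T_PORT = 4500  # UDP port used for UDP-encapsulated ESP (RFC 3948)


def build_esp_stub(spi, seq, ciphertext):
    """Build a stand-in ESP packet: SPI, sequence number, opaque payload.

    The SPI must be non-zero so a receiver can tell ESP apart from
    IKE traffic sharing the same port.
    """
    if spi == 0:
        raise ValueError("SPI must be non-zero for UDP-encapsulated ESP")
    return struct.pack("!II", spi, seq) + ciphertext


def udp_encapsulate_esp(esp_packet, src_port=NAT_T_PORT, dst_port=NAT_T_PORT):
    """Prefix an ESP packet with an 8-byte UDP header.

    The checksum is left at zero here, which RFC 3948 allows for IPv4.
    """
    length = 8 + len(esp_packet)  # UDP length covers header plus payload
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    return header + esp_packet


esp = build_esp_stub(spi=0x1000, seq=1, ciphertext=b"\x00" * 16)
datagram = udp_encapsulate_esp(esp)

print(len(esp))                             # 24
print(len(datagram))                        # 32
print(struct.unpack("!HHHH", datagram[:8])) # (4500, 4500, 32, 0)
```

Because a NAT device sees an ordinary UDP datagram with ports it can rewrite, the ESP payload inside passes through untouched.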
TNSR 20.08 also updates our use of the following underlying open source projects:
- CentOS updated to 8.2
- VPP updated to 20.01
- FRR updated to 7.3.1
- strongSwan updated to 5.8.4
- Clixon updated to 4.5.0
- Kea updated to 1.7.7
TNSR now automatically manages service stops and starts, both at boot time and after changes made directly from the shell. Users no longer need to remember or work out which services to stop and restart.
A number of updates to the Command Line Interface (CLI) improve overall user experience, including:
- resolution of issues causing excessive delays when displaying large route tables
- resolution of issues causing excessive memory consumption when displaying large route tables
- improved handling of configuration changes so they are only applied when necessary
- resolution of issues displaying command output containing non-XML-safe data
- addition of commands to initiate a TNSR device reboot
- improved handling of unknown elements in the configuration database, so errors may be corrected in the CLI rather than by editing the configuration
In summary, we’re quite proud of this release. With a growing list of customers and prospects eagerly awaiting the features named above, we would have liked to release sooner. But the investment in a new router plugin, and all of the regression testing required to ensure its readiness, will pay dividends as we take TNSR forward. For a comprehensive list of changes, please see the TNSR 20.08 release notes.