Netgate Blog

The Behemoth Router is Here

A little over a year ago, I wrote a blog post explaining the coming world of high-performance, flexible software-based routers: seriously disruptive packet processing performance that could change the landscape of networking applications across data centers, the enterprise WAN, service provider networks, and customer premises deployments. There is no need to recount that here; the principles of the post haven’t changed.

Since that time, the company has been hard at work productizing the underlying technology discussed in that blog, and now it’s here. Netgate’s TNSR is now available on Amazon’s AWS Marketplace.

TNSR is an advanced open source-based firewall, router, and VPN platform with breakthrough enterprise-class performance, management, and service expansion flexibility. You can learn more about it here.

But, perhaps the best way to give our readership a quick understanding of its disruptive power is to consider four use cases.

The Telecommuter Use Case

Suppose you are running pfSense on an SG-1000, which has a single-core TI AM3352 ARM Cortex-A8 600 MHz CPU, 512 MB of DDR3 RAM, and two 1 Gigabit Ethernet ports. Realistically, and due to overheads inherent in kernel-based processing and the ‘pf’ packet filter with ALTQ, you can get a bit over 100 Mbps through this product using large packets. Let’s be generous and call it 200 Mbps. As a benchmark, that’s 0.2 Gbps per core. If you upgrade your internet connection to 300 Mbps from Spectrum, or 1 Gbps from Google Fiber, your firewall/router becomes the bottleneck. You simply won’t be able to use the extra bandwidth. But with TNSR, saturating the full 1 Gbps throughput of an SG-1000 class product will be a walk in the park, even with IMIX traffic (more on that below). Corporate telecommuters can now have ‘in-office’ local area network performance, without the commute into downtown.

The Small to Medium Business Premises Use Case

Consider pfSense on a high-end Netgate appliance, such as the XG-1541. This product provides an 8 core Intel® Xeon® D-1541 (Broadwell) CPU, 16GB of DDR4 RAM, two 1 Gigabit Ethernet ports and two 10 Gigabit Ethernet ports. Our testing reveals pfSense can manage about 3.4 Gbps when processing 64 byte packets, and about 9.1 Gbps when processing 1400 byte packets. Why does this matter?

Packet processing requires header inspection for each packet. If the packet payload is small, a given network connection filled to the brim will have to process far more packets than if the payload is large. Large payloads - 1400 or 1500 bytes in size - are fine for things like file transfer, movie downloads, etc. But application traffic such as VoIP (SIP), DNS, gaming, and messaging apps - which requires significantly more client-server interaction - is dominated by small packets. For consumers, large packets are probably the predominant traffic type, and latency won’t matter much. But for business users at a branch office, or gaming at home, latency can range from annoying to debilitating. As latency increases, frustration goes up and worker productivity comes down.
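To put a number on "far more packets", here is a minimal sketch of how many packets per second a link carries at line rate for a given packet size. It assumes the quoted sizes are Ethernet frame sizes and that each frame costs an extra 20 bytes on the wire (preamble, start-of-frame delimiter, and inter-frame gap); the function name is ours, purely for illustration.

```python
# Per-frame wire overhead on Ethernet (assumption: standard framing):
# preamble (7) + start-of-frame delimiter (1) + inter-frame gap (12)
ETHERNET_OVERHEAD_BYTES = 20


def line_rate_pps(link_bps, frame_bytes):
    """Packets per second needed to fill a link with frames of one size."""
    bits_on_wire = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


# Compare small and large frames on a 10 Gigabit Ethernet port.
for size in (64, 1400):
    mpps = line_rate_pps(10e9, size) / 1e6
    print(f"{size:>5}-byte frames: {mpps:.2f} Mpps")
```

Under these assumptions, filling a 10 Gbps port takes about 14.88 Mpps with 64 byte frames but under 1 Mpps with 1400 byte frames. Since every packet needs its header inspected, the small-packet case is roughly 17 times more work for the same bandwidth.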

Now, this is not a one extreme or the other discussion. For over a decade, the preferred benchmark for network performance testing has been IMIX - which requires that packet processing handle a selection of packet sizes in order to simulate real-world conditions. But, let’s be generous again and ignore that real-world business application traffic is going to tend more towards smaller-sized packets (let alone any encryption handling), which significantly impedes throughput. So, for simplicity, we’ll base our story on 1400 byte packets. pfSense on an XG-1541 benchmarks to about 1 Gbps per core - a 5X gain over an SG-1000. A worthy jump for sure. Yet, you are still not using the full power of the product you purchased. Remember, it has 10 Gigabit Ethernet ports. So if you are hauling business level traffic, you are throttled to 3-5 Gbps. With TNSR, the entire 10 Gbps throughput is ‘there for the taking’.
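For readers who want to check the per-core arithmetic, here is a small sketch using only the figures quoted so far (200 Mbps on the single-core SG-1000, 9.1 Gbps on the 8 core XG-1541 with 1400 byte packets). The helper name is ours and simply divides throughput by core count.

```python
def gbps_per_core(total_gbps, cores):
    """Normalize a throughput figure by the number of CPU cores."""
    return total_gbps / cores


# Figures quoted in this post for pfSense with large packets.
sg1000 = gbps_per_core(0.2, 1)   # generous 200 Mbps on one core
xg1541 = gbps_per_core(9.1, 8)   # 9.1 Gbps across eight cores

print(f"SG-1000: {sg1000:.2f} Gbps per core")
print(f"XG-1541: {xg1541:.2f} Gbps per core")
print(f"Gain:    {xg1541 / sg1000:.1f}x")
```

The exact result is a little above 1 Gbps per core, which the text rounds to "about 1 Gbps" and a 5X gain.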

The Cloud Use Case

You’ve been running pfSense for some time now - happy that you could extend the same great firewall protection you’ve used for years on premises right to your cloud-based workloads. Unfortunately, pfSense still imposes the same constraints - due to its inherent kernel processing - in the cloud that we just described for our telecommuter and small business users above.

Enter TNSR. The smallest Amazon AWS EC2 instance on which TNSR can currently be provisioned is the C5.xlarge - underpinned by a 4 Core 3.0 GHz Intel® Xeon® CPU and 8 GB of RAM. Our testing shows we can get 4.79 Gbps out of a single core. Now here’s an important point. That’s 4.79 Gbps irrespective of packet size. We see the same throughput whether it’s 64 byte packets, or even 1500 byte packets. So TNSR pushes roughly 5 Gbps per core - a 5x lift over pfSense on an XG-1541.

But hang on. In Netgate’s testing, we’ve routinely seen TNSR move 14 Mpps, or 10 Gbps - per core - at small packet sizes. So why the drop? Because Amazon throttles (governs) bandwidth for their EC2 instances - presumably to prevent noisy neighbors from consuming a disproportionate percentage of shared infrastructure. Someday that 5 Gbps throughput governor may be relaxed, but that is out of our control. Just be advised, TNSR is being commercially held back from its real potential. pfSense can’t overrun the underlying EC2 instance hardware, but TNSR can make it redline. And when that happens, your price per packet processed in the cloud drops precipitously.

Let’s kick it up a notch or two, or four (with AWS bandwidth governors as they are). Suppose we use all four C5.xlarge cores. That EC2 instance (which, by the way, is available for pennies per hour) can now pump right at 20 Gbps over four connections. And, since all of our provisioned instances support Amazon’s Elastic Network Adapter (ENA) - which affords a 25 Gbps network connection - we now have a 20 Gbps non-blocking software router for a few dimes per hour.
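The four-core figure is simple multiplication, sketched below with the numbers from this post. It assumes throughput scales linearly across the four cores and that each core stays at the 4.79 Gbps we measured under the AWS governor; the variable names are ours.

```python
PER_CORE_GBPS = 4.79   # measured single-core throughput on a C5.xlarge
CORES = 4              # all cores of the instance
ENA_LINK_GBPS = 25     # Elastic Network Adapter link speed

# Assumption: linear scaling, one governed connection per core.
aggregate_gbps = PER_CORE_GBPS * CORES
headroom_gbps = ENA_LINK_GBPS - aggregate_gbps

print(f"Aggregate: {aggregate_gbps:.2f} Gbps")
print(f"ENA headroom: {headroom_gbps:.2f} Gbps")
```

The aggregate comes to a little over 19 Gbps, which is the "right at 20 Gbps" above, and it stays below the 25 Gbps ENA link, so the network adapter is not the limiting factor.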

The Service Provider Use Case

Service providers demand serious infrastructure that reliably provides conformance with high bandwidth SLAs while providing service to a large number of customers. They are under constant pressure to increase price performance for their buyers, as well as reduce their own costs to survive in a ‘dog eat dog’ world. Six figure routers and firewalls don’t exactly get them to the promised land. They see really powerful white box appliances on the market at ultra low prices and lament, “If only I had packet processing software that could fully tap that horsepower.” Let’s take a peek into the near future.

The new Intel® Xeon® Platinum 8168 CPU has 28 cores. Recent testing by FD.io using 4 of these CPUs revealed that Vector Packet Processing - the fast data plane in TNSR - can process 948 Gbps at 400 Mpps across 30 cores. That’s 14 Mpps per core, and the test scaled with core count until PCIe bandwidth was saturated. Restated, on modern CPUs, the packet processing in TNSR is limited by PCIe bandwidth, not CPU. Using an IMIX benchmark, the packet processing in TNSR scales out on these CPUs at an incredible 50 Gbps per core. These are numbers that service providers can get excited about.

The behemoth router has arrived. But what it really means is secure networking functions (routing, firewalls, VPNs, and more) are fast approaching a utility model - just like power or water. Use as much as you want, when you want, where you want. Accessible by all - from SOHO to Service Provider. These are truly exciting times. We’ll be talking more in the coming months about similar scale advances in management and services expansion - two other areas where TNSR shatters historic barriers. Stay tuned.