Kernel-Based Packet Processing
The prevailing packet processing model for decades has been ‘kernel-based’. At every network device that receives, inspects, and forwards a packet toward its next hop (from your mobile device, desktop, security camera, or home theater system to an application server in a private data center or cloud somewhere, and back again), that packet arrives on a network interface and is sent straight into the computer’s operating system (OS) - in fact, all the way to the core of the OS (the kernel) - which determines how it should be processed within that device.
Now, the kernel is the crown jewel of the OS. It manages the operation of the computer and its most important hardware resources - notably memory and CPU time. The kernel is also small (relatively speaking). It is delicate. And it is very, very busy when many computer processes request its attention.
So kernel processing of packets is designed around the principle of handling one packet at a time: fetch an instruction from the instruction cache, perform that instruction on the packet, fetch the next instruction, perform it, and so on. Then that packet is sent on its merry way, and the second packet enters and goes through the same routine.
The FD.io analogy for explaining this is a good one, so we’ll stick with it. Consider the problem of a stack of lumber where each piece of lumber needs to be cut, sanded, and have holes drilled in it. There are two ways of doing the job. Cut, sand, and drill each board one at a time. Or, cut all boards, then sand all boards, then drill all boards. The second approach will save loads of time as you’ll avoid changing tools with each process step on each board.
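The difference between the two approaches is really just a swap of loop order. Here is a minimal sketch in Python; the operation names (`cut`, `sand`, `drill`) are stand-ins for per-packet processing steps such as parsing, classification, and header rewrite - they are illustrative, not actual VPP APIs:

```python
# Illustrative sketch: scalar (kernel-style) vs. vector (VPP-style)
# processing order. The operations stand in for per-packet steps.

def scalar_process(packets, operations):
    """Kernel-style: run every operation on one packet before moving
    to the next packet. The instruction cache must be reloaded for
    each operation, for every single packet ("change tools per board")."""
    for pkt in packets:
        for op in operations:
            op(pkt)

def vector_process(packets, operations):
    """VPP-style: run one operation across the whole batch (vector) of
    packets before moving on, so that operation's instructions stay
    hot in the i-cache ("cut all boards, then sand all boards")."""
    for op in operations:
        for pkt in packets:
            op(pkt)

# Record the order each (operation, packet) pair executes in.
trace = []
ops = [lambda p, name=n: trace.append((name, p))
       for n in ("cut", "sand", "drill")]

vector_process(["pkt0", "pkt1"], ops)
# With vector processing, all packets are cut, then all sanded,
# then all drilled.
```

The only difference is which loop is on the outside - but on real hardware, keeping one operation's instructions resident in cache while a whole vector of packets streams through is where the time savings come from.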
Kernel-based processing is the former approach. Even on robust CPUs, e.g., Intel® Xeon® class processors, packet forwarding with stock Linux tops out at around 2 million packets per second (Mpps) - and can easily be stymied by intra-core locking and other effects. With experimental technologies, Linux has been shown to make some gains in artificial benchmarks, such as dropping all received packets, but a lot of work is still required, and VPP is available today.
Now, if one of the above-mentioned devices has a 10 Gbps interface, how will you process packets fast enough to fill that pipe? Line-rate processing of the smallest (most CPU-intensive) packets we have to deal with (64-byte packets, which occupy 84 bytes on the wire) works out to 14.88 Mpps at 10 Gbps. Strapping multiple Linux systems together behind a load balancer will consume a lot of cost, space, heat, etc. for a single 10 Gbps link - so you see where this is going: extravagant CapEx and OpEx. Alternatively, you could opt for an expensive, vendor-proprietary application-specific integrated circuit (ASIC) or field-programmable gate array (FPGA) solution. Well, that won’t be cheap - and you’ll also just have begun your subscription to ‘vendor lock-in’.
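The 14.88 Mpps figure falls straight out of Ethernet framing overhead: a minimal 64-byte frame is accompanied on the wire by a 7-byte preamble, a 1-byte start-of-frame delimiter, and a 12-byte inter-frame gap, for 84 bytes per packet. A quick back-of-the-envelope check:

```python
# Line-rate packet arithmetic for 10 Gigabit Ethernet.
LINK_BPS = 10_000_000_000   # 10 Gbps link speed, bits per second

frame_bytes = 64            # minimal Ethernet frame
preamble_bytes = 7          # preamble
sfd_bytes = 1               # start-of-frame delimiter
ifg_bytes = 12              # inter-frame gap

# Total on-the-wire footprint of one minimal packet: 84 bytes.
wire_bytes = frame_bytes + preamble_bytes + sfd_bytes + ifg_bytes

# Packets per second at line rate: 10e9 / (84 * 8) ≈ 14,880,952.
pps = LINK_BPS / (wire_bytes * 8)

print(f"{pps / 1e6:.2f} Mpps")   # prints "14.88 Mpps"
```

At that rate a CPU has roughly 67 nanoseconds to fully process each packet - which is why a per-packet, instruction-cache-thrashing model cannot keep up.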