==== Bitrate ====

Bitrate is the metric by which a device is often advertised, e.g., a 100 Gbps NIC.

<span id="variants"></span>
===== Variants =====

All of the following variants SHOULD reach the advertised line rate:

* TSO on and off
** if PISO is supported, across a UDP tunnel
* RSC on and off
* IOMMU on and off
* 1500B, 4168B and 9198B L3MTU
* Unidirectional and bidirectional traffic
* Scalability
** 10, 100, 1K, 10K flows
** 1, 10, NUM_CPU threads
** 1, 10, NUM_CPU queues

<span id="single-flow"></span>
===== Single Flow =====

A single flow MUST reach 40 Gbps with 1500B MTU and TSO. A single TCP/IP flow can reach 100 Gbps line rate when using TSO, 4KB MSS and copy avoidance [tcp_rx_0copy], but this is a less common setup. Single flow line rate is not a hard requirement, especially as device speeds exceed 100 Gbps.

<span id="peak-stress-and-endurance-results"></span>
===== Peak, Stress and Endurance Results =====

Short test runs can show best-case numbers. Deployment requires sustained performance. Endurance tests can expose memory leaks and rare unrecoverable edge cases, e.g., those that result in device or queue timeout. Endurance tests essentially run the same testsuite over longer periods of time. Reported numbers for 1-hour runs MUST stay constant and match short-term numbers.

Stress tests exercise specific adverse conditions. They need not be as long as endurance tests. Performance during adverse conditions may be lower than best case, but not catastrophically so. Device and driver are expected to handle overload gracefully. They MUST be resistant to Denial of Service (DoS) and incast. If the maximum packet rate for minimum-size packets is below line rate, it SHOULD remain constant regardless of packet arrival rate.

<span id="bus-contention"></span>
====== Bus Contention ======

Network traffic competes with other tasks for PCIe and memory bandwidth. Some micro-architectural considerations, such as NUMA or cache sizes and partitioning, cannot be controlled. But devices can be compared to the extent that they stress the PCIe or memory bus for the same traffic: how many PCIe messages are required to transfer the same number of packets of a given size is an indicator of real-world throughput under bus contention.

This efficiency is evaluated by repeating the testsuite while running a memory antagonist; a minimal sketch of such an antagonist is given below. An effective memory antagonist on Unix environments is a pinned dd binary copying in-memory virtual files. A device SHOULD minimize the number of PCIe messages needed (see the section on PCIe Cache Aligned Stores) to reduce sensitivity to concurrent workloads.

<span id="packet-rate"></span>
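The memory antagonist for the Bus Contention evaluation above can be as simple as a pinned copy loop. Below is a minimal sketch in Python, assuming a Linux host (for <code>os.sched_setaffinity</code>); the pinned core and the 256 MiB working set are illustrative choices standing in for the pinned dd copy of in-memory virtual files, not values mandated by this specification.

<syntaxhighlight lang="python">
#!/usr/bin/env python3
# Memory-bandwidth antagonist sketch: a pinned process that streams a large
# buffer through the memory subsystem, standing in for the pinned dd copy of
# in-memory virtual files described above. Core and buffer size are illustrative.
import os

CPU = 0                        # core to pin the antagonist to (illustrative choice)
BUF_SIZE = 256 * 1024 * 1024   # 256 MiB working set, large enough to defeat caches

def main() -> None:
    # Pin this process to a single CPU so the contention comes from a known core,
    # as a pinned dd invocation would (Linux-only call).
    os.sched_setaffinity(0, {CPU})

    src = bytearray(os.urandom(BUF_SIZE))
    dst = bytearray(BUF_SIZE)

    # Copy back and forth indefinitely; each iteration moves 2 * BUF_SIZE bytes
    # through the memory bus, competing with the device under test.
    while True:
        dst[:] = src
        src[:] = dst

if __name__ == "__main__":
    main()
</syntaxhighlight>

Running one or more such pinned instances while the testsuite repeats gives an indication of how sensitive the device is to concurrent memory traffic.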