<span id="ocp-server-nic-sw-specification-core-features"></span>
= OCP Server NIC SW Specification: Core Features =
-----
<span id="draft-2023-05-13---first-public-review"></span>
== DRAFT 2023-05-13 - First Public Review ==
<span style="background:orange;bold">
Draft version. Automatically converted from Google Docs to Markdown. Some markup may be incorrect. See the [https://docs.google.com/document/d/1FaVPGYipZ1sPhnYg7KItAS7ivL_svvZP8ZVJeFJezc0/edit?usp=sharing&resourcekey=0-CJlmlfiK_TIuZtX6WnNggg original version] in Google Docs format.
</span>
-----
=== Contact ===
This specification was created through the OCP Networking project by OCP member companies Google, Intel, Meta and NVIDIA.
Comments, questions, suggestions for revisions and requests to join the standard committee can be directed to the OCP Networking mailing list. See [https://www.opencompute.org/projects/networking opencompute.org/projects/networking] for details.
<span id="i-o-api"></span>
The device MUST support scatter-gather I/O.
Transmitted packets may consist of multiple discrete host memory buffers. The device MUST support a minimum of (MTU / PAGE_SIZE) descriptors for MTU sized packets, rounded up to the nearest natural number, plus a separate header buffer. For packets with segmentation offload (see below), the device must support this number times the maximum number of supported segments, with an absolute minimum of 17: the minimum number of 4KB pages to span a 64KB TSO packet. Again, plus a separate header buffer.
For the receive case, the host may choose to post buffers smaller than MTU to the receive queue. The device must support the same limits as for transmit queues: the absolute minimum of 2 buffers per packet and the relative minimum of (MTU / PAGE_SIZE) in the general case, and the absolute minimum of 17 and the relative minimum of N * (MTU / PAGE_SIZE) for large packets produced by Receive Segment Coalescing (RSC, below).
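The descriptor minimums above reduce to simple arithmetic. A minimal sketch in C; the helper names are illustrative, not part of the spec:

```c
#include <assert.h>

/* Buffers needed to cover len bytes of payload in PAGE_SIZE chunks,
 * rounded up to the nearest natural number. */
static unsigned int min_payload_descs(unsigned int len, unsigned int page_size)
{
    return (len + page_size - 1) / page_size;
}

/* Minimum descriptors for one MTU-sized packet: the payload pages plus a
 * separate header buffer. */
static unsigned int min_descs_mtu(unsigned int mtu, unsigned int page_size)
{
    return min_payload_descs(mtu, page_size) + 1;
}

/* Minimum descriptors for a segmentation-offload packet: the per-MTU
 * count times the maximum supported segment count, with an absolute
 * floor of 17 (the 4KB pages needed to span a 64KB TSO packet), plus a
 * separate header buffer. */
static unsigned int min_descs_so(unsigned int mtu, unsigned int page_size,
                                 unsigned int max_segs)
{
    unsigned int n = min_payload_descs(mtu, page_size) * max_segs;
    if (n < 17)
        n = 17;
    return n + 1;
}
```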
===== Receive Header-Split =====
A device SHOULD support the special case of receive scatter-gather I/O that splits headers from application layer payload. It MUST be possible to allocate header and data buffers from separate memory pools. All protocol header buffers for an entire queue SHOULD be allocated as one contiguous DMA region to minimize IOTLB pressure. On packet reception, the host operating system will copy the headers out, so the region has to allocate exactly as many headers as there are descriptors in the queue.
Header-split allows direct data placement (DDP) of application payload into user or device memory (e.g., GPUs), while processing protocol headers in the host operating system. The operating system is responsible for ensuring that payload is not loaded into the CPU during protocol processing. Data is placed in posted buffers in the order that it arrives. Transport layer in-order delivery in the context of DDP is out of scope for this spec.
Header-split SHOULD be implemented by protocol parsing to identify the start of payload. Protocol parsing can fail for many reasons, such as encountering an unknown protocol type. In that case, the device MUST allow falling back to splitting packets at a fixed offset. This offset SHOULD be host configurable.
Header-split MAY be implemented with only support for a fixed offset: Fixed Offset Split (FOS). This variant does not require protocol parsing and is thus simpler to implement. Workloads often have a common default protocol layout, such as Ethernet/IPv6/TCP/TSopt. Splitting at 14 + 40 + 20 + 12 will correctly cover this modal packet length and with that the majority of packets arriving on a host. True header split is strongly preferred over FOS, and required at the advanced conformance level. If FOS is implemented, the offset MUST be host configurable.
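The 14 + 40 + 20 + 12 split point for the modal layout can be stated as plain arithmetic. A sketch with illustrative constant names:

```c
#include <assert.h>

/* Header lengths for the modal Ethernet/IPv6/TCP/TSopt layout discussed
 * above. Constant names are ours, not from the spec. */
enum {
    ETH_HDR_LEN   = 14, /* Ethernet header, no VLAN tag */
    IPV6_HDR_LEN  = 40, /* fixed IPv6 header, no extension headers */
    TCP_HDR_LEN   = 20, /* TCP header without options */
    TCP_TSOPT_LEN = 12, /* timestamp option plus padding */
};

/* Default FOS split offset covering the modal packet layout. */
static unsigned int fos_default_offset(void)
{
    return ETH_HDR_LEN + IPV6_HDR_LEN + TCP_HDR_LEN + TCP_TSOPT_LEN;
}
```

Any other layout (VLAN tags, IPv6 extension headers, different TCP options) shifts the true header boundary, which is why the offset must remain host configurable.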
===== Count =====
The device SHOULD also support configuring a maximum event count until an interrupt is sent. This triggers an interrupt when a configurable number of events since the last interrupt is reached. Each event corresponds to a single received or transmitted packet. For SO packets, the count should include each segment separately. When supporting a maximum event count, the device MUST support values in the range of [2, 128]. It then MUST send an interrupt when either of the two interrupt moderation conditions is met, whichever comes first. Reaching the maximum number of events immediately raises an interrupt regardless of remaining delay, so the delay constitutes an upper bound. Triggering an interrupt for either limit MUST lead to both counters being reset.
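The two moderation conditions and the shared reset can be sketched as follows. The structure and function names are illustrative, not a device interface:

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the two interrupt moderation conditions above: an event
 * count limit and a delay limit, whichever is reached first. */
struct irq_moderation {
    unsigned int max_events;   /* device MUST support [2, 128] */
    unsigned int max_delay_us; /* timer limit */
    unsigned int events;       /* events since last interrupt */
    unsigned int elapsed_us;   /* time since last interrupt */
};

/* Record one event (one packet, or one segment of an SO packet) and
 * report whether an interrupt fires now. Firing resets both counters. */
static bool moderation_event(struct irq_moderation *m, unsigned int elapsed_us)
{
    m->events++;
    m->elapsed_us = elapsed_us;
    if (m->events >= m->max_events || m->elapsed_us >= m->max_delay_us) {
        m->events = 0;
        m->elapsed_us = 0;
        return true;
    }
    return false;
}
```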
<span id="tx-and-rx"></span>
====== Receive Hash ======
The computed 32b hash MAY be passed to the host alongside the packet. Doing so allows the host to perform additional flow steering without having to compute a hash in software, such as Linux Receive Flow Steering (RFS).
<span id="indirection-table"></span>
====== Indirection Table ======
The device MUST select a queue by reducing the hash through modulo arithmetic. It applies division to the hash value and uses the remainder as an index into a fixed number of resources. The divisor is not simply the number of receive queues. RSS specifies an additional level of indirection, the indirection table. This allows for non-uniform load balancing. The device MUST support the RSS indirection table. The device MUST look up a queue using the following modulo operation:
<pre>queue_id = rss_table[rxhash % rss_table_length];</pre>
The table MUST be host-readable and writable. The host may configure the table with fewer slots than the configured number of receive queues, if the host wants to apply RSS to only a subset of queues. The host may configure the table with more slots than the number of receive queues, for more uniform load balancing. The device may limit the maximum supported table size. The minimum supported indirection table size MUST be at least the number of supported receive queues. The minimum SHOULD be at least 4 times the number of supported receive queues. The device SHOULD allow querying the maximum supported table size by the host. The device SHOULD allow replacement of the indirection table without pausing network traffic or bringing the device down, to support dynamic rebalancing, e.g., based on CPU load.
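The lookup above, together with a non-uniform table, can be exercised in a few lines of C. The table contents here are illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* The queue lookup from the modulo operation above, as a self-contained
 * sketch. */
static uint16_t rss_lookup(const uint16_t *rss_table,
                           uint32_t rss_table_length, uint32_t rxhash)
{
    return rss_table[rxhash % rss_table_length];
}
```

For example, an 8-slot table `{0, 0, 0, 1, 0, 0, 0, 1}` steers roughly three quarters of flows to queue 0 and one quarter to queue 1, which is the non-uniform balancing the indirection level enables.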
<span id="accelerated-rfs"></span>
The device MUST implement equal weight deficit round robin (DRR) as default dequeue algorithm [ref_id_fq_drr]. Deficit round robin is a per-byte algorithm. Time is divided in rounds. Each queue earns a constant number of byte credits during each round, its quantum. The device services queues in a round robin order. If a queue has data outstanding when it is scanned, all packets that add up to less than the queue’s quantum are sent and the credit is reduced accordingly. If one or more packets cannot be sent because the packet at the head of the queue is longer than the remaining quantum, then the remaining quantum carries over to the next round. If the queue is empty at the end of a round, the remaining quantum is reset to zero.
The device SHOULD also support DRR with non-equal weights. Then it MUST support host configuration of the weights.
The device MAY offer additional algorithms. If strict priority is supported, it SHOULD implement this mode with starvation prevention.
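The per-round DRR rules above can be sketched for a single queue. This is a minimal model, not a device implementation; names are illustrative:

```c
#include <assert.h>

/* One DRR queue: an array of queued packet lengths and a running byte
 * deficit carried between rounds. */
struct drr_queue {
    const unsigned int *pkt_len; /* queued packet lengths */
    unsigned int count;          /* packets queued */
    unsigned int head;           /* index of next packet to send */
    unsigned int deficit;        /* byte credit carried between rounds */
};

/* Service one queue for one round: add the quantum, send head packets
 * while they fit, carry the remainder only if data is still queued.
 * Returns the number of packets sent this round. */
static unsigned int drr_service(struct drr_queue *q, unsigned int quantum)
{
    unsigned int sent = 0;

    q->deficit += quantum;
    while (q->head < q->count && q->pkt_len[q->head] <= q->deficit) {
        q->deficit -= q->pkt_len[q->head];
        q->head++;
        sent++;
    }
    if (q->head == q->count)
        q->deficit = 0; /* queue drained: reset remaining credit */
    return sent;
}
```

With 1000B packets and a 1500B quantum, the first round sends one packet and carries a 500B deficit; the next round sends two, showing how the carried credit preserves per-byte fairness.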
Devices with a programmable hardware parser allow the administrator to push firmware updates to support custom protocols. A programmable parser is still strictly less desirable than protocol independent offloads, as programmable parsers introduce correlated roll-outs between software and firmware. At hyperscale, correlated roll-outs and potential roll-backs add significant complexity and risk.
<span id="checksum-offload"></span>
A device MUST be able to verify ones’ complement checksums. The device SHOULD implement the feature in a protocol independent manner.
Protocol independent linear ones’ complement (PILOC) receive checksum offload computes the ones’ complement sum over the entire packet exactly as passed by the driver to the host, for every packet, excluding only the 14B Ethernet header. The sum MUST exclude the Ethernet header. It MUST include all headers after this header, including VLAN tags if present. It MUST exclude all fields not passed to the host, such as possible crypto protocol MAC footers.
It MUST be possible for the host to independently verify checksum correctness by computing the same sum in software. This is impossible if the checksum includes bytes removed by the device, such as an Ethernet FCS.
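A host-side verification of the device's PILOC result might look like the following sketch: a standard ones' complement sum over the packet as delivered, skipping the 14B Ethernet header. The function names are ours:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones' complement sum over a byte range, folding carries back in, as
 * used by the Internet checksum family. */
static uint16_t ones_complement_sum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)(data[i] << 8 | data[i + 1]);
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8; /* pad odd trailing byte */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);  /* fold carries */
    return (uint16_t)sum;
}

/* PILOC: sum the packet exactly as passed to the host, excluding only
 * the 14B Ethernet header. */
static uint16_t piloc_sum(const uint8_t *pkt, size_t len)
{
    const size_t eth_hlen = 14;
    return ones_complement_sum(pkt + eth_hlen, len - eth_hlen);
}
```

Because the sum is linear, the host can subtract out regions it knows (headers) and verify the transport checksum for any protocol without the device parsing it.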
Legacy devices MAY instead return only a boolean value with the packet that signals whether a checksum was successfully verified. This approach is strongly discouraged. If this approach is chosen, then the device MUST checksum only the outermost UDP or TCP checksum (if it verifies a checksum at all) and MUST return true only if this checksum can be verified. The device SHOULD then compute the sum over the pseudo-header, L4 header and payload, including the checksum field, and verify that this sums up to zero. Note that both negative and positive zero MUST be interpreted as valid sums, for all protocols except UDP. Only for UDP does the all-zeroes checksum 0x0000 indicate that the checksum should not be verified. An implementation returning a PILOC sum does not require extra logic to address these protocol variations.
The device MUST pass all packets to the host, including those that appear to fail checksum verification. The host must be able to account, verify and report such packets.
===== Copy Headers and Split Payload =====
In an abstract model of segmentation offload, the device splits SO packet payload into segment sized chunks and copies the SO packet protocol headers to each segment. We refer to this basic mechanism as copy-headers-and-split-payload (CH/SP). The host communicates a 16-bit unsigned integer segment size to the device along with the packet. If segment size is not a divisor of total payload length, then the last packet in the segment chain will be shorter. The device MUST NOT attempt to compute or derive segment size, because establishing that is a complex process of path MTU and transport MSS discovery, more suitable to be implemented in software in the host protocol stack.
CH/SP is a simplified model. For specific protocols, segmentation offload can have subtle exceptions in how protocol header fields must be updated after copy. This spec explicitly defines all cases that diverge from pure CH/SP. The ground truth is the software segmentation implementation in Linux v6.3. If the two disagree, that source code takes precedence.
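The CH/SP split arithmetic follows directly from the model above. A sketch with illustrative helper names:

```c
#include <assert.h>

/* Number of segments produced when splitting payload_len bytes into
 * seg_size chunks; the last chunk may be shorter. */
static unsigned int chsp_num_segments(unsigned int payload_len,
                                      unsigned int seg_size)
{
    return (payload_len + seg_size - 1) / seg_size;
}

/* Payload length of the final segment in the chain. */
static unsigned int chsp_last_segment_len(unsigned int payload_len,
                                          unsigned int seg_size)
{
    unsigned int rem = payload_len % seg_size;
    return rem ? rem : seg_size;
}
```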
A device MUST support TCP Segmentation Offload (TSO), for both IPv4 and IPv6. It MUST be possible to enable or disable the feature. The device MUST support TSO with TCP options.
The device SHOULD support IPv4 options and IPv6 extension headers in between the IPv4 or IPv6 and TCP header. The device SHOULD support IPSec ESP and PSP transport-layer encryption headers between the IPv4 or IPv6 and TCP header.
TCP is particularly suitable for segmentation offload because at the user interface TCP is defined as a bytestream. By this definition, the user may have no expectations of how data is segmented into packets, in contrast with datagrams or message based protocols.
A device SHOULD support UDP Segmentation Offload (USO), for both IPv4 and IPv6. It MUST be possible to enable or disable the feature.
The device SHOULD support IPv4 options and IPv6 extension headers in between the IPv4 or IPv6 and UDP header. The device SHOULD support IPSec ESP and PSP transport-layer encryption headers between IPv4 or IPv6 header and UDP header.
==== Jumbogram Segmentation Offload ====
The device SHOULD support IPv6 jumbogram SO packets that exceed the 64 KB maximum IP packet size.
IPv6 headers have a 16-bit payload length field, so the largest possible standard IPv6 packet is 64 KB + IPv6 header (payload length includes IPv6 extension headers, if any).
RFC 2675 defines an IPv6 jumbo payload option, with which IPv6 packets can support up to 4GB of payload. This configuration sets the payload length field to zero and appends a hop-by-hop next header with the jumbo payload option.
Jumbogram segmentation offload ignores the IPv6 payload length field if zero. The host must then communicate the real length of the entire packet to the device out-of-band of the packet, likely as a descriptor field. The device can use the established TSO, USO and PISO rules to derive the total payload length from the total packet length.
Unlike for IPv6 jumbograms that are sent as jumbograms on the wire, it is not necessary for IPv6 jumbo segmentation offload to include a jumbo payload hop-by-hop next header, if the segments themselves will not be jumbograms.
<span id="receive-segment-coalescing"></span>
===== Segment size =====
The device MUST pass to the host along with the large (SO) packet, a segment size field that encodes the payload length of the original packets. This field implies that packets are only coalesced if they have the same size on the wire. Coalescing stops if a packet arrives of different size. If it is larger than the previous packets, it cannot be appended. If it is smaller, it can be. If segment size is not a divisor of the SO packet payload, then the remainder encodes the payload length of this last packet.
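The coalescing rule above admits exactly three outcomes per arriving packet. A sketch, with illustrative names:

```c
#include <assert.h>

/* Verdict for the next arriving packet against the running segment
 * size of an RSC chain. */
enum rsc_verdict {
    RSC_APPEND,          /* same size: keep coalescing */
    RSC_APPEND_AND_STOP, /* smaller: becomes the tail, then stop */
    RSC_STOP,            /* larger: cannot be appended */
};

static enum rsc_verdict rsc_coalesce(unsigned int seg_size,
                                     unsigned int pkt_payload)
{
    if (pkt_payload > seg_size)
        return RSC_STOP;
    if (pkt_payload < seg_size)
        return RSC_APPEND_AND_STOP;
    return RSC_APPEND;
}
```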
''Reversibility''
* IPv4 total length is updated to match the SO packet.
* IPv6 payload length is updated to match the SO packet.
* IPv4 IP ID is the ID of the first segment.
* IPv4 checksum is valid.
<span id="timestamping"></span>
==== Ingress ====
An ingress queue can build up on the device due to incast. If a standing queue can build up in the device, the device SHOULD mitigate head of line blocking of high priority traffic, by prioritizing traffic based on IP DSCP bits. The device MUST offer at least two traffic bands and MUST support host configurable mapping of DSCP bits to band. The device SHOULD offer weighted round robin (WRR) dequeue with weights configurable by the host. It may implement strict priority. If so, this MUST include starvation prevention with a minimum of 10% of bandwidth for every queue.
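A host-configurable DSCP-to-band mapping is essentially a 64-entry table. A minimal sketch; the table contents and names are illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Map a 6-bit DSCP code point to a traffic band via a host-written
 * table. The device MUST offer at least two bands. */
static unsigned int dscp_to_band(const uint8_t band_map[64], uint8_t dscp)
{
    return band_map[dscp & 0x3f];
}
```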
<span id="egress"></span>
This feature relies on comparing packet departure time against a device clock. It thus depends on a device hardware clock and host clock synchronization as described in the section on timestamping. It requires a transmit descriptor field to encode the departure time.
If the device supports EDT, then it MUST implement this according to the following rules. It MUST send without delay packets which have no departure time set or for which the departure time is in the past. It MUST NOT send a packet with a departure time before that departure time under any conditions. Departure time resolution MUST be 2us or smaller. The device MUST be able to accept and queue packets with a departure time up to 50 msec in the future. This “time horizon” is based on congestion control algorithms’ forward looking window. The device likely also has a global maximum storage capacity. It SHOULD NOT have a maximum per interval capacity. The vendor MUST report all such bounds. The device MAY support a special slot for queueing packets with a time beyond the time horizon, or it may choose to drop those. The device MUST expose a counter for all packets dropped by the timing wheel due to either resource exhaustion or departure time beyond the horizon. The device SHOULD signal in a transmit completion when a packet was dropped rather than sent. 
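The EDT admission rules above classify each packet into one of three cases. A sketch with times in microseconds; names and structure are illustrative:

```c
#include <assert.h>
#include <stdint.h>

enum edt_action {
    EDT_SEND_NOW,       /* departure time unset or in the past */
    EDT_QUEUE,          /* within the 50 msec time horizon */
    EDT_BEYOND_HORIZON, /* special slot, or a counted drop */
};

#define EDT_HORIZON_US (50 * 1000) /* 50 msec forward-looking window */

/* Classify a packet by its departure time; 0 means no time set. */
static enum edt_action edt_classify(uint64_t now_us, uint64_t departure_us)
{
    if (departure_us == 0 || departure_us <= now_us)
        return EDT_SEND_NOW;
    if (departure_us - now_us <= EDT_HORIZON_US)
        return EDT_QUEUE;
    return EDT_BEYOND_HORIZON;
}
```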
<span id="protocol-support"></span>
===== Single Flow =====
Single flow MUST reach 40 Gbps with 1500B MTU and TSO. A single TCP/IP flow can reach 100 Gbps line rate when using TSO, 4KB MSS and copy avoidance [ref_id_tcp_rx_0copy], but this is a less common setup. Single flow line rate is not a hard requirement, especially as device speeds exceed 100 Gbps.
<span id="peak-stress-and-endurance-results"></span>
The vendor MUST report maximum packet rate BOTH with a chosen optimal configuration and with a single pair of receive and transmit queues.
The performance metrics should remain reasonably constant with queue count: packet rate at any number of queues above 8 SHOULD be no worse than 80% of the best case packet rate. If this cannot be met, the vendor MUST also report the worst case queue configuration and its packet rate. This is to avoid surprises as the user deploys the device and tunes configuration.
<span id="latency"></span>
<td>
[2, 50] us
</td>
</tr>
<td>
max pps / 8
</td>
</tr>
<pre>For BYTE in {1..64}:
    ip link add link eth0 dev eth0.$BYTE address 22:22:22:22:22:$BYTE type macvlan</pre>
The device must support promiscuous (all addresses) and allmulti (all multicast addresses) modes:
* Disable CPU sleep states (C-states), frequency scaling (P-states) and turbo modes.
* Disable hyperthreading
* Disable IOMMU
* Pin process threads
* Memory distance: pin threads and IRQ handlers to the same NUMA node or cache partition
** Select the NUMA node to which the NIC is connected
A pure Linux solution for packet processing can be built using eXpress Data Path (XDP). Packets must be generated on the host as close to the device as possible. A device that supports AF_XDP, in native driver mode, with copy avoidance and busy polling, has been shown to reach 30 Mpps on a 40 Gbps NIC using the rx_drop benchmark that ships with the Linux kernel. Over 100 Mpps has been demonstrated on 100 Gbps NICs, but these results are not publicly published.
<span id="latency-1"></span>
<td>
Initial public draft
</td>
</tr>