
=== Segmentation Offload ===

Segmentation offload (SO) allows a host to pass the same number of bytes to the device in fewer packets. Most host transmission cost is a per-packet cost, incurred as each packet traverses the layers of the software protocol stack. This path does not commonly access the payload, so packet size matters little. SO amortizes the per-packet overhead over more bytes.

If a device supports SO, the host may pass it substantially larger packets than can be sent on the network. The device breaks up these SO packets into smaller packets and transmits those.

Segmentation offload depends on having checksum offload enabled, because packet checksums have to be computed after segmentation.

<span id="copy-headers-and-split-payload"></span>
==== Copy Headers and Split Payload ====

In an abstract model of segmentation offload, the device splits the SO packet payload into segment-sized chunks and copies the SO packet protocol headers to each segment. We refer to this basic mechanism as copy-headers-and-split-payload (CH/SP). The host communicates an unsigned integer segment size to the device along with the packet. This field must be large enough to cover the L3 MTU range: 16 bits is customary, but not strictly required to meet this goal. If the segment size is not a divisor of the total payload length, then the last packet in the segment chain will be shorter. The device MUST NOT attempt to compute or derive the segment size, because establishing it requires path MTU and transport MSS discovery, a complex process better suited to software in the host protocol stack.
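The CH/SP model can be sketched in a few lines of Python. This illustrates the abstract mechanism only, not a device implementation; the function name `ch_sp` is hypothetical, and the 16-bit bound on the segment size reflects the customary field width rather than a requirement.

```python
def ch_sp(headers: bytes, payload: bytes, seg_size: int) -> list[bytes]:
    """Abstract CH/SP: split payload into seg_size chunks and prefix each
    with an unmodified copy of the SO packet headers. No field is edited."""
    # Assumption: a 16-bit segment size field, the customary width.
    if not 0 < seg_size <= 0xFFFF:
        raise ValueError("segment size must be non-zero and fit in 16 bits")
    segments = []
    for off in range(0, len(payload), seg_size):
        # The last slice is shorter if seg_size does not divide the payload.
        segments.append(headers + payload[off:off + seg_size])
    return segments


segs = ch_sp(b"HDR", b"x" * 10, 4)
print([len(s) for s in segs])  # [7, 7, 5]
```

Every segment carries identical headers; the protocol-specific sections below list the fields that must then be rewritten.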

CH/SP is a simplified model. For specific protocols, segmentation offload can have subtle exceptions in how protocol header fields must be updated after copy. This spec explicitly defines all cases that diverge from pure CH/SP. The ground truth is the software segmentation implementation in Linux v6.3. If the two disagree, that source code takes precedence.

<span id="tcp-segmentation-offload"></span>
==== TCP Segmentation Offload ====

A device MUST support TCP Segmentation Offload (TSO), for both IPv4 and IPv6. It MUST be possible to enable or disable the feature. The device MUST support TSO with TCP options.

The device SHOULD support IPv4 options and IPv6 extension headers between the IPv4 or IPv6 header and the TCP header. The device SHOULD support IPsec ESP and PSP transport-layer encryption headers between the IPv4 or IPv6 header and the TCP header. As with other fields, the device should treat these bytes as opaque and copy them unconditionally unless otherwise specified.

TCP is particularly suitable for segmentation offload because, at the user interface, TCP is defined as a bytestream. By this definition, the user cannot expect data to be segmented into packets in any particular way, in contrast with datagram or message-based protocols.

TSO enables the host to send the largest possible IP packet to the device, ignoring any constraints on path maximum transmission unit (MTU) or negotiated TCP maximum segment size (MSS). The host TCP stack selects the current MSS for the TCP connection as segment size. This number may vary between connections and across a connection lifespan.

<span id="tcp-header-field-adjustments"></span>
===== TCP Header Field Adjustments =====

TSO requires these changes to the TCP header after CH/SP:

* Sequence number: the sequence number of the previous segment plus the segment size.
* Flags
** FIN, PSH are only reflected in the last segment, zero in all others
** CWR is only reflected in the first segment, zero in all others
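As an illustration, the following Python sketch derives the per-segment TCP sequence number and flags from the SO packet's values. The names `tso_tcp_fields` and `TcpSegFields` are hypothetical; the sketch assumes a non-empty payload and ignores all other header fields. The starting sequence number is chosen near 2^32 to show that the addition wraps modulo 2^32.

```python
from dataclasses import dataclass

# Bit positions in the TCP flags byte (byte 13 of the TCP header).
FIN, PSH, ACK, CWR = 0x01, 0x08, 0x10, 0x80


@dataclass
class TcpSegFields:
    seq: int
    flags: int
    payload_len: int


def tso_tcp_fields(seq: int, flags: int, payload_len: int,
                   seg_size: int) -> list[TcpSegFields]:
    """Per-segment TCP sequence number and flags after CH/SP."""
    n = -(-payload_len // seg_size)  # ceiling division: number of segments
    out = []
    for i in range(n):
        f = flags
        if i != n - 1:
            f &= ~(FIN | PSH)  # FIN and PSH survive only in the last segment
        if i != 0:
            f &= ~CWR          # CWR survives only in the first segment
        length = min(seg_size, payload_len - i * seg_size)
        # Each sequence number is the previous one plus seg_size, mod 2^32.
        out.append(TcpSegFields((seq + i * seg_size) & 0xFFFFFFFF, f, length))
    return out


segs = tso_tcp_fields(0xFFFFFF00, ACK | PSH | FIN | CWR, 3000, 1448)
```

Because every segment except the last is exactly `seg_size` long, "previous sequence number plus segment size" and "previous sequence number plus previous segment length" coincide.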

''IP Header Field Adjustments''

The IP layer requires these changes:

* IPv4 total length is updated to match the shorter payload
* IPv6 payload length is updated to match the shorter payload
* IPv4 packets must increment the IP ID unless the DF bit is set
* IPv4 header checksum is recomputed
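The IPv4 adjustments can be sketched as below. `ipv4_segment_header` is a hypothetical helper operating on the SO packet's IPv4 header: it updates the total length, increments the IP ID per segment when DF is clear (and, as one reading of the rule above, leaves it unchanged when DF is set), and recomputes the header checksum with the standard RFC 1071 algorithm. The IPv6 case only needs the payload length field rewritten and is omitted.

```python
import struct

DF = 0x4000  # "Don't Fragment" bit in the IPv4 flags/fragment-offset field


def ipv4_checksum(hdr: bytes) -> int:
    """RFC 1071 Internet checksum over the IPv4 header's 16-bit words."""
    total = sum(word for (word,) in struct.iter_unpack("!H", hdr))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_segment_header(so_hdr: bytes, l4_len: int, index: int) -> bytes:
    """Rewrite the SO packet's IPv4 header for segment number `index`.

    l4_len is this segment's length after the IPv4 header (transport
    header plus this segment's share of the payload)."""
    ihl = (so_hdr[0] & 0x0F) * 4  # header length including options
    h = bytearray(so_hdr[:ihl])
    struct.pack_into("!H", h, 2, ihl + l4_len)  # total length
    ip_id, frag = struct.unpack_from("!HH", h, 4)
    if not frag & DF:
        # DF clear: consecutive segments get consecutive IP IDs.
        struct.pack_into("!H", h, 4, (ip_id + index) & 0xFFFF)
    struct.pack_into("!H", h, 10, 0)  # checksum is computed with field = 0
    struct.pack_into("!H", h, 10, ipv4_checksum(bytes(h)))
    return bytes(h)
```

A quick self-check: running `ipv4_checksum` over a header that already carries a correct checksum yields 0.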

''Extension Header Field Adjustments''

Headers between the IPv4 or IPv6 header and TCP header MUST be copied as pure CH/SP.

Authenticated encryption has to happen after SO, because each segment is encrypted and authenticated as a separate packet. IPsec ESP or PSP encryption headers must be copied in a pure CH/SP manner to each segment, for further processing by downstream inline encryption logic.

<span id="udp-segmentation-offload"></span>
==== UDP Segmentation Offload ====

A device SHOULD support UDP Segmentation Offload (USO), for both IPv4 and IPv6. It MUST be possible to enable or disable the feature.

USO allows sending multiple UDP datagrams in a single operation. The host passes the device a UDP packet plus a segment size field. The device splits the datagram payload on segment size boundaries and copies the UDP header to each segment.

USO is NOT the same as UDP fragmentation offload (UFO). UFO sends a single datagram larger than the MTU by relying on IP fragmentation, and is out of scope of this spec. Unlike UFO, USO does not maintain ordering: the resulting datagrams may arrive out of order, just as if they had been sent one at a time.

The device SHOULD support IPv4 options and IPv6 extension headers between the IPv4 or IPv6 header and the UDP header. The device SHOULD support IPsec ESP and PSP transport-layer encryption headers between the IPv4 or IPv6 header and the UDP header.

UDP forms the basis for multiple high-throughput protocols, including QUIC (and thus HTTP/3) and video streaming protocols such as RTP. These workloads benefit from SO and make up a sizable fraction of Internet traffic.

''Header Field Adjustments''

Beyond CH/SP, USO requires an update of the UDP length field in the last segment if the USO payload is not an exact multiple of the segment size. It also requires the same IP and extension header field adjustments as TSO. A device SHOULD support this. Optionally, a device MAY only support USO for packets whose payload is an exact multiple of the segment size; the host must then ensure that it only passes such packets to the device. This restricted mechanism forms the basis for Protocol Independent Segmentation Offload, described next.
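A minimal sketch of USO splitting follows, assuming the host prepared the UDP header with a length field of 8 + segment size, so that only a short last segment needs a new length. `uso_segment` and its `exact_multiple_only` flag are hypothetical; the flag models a device that only accepts exact multiples. IP header adjustments and the UDP checksum (left to checksum offload) are not shown.

```python
import struct

UDP_HDR_LEN = 8


def uso_segment(udp_hdr: bytes, payload: bytes, seg_size: int,
                exact_multiple_only: bool = False) -> list[bytes]:
    """Split a USO payload into datagrams, copying the UDP header to each.

    Assumes udp_hdr already carries length = 8 + seg_size. The UDP
    checksum is left to checksum offload and is not touched here."""
    if len(udp_hdr) != UDP_HDR_LEN:
        raise ValueError("UDP header must be 8 bytes")
    if exact_multiple_only and len(payload) % seg_size:
        # Models a device that only accepts exact multiples of seg_size.
        raise ValueError("payload is not a multiple of seg_size")
    datagrams = []
    for off in range(0, len(payload), seg_size):
        chunk = payload[off:off + seg_size]
        hdr = bytearray(udp_hdr)
        if len(chunk) < seg_size:
            # Only a short final segment needs its UDP length rewritten.
            struct.pack_into("!H", hdr, 4, UDP_HDR_LEN + len(chunk))
        datagrams.append(bytes(hdr) + chunk)
    return datagrams


hdr = struct.pack("!HHHH", 5000, 443, UDP_HDR_LEN + 4, 0)
dgrams = uso_segment(hdr, b"abcdefghij", 4)
```

With `exact_multiple_only=True` no header field is ever edited, which is exactly the property PISO builds on.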

<span id="protocol-independent-segmentation-offload"></span>
==== Protocol Independent Segmentation Offload ====

A device SHOULD support Protocol Independent Segmentation Offload (PISO), for both IPv4 and IPv6. It MUST be possible to enable or disable the feature.

PISO codifies the core CH/SP mechanism. It extends segmentation offload to transport protocols other than TCP and UDP, and to tunneling scenarios, where a stack of headers precedes the inner transport header. Many protocols can be supported purely with CH/SP.

In PISO, the host

# communicates a segment size to the device along with the large packet, as in TSO/USO.
# also communicates an inner payload offset, piso_off, to the device along with that packet.
# prepares all headers before piso_off exactly as they must appear in each segment after segmentation.

If any of the headers include a length field, PISO requires all segments to be the same size, because the host prepares the headers exactly as they appear on the wire. PISO does not adjust them.

If the payload size is not a multiple of the segment size, the host has to send two packets to the device: one PISO packet whose payload length is the payload size minus the remainder, and a separate non-SO packet carrying the remainder. This is a software concern only.
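On the host, the split can be sketched as follows. `TxRequest` and `piso_host_split` are hypothetical names standing in for whatever the driver interface passes to the device; `hdr_len` doubles as piso_off on the assumption that the inner payload starts right after the prepared headers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TxRequest:
    payload_len: int
    seg_size: Optional[int]  # None: plain packet, no segmentation offload
    piso_off: Optional[int]  # offset of the inner payload, PISO only


def piso_host_split(hdr_len: int, payload_len: int,
                    seg_size: int) -> list[TxRequest]:
    """Host-side split: PISO only handles whole multiples of seg_size."""
    remainder = payload_len % seg_size
    requests = []
    if payload_len - remainder:
        # Headers for this packet are prepared as each full segment needs them.
        requests.append(TxRequest(payload_len - remainder, seg_size, hdr_len))
    if remainder:
        # The tail goes out as an ordinary packet with its own header lengths.
        requests.append(TxRequest(remainder, None, None))
    return requests


reqs = piso_host_split(50, 3000, 1400)
```

A payload smaller than one segment produces only the plain packet, and an exact multiple produces only the PISO packet.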

<span id="interaction-with-checksum-offload"></span>
===== Interaction with Checksum Offload =====

piso_off is similar to, but separate from, checksum_start. It must be possible to configure both independently.

<span id="interaction-with-tso-and-uso"></span>
===== Interaction with TSO and USO =====

PISO can be combined with TSO and USO. In that case, piso_off points not to the start of the payload but to the start of the inner transport header, TCP or UDP, and the protocol-specific rules for that inner transport protocol must be respected. Any headers before piso_off must still be ignored entirely by the device and treated solely as CH/SP. The device cannot infer whether the offset points to a UDP or a TCP header, so whether to apply pure PISO, PISO + TSO or PISO + USO must be communicated explicitly, e.g., with a field in a context descriptor.
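To make the offsets concrete, here is a hypothetical context descriptor for PISO + TSO on a VXLAN tunnel. `SoMode` and `ContextDescriptor` are illustrative names, not fields defined by this spec; the offset assumes no VLAN tags and no IPv4 options, and the segment size of 1400 is arbitrary.

```python
from dataclasses import dataclass
from enum import Enum


class SoMode(Enum):
    PISO = 1      # CH/SP only: no header field is edited
    PISO_TSO = 2  # CH/SP before piso_off, TSO rules for the TCP header there
    PISO_USO = 3  # CH/SP before piso_off, USO rules for the UDP header there


@dataclass
class ContextDescriptor:  # hypothetical layout, not defined by this spec
    mode: SoMode          # explicit: the device cannot infer TCP vs UDP
    seg_size: int
    piso_off: int
    checksum_start: int   # configured independently of piso_off


# VXLAN over IPv4 carrying TCP/IPv4: outer Ethernet (14) + outer IPv4 (20)
# + outer UDP (8) + VXLAN (8) + inner Ethernet (14) + inner IPv4 (20).
INNER_TCP_OFF = 14 + 20 + 8 + 8 + 14 + 20

desc = ContextDescriptor(SoMode.PISO_TSO, seg_size=1400,
                         piso_off=INNER_TCP_OFF,
                         checksum_start=INNER_TCP_OFF)
```

Here piso_off and checksum_start happen to coincide, but they remain separate fields, as required above.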

PISO + TSO/USO can optionally be supported on some legacy devices that were not built with PISO in mind. If a device supports TSO with variable-length IPv4 options or IPv6 extension headers, with an explicit descriptor field that passes the length of these extra headers, then this field can be used to pass arbitrary headers for CH/SP processing (not only options or extension headers), including tunnel headers. In this case the device expects the outer IPv4 or IPv6 header to be an SO header with a large length field, rather than a header already prepared for pure CH/SP. The driver can patch up this difference when translating from the PISO interface.

<span id="jumbogram-segmentation-offload"></span>
==== Jumbogram Segmentation Offload ====

The device SHOULD support IPv4 and IPv6 jumbogram SO packets that exceed the 64 KB maximum IP packet size.

IPv6 headers have a 16-bit payload length field, so the largest possible standard IPv6 packet is 65,535 bytes of payload plus the 40-byte IPv6 header (the payload length includes IPv6 extension headers, if any). IPv4 headers have a 16-bit total length field that also covers the header, so the largest possible IPv4 packet is slightly smaller: 65,535 bytes including the header.

Jumbogram segmentation offload ignores the IPv6 payload length or IPv4 total length field if it is zero. The host must then communicate the real length of the entire packet to the device out-of-band, for example in a descriptor field.
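A device-side sketch of how the packet length might be resolved follows. It assumes the buffer starts at the IP header and that `desc_len` is a hypothetical out-of-band descriptor field giving the full IP packet length; `so_ip_packet_length` is likewise an illustrative name.

```python
import struct

IPV6_HDR_LEN = 40


def so_ip_packet_length(pkt: bytes, desc_len: int) -> int:
    """Length of the IP packet at the start of pkt.

    desc_len stands for a hypothetical out-of-band descriptor field and is
    consulted only when the in-header length field is zero."""
    version = pkt[0] >> 4
    if version == 4:
        total_len = struct.unpack_from("!H", pkt, 2)[0]
        return total_len if total_len else desc_len
    if version == 6:
        payload_len = struct.unpack_from("!H", pkt, 4)[0]
        return IPV6_HDR_LEN + payload_len if payload_len else desc_len
    raise ValueError("not an IPv4 or IPv6 packet")
```

For non-zero length fields the header is authoritative, so the descriptor length only matters for jumbogram SO packets.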

RFC 2675 defines an IPv6 Jumbo Payload option, with which IPv6 packets can carry up to 4 GB of payload. In this configuration the payload length field is set to zero and a Hop-by-Hop Options extension header carrying the Jumbo Payload option is inserted. Unlike IPv6 jumbograms that are sent as jumbograms on the wire, IPv6 jumbogram segmentation offload does NOT need to include this Hop-by-Hop Options header, because the segments themselves will not be jumbograms.

<span id="receive-segment-coalescing"></span>