View source for Core Offloads

===== Accelerated RFS =====

RSS does not maintain per-flow state. A device MAY also implement the stateful Accelerated RFS (ARFS) algorithm, which explicitly records a preferred queue for a given flow hash. If the device advertises this feature, it MUST be implemented as described in this section.

In Linux, Receive Flow Steering (RFS) is a software algorithm that steers receive processing of a packet to the CPU that last ran an application thread for the same flow. It identifies the flow that a packet belongs to by a flow hash. Optionally and preferably, that is the RSS hash received from the device.

RFS introduces a map from flow hash to CPU. When an application thread interacts with a flow, the host stores the CPU ID in <code>rfs_table[hash % rfs_table_length]</code>. When the host processes a packet from the receive queue, it looks up this table entry, queues the packet on a host queue for the given CPU, and sends an inter-processor interrupt (IPI) to trigger receive processing on the CPU affine with the application thread.
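The record and lookup steps above can be sketched as follows. This is a minimal illustrative model, not the Linux implementation: the names <code>RfsTable</code>, <code>steer</code>, and <code>RFS_NO_CPU</code> are hypothetical, and the behavior for an empty slot (stay on the current CPU) and for a packet already on the target CPU (no IPI) is an assumption.

```python
RFS_NO_CPU = -1  # hypothetical sentinel: no CPU recorded for this slot


class RfsTable:
    """Toy model of the RFS map from flow hash to CPU."""

    def __init__(self, length):
        self.length = length
        self.cpus = [RFS_NO_CPU] * length

    def record(self, flow_hash, cpu):
        # Called when an application thread interacts with a flow.
        self.cpus[flow_hash % self.length] = cpu

    def lookup(self, flow_hash):
        # Called while the host processes a packet from the receive queue.
        return self.cpus[flow_hash % self.length]


def steer(table, flow_hash, current_cpu):
    """Return (target_cpu, needs_ipi) for a received packet."""
    cpu = table.lookup(flow_hash)
    if cpu == RFS_NO_CPU:
        # Assumption: with no recorded CPU, keep processing where we are.
        return current_cpu, False
    # Assumption: an IPI is only needed when the target CPU differs.
    return cpu, cpu != current_cpu
```

Because the table is indexed by <code>hash % rfs_table_length</code>, two flows whose hashes collide share one slot, and the most recent writer determines the CPU for both.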

Accelerated RFS moves the RFS table to the device. The device then delivers the packet directly to a queue whose interrupt wakes the RFS-affine CPU, skipping both RSS queue selection and the IPI. The feature can be implemented with an explicit lookup table as described, or as a list of match/action rules that match on a hash or its source fields. In all cases, the action is to queue the packet on a specific queue or RSS context (see below). The host is responsible for storing a queue ID that results in interrupt processing on the same CPU as recorded at the application layer.

If ARFS is supported, regardless of implementation, the device MUST present a match/action API with match on L4 hash and queue selection action. It MAY also offer an API that inserts and/or removes multiple rules at once.

If ARFS is enabled and an ARFS match for a hash is found, then this takes precedence over RSS. Otherwise the device MUST fall back to RSS.
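The match/action API and the precedence rule can be modeled with a short sketch. Everything named here (<code>ArfsDevice</code>, <code>insert_rule</code>, <code>insert_rules</code>, <code>remove_rule</code>, <code>select_queue</code>) is hypothetical and does not correspond to a real driver or device interface. The RSS fallback is assumed to index an indirection table by the hash modulo the table length.

```python
class ArfsDevice:
    """Toy model of device-side queue selection with ARFS and RSS."""

    def __init__(self, indirection_table):
        # Assumed RSS model: table of receive queue IDs indexed by hash.
        self.indirection_table = list(indirection_table)
        self.rules = {}  # match on L4 flow hash -> action: receive queue ID
        self.arfs_enabled = True

    def insert_rule(self, flow_hash, queue):
        self.rules[flow_hash] = queue

    def remove_rule(self, flow_hash):
        self.rules.pop(flow_hash, None)

    def insert_rules(self, rules):
        # Optional batch variant: insert several rules in one call.
        self.rules.update(rules)

    def select_queue(self, flow_hash):
        # An ARFS match takes precedence over RSS.
        if self.arfs_enabled and flow_hash in self.rules:
            return self.rules[flow_hash]
        # No match, or ARFS disabled: fall back to RSS.
        index = flow_hash % len(self.indirection_table)
        return self.indirection_table[index]
```

For example, with an indirection table of <code>[0, 1, 2, 3]</code>, a packet with hash 6 lands on queue 2 through RSS until the host inserts a rule for hash 6, after which the rule's queue is used.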

ARFS is not suitable for all workloads. If connection churn or thread migration is high, it can introduce significant table management communication across the PCI bus.

<span id="self-learning-arfs"></span>
====== Self-learning ARFS ======

ARFS may alternatively be implemented entirely on the device. In this case the device programs the match/action table for ingress matching based on sampling of egress traffic. This requires matching a transmit queue to a receive queue and thus assumes an M:1 mapping of transmit to receive queues. Care must be taken to ensure that self-learning ARFS does not cause packet reordering within a flow.
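One way to picture self-learning is the sketch below. It is illustrative only and makes several assumptions that this section does not specify: the class and method names are hypothetical, the caller supplies the hash that ingress packets of the same flow would carry, and reordering is avoided by deferring any steering change while packets of that flow are still awaiting host processing. A real device may use a different guard.

```python
class SelfLearningArfs:
    """Toy model: learn ingress steering rules from sampled egress traffic."""

    def __init__(self, tx_to_rx, sample_every=1):
        self.tx_to_rx = dict(tx_to_rx)  # M:1 map: transmit queue -> receive queue
        self.sample_every = sample_every
        self.tx_count = 0
        self.rules = {}      # ingress flow hash -> receive queue
        self.in_flight = {}  # ingress flow hash -> packets not yet processed

    def on_transmit(self, tx_queue, rx_flow_hash):
        """Sample an egress packet; return True if the rule is in place."""
        self.tx_count += 1
        if self.tx_count % self.sample_every != 0:
            return False  # packet not sampled
        new_queue = self.tx_to_rx[tx_queue]
        changes = self.rules.get(rx_flow_hash) != new_queue
        if changes and self.in_flight.get(rx_flow_hash, 0) > 0:
            # Assumed guard: moving the flow now could reorder packets.
            return False
        self.rules[rx_flow_hash] = new_queue
        return True

    def on_receive(self, flow_hash):
        """Return the learned receive queue, or None to fall back to RSS."""
        self.in_flight[flow_hash] = self.in_flight.get(flow_hash, 0) + 1
        return self.rules.get(flow_hash)

    def on_host_processed(self, flow_hash):
        # The host has consumed one packet of this flow from its queue.
        self.in_flight[flow_hash] = max(0, self.in_flight.get(flow_hash, 0) - 1)
```

The <code>tx_to_rx</code> dictionary encodes the M:1 assumption: several transmit queues may map to one receive queue, but each transmit queue maps to exactly one.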

<span id="programmable-flow-steering"></span>