Editing
Core Offloads
(section)
Jump to navigation
Jump to search
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
===== Receive Side Scaling ===== A device MUST support load balancing with flow affinity using Receive Side Scaling (RSS). This algorithm combines (a) field extraction rules for packet steering with flow affinity, (b) a hash function for uniform load balancing that incorporates a secondary hash input for DoS resistance and (c) an indirection table to optionally implement non-uniform weighted load balancing. <span id="field-extraction"></span> ====== Field Extraction ====== Queue selection must be flow affine, forwarding all packets from a transport flow to the same queue, so that packets within a flow are not reordered. Transport protocol performance can degrade when packets arrive out of order, which is likely to happen with simpler round robin packet spraying. RSS defines two rules to derive queue selection input in a flow-affine manner from packet headers. Selected fields of the headers are extracted and concatenated into a byte array. If the packet is IPv4 or IPv6, not fragmented, and followed by a transport layer protocol with ports, such as TCP and UDP, then extract the concatenated 4-field byte array { source address, destination address, source port, destination port }. Else, if the packet is IPv4 or IPv6, extract 2-field byte array { source address, destination address }. IPv4 packets are considered fragmented if the more fragments bit is set or the fragment offset field is non-zero. If a packet contains multiple IPv4 or IPv6 headers, then RSS operates on the first IPv4 or IPv6 header and the immediately following transport header, if any. The IPv6 flowlabel field may also be included. If present, this MUST be placed immediately before the source address in the byte array. The same fields from subsequent IPv4/IPv6 and transport headers MAY be appended to the byte array, if present. These extensions are optional and MUST be configurable if supported. A basic version of RSS without optional extensions MUST always be supported, to be able to perform explicit flow steering by reversing the algorithm. <span id="toeplitz-hash-function"></span> ====== Toeplitz Hash Function ====== The device MUST support the Toeplitz hash function~[ref_id:toeplitz_hash] for Receive Side Scaling~[ref_id:ms_rss]. A hash function maps the byte array onto a 32-bit number with significant entropy to serve as effective input for uniform load balancing. The Toeplitz function takes two inputs, the byte array derived from the packet P and a second byte array S of at least 40 bytes that is constant between packets. S, called the secret, is mixed into the entropy for DoS prevention. It makes queue prediction hard for a given packet unless the secret is known. Toeplitz trades off execution speed and security: it is not a cryptographically secure hash function. The secret MUST be readable and writable from the host. Explicit queue prediction has legitimate use cases, so the secret must be discoverable by trusted parties. The secret S is converted to a Toeplitz matrix Sm, a matrix in which each left-to-right descending diagonal is constant. Due to this property, a Toeplitz matrix is fully defined by its first row and column. An N * M Toeplitz matrix is defined by an N + M - 1 length source vector. S has to be long enough to match against each bit in the longest possible RSS input vector. That is 2 16B IPv6 addresses plus 2 2B ports, for 36B == 288b. A minimal RSS Toeplitz matrix is a binary Toeplitz matrix of 288 x 32 bits. 288 is one row for each bit in P. 288 + 32 - 1 == 319 bits to define the matrix establishes the minimum 40B required length of secret S. The minimum secret key length MUST be no less than 40B. Due to optional extended inputs, larger secrets MAY be supported. A range of 40-60B is common. Matching MUST always begin at bit zero, regardless of configured key length. The first row consists of the left-most 32 bits of the array. The remainder define the first bit of each subsequent row. The Toeplitz hash function performs a scalar multiplication between the Toeplitz matrix Sm and the input array P. Each bit in P<sub>i</sub> is multiplied with the 32b row Sm<sub>i</sub>. The output array O is converted to a scalar value by an XOR of all elements in O. The below reference implementation demonstrates the algorithm. Care must be taken surrounding endianness and bit-order (traverse a byte from MSB to LSB). See Appendix B for validation. <pre> uint32_t toeplitz(const unsigned char *P, const unsigned char *S) { uint32_t rxhash = 0; int bit; for (bit = 0; bit < 288; bit++) if (test_bit(P, bit)) rxhash ^= word_at_bit_offset(S, bit); return rxhash;</pre> <pre> }</pre> The device MAY support other hash functions besides Toeplitz. Then function selection must be configurable. <span id="receive-hash"></span> ====== Receive Hash ====== The computed 32b hash SHOULD be passed to the host alongside the packet. Doing so allows the host to perform additional flow steering without having to compute a hash in software, such as Linux Receive Flow Steering (RFS). A device MAY compute a 64b field to reduce collisions. It MAY communicate this instead, as long as either the 32b Toeplitz hash can be derived or can be communicated alongside. <span id="indirection-table"></span> ====== Indirection Table ====== The device MUST select a queue by reducing the hash through modulo arithmetic. It applies division to the hash value and uses the remainder as an index into a fixed number of resources. The divisor is not simply the number of receive queues. RSS specifies an additional level of indirection, the indirection table. This allows for non-uniform load balancing. The device MUST support the RSS indirection table. The device MUST lookup a queue using the following modulo operation: <pre>queue_id = rss_table[rxhash % rss_table_length];</pre> The table MUST be host-readable and writable. The host may configure the table with fewer slots than the configured number of receive queues, if the host wants to apply RSS to only a subset of queues. The host may configure the table with more slots than the number of receive queues, for more uniform load balancing. The device may limit the maximum supported table size. The minimum supported indirection table size MUST be <s>the number of supported receive queues</s>. The minimum SHOULD be at least 4 times the number of supported receive queues. The device SHOULD allow querying the maximum supported table size by the host. The device SHOULD allow replacement of the indirection table without pausing network traffic or bringing the device down, to support dynamic rebalancing, e.g., based on CPU load. <span id="accelerated-rfs"></span>
Summary:
Please note that all contributions to OpenCompute may be edited, altered, or removed by other contributors. If you do not want your writing to be edited mercilessly, then do not submit it here.
You are also promising us that you wrote this yourself, or copied it from a public domain or similar free resource (see
OpenCompute:Copyrights
for details).
Do not submit copyrighted work without permission!
Cancel
Editing help
(opens in new window)
Navigation menu
Personal tools
Not logged in
Talk
Contributions
Create account
Log in
Namespaces
Page
Discussion
English
Views
Read
Edit
View history
More
Search
Navigation
Main page
Recent changes
Random page
Help about MediaWiki
Tools
What links here
Related changes
Special pages
Page information