Core Offloads
=== Timestamping ===

The device MUST support hardware timestamping at line rate, on both ingress and egress.

Timestamps MUST be taken as defined in IEEE 802.3-2022[ref_id:802.3_2022] clause 90.

<span id="measurement-plane"></span>
==== Measurement Plane ====

The vendor SHOULD measure and report any constant delay between the measurement plane and the reference plane (i.e., the network), as defined in that clause. There MUST NOT be any variable delay between the measurement plane and the reference plane exceeding 10 ns.

On ingress, this implies that timestamps must be taken before any queueing. On egress, the inverse holds: timestamps must be taken after any queueing. In particular, measurement of a timestamp must not be subject to (PCIe) backpressure delay on communication of the transmit descriptor to the host.

<span id="clock"></span>
==== Clock ====

Timestamps are measurements of a device clock. Most device clock components conform to standard requirements for stratum 3 clocks, such as GR-1244-CORE[ref_id:gr_1244_core] or ITU-T G.812[ref_id:g812] type IV. The device clock SHOULD conform to one of these, and this SHOULD be reported.

Before using the common terms in this domain, we first define them:

* Resolution is the quantity below which two samples are seen as equal. It is defined as a time interval (e.g., in nanoseconds). The range of values that can be expressed is defined in terms of wraparound time. From resolution and wraparound time together, a minimum bit-width can be derived. Resolution itself is not an integer storage size, however.
* Precision is the distribution of measurements. It indicates repeatability of measurements, and is affected by read uncertainty. Precision is also expressed as a time interval.
* Accuracy is the offset from the true value. A perfectly precise measurement may have a constant offset. In this context, one example is the offset of the measurement plane from the reference plane.

Clock resolution and precision MUST be 10 ns or better. The clock MUST NOT drift more than 10 ppm.
This may require a temperature controlled device (TCXO, OCXO or otherwise), but the implementation is not prescribed.

The clock must have a wraparound no worse than the 64-bit PTPv1 format, which is 2^32 seconds or roughly 136 years.

The counter MUST be monotonically non-decreasing. That is, causality must be maintained: any packet B measured after another packet A at the same measurement plane cannot have a timestamp lower than the timestamp of A. A packet passing through two measurement planes X and Y (such as PHY Tx and Rx when looping through a switch) must have a timestamp at Y greater than or equal to the timestamp at X. Timestamps may be equal, in particular if packets follow each other at an interval shorter than the clock resolution.

<span id="clock-synchronization"></span>
==== Clock Synchronization ====

The device MUST support clock synchronization of host clock to device clock with at most 500 ns uncertainty. Transmitting an absolute clock reading across a medium such as PCIe itself introduces variable delay that can exceed this bound. The device SHOULD bound this uncertainty, e.g., by implementing a hardware mechanism such as PCI Precision Time Measurement (PTM) [ref_id:pci_ptm]. The vendor MUST report this bound.

The device must expose a clock API to read and control the NIC clock. The device MUST expose at least operations to get the absolute value, set the absolute value and adjust the frequency. These must match the behavior of the <code>gettimex64</code>, <code>adjtime</code> and <code>adjfine</code> or <code>adjfreq</code> operations as defined in Linux <code>ptp_clock_info</code>.

The get value operation MUST be implemented as a sandwich algorithm where the device clock reading is reported in between two host clock reads, as described in the PCI PTM link protocol[ref_id:pci_express_5.0, sec 6.22.2]. The frequency adjustment operation MUST allow frequency adjustments at 1 part per billion resolution or better.
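
As a rough illustration of the sandwich read described above, the following Python sketch brackets one device clock read between two host clock reads and takes half the window as the uncertainty of the sample. The names <code>CrossTimestamp</code>, <code>sandwich_read</code> and <code>best_of</code> are hypothetical, and the <code>read_device_ns</code> callback stands in for whatever register or driver access a real device provides. It is not the Linux <code>gettimex64</code> implementation.

```python
import time
from typing import Callable, NamedTuple


class CrossTimestamp(NamedTuple):
    """One sandwich sample; all values in nanoseconds (hypothetical type)."""
    host_before_ns: int
    device_ns: int
    host_after_ns: int

    @property
    def host_midpoint_ns(self) -> int:
        # Best estimate of the host time at which the device clock was sampled.
        return (self.host_before_ns + self.host_after_ns) // 2

    @property
    def uncertainty_ns(self) -> int:
        # The device read happened somewhere inside the window, so the error
        # of the midpoint estimate is at most half the window (rounded up).
        return (self.host_after_ns - self.host_before_ns + 1) // 2

    @property
    def offset_ns(self) -> int:
        # Estimated device-minus-host clock offset.
        return self.device_ns - self.host_midpoint_ns


def sandwich_read(read_device_ns: Callable[[], int],
                  read_host_ns: Callable[[], int] = time.monotonic_ns) -> CrossTimestamp:
    """Read the device clock in between two host clock reads."""
    before = read_host_ns()
    device = read_device_ns()
    after = read_host_ns()
    return CrossTimestamp(before, device, after)


def best_of(samples: int,
            read_device_ns: Callable[[], int],
            read_host_ns: Callable[[], int] = time.monotonic_ns) -> CrossTimestamp:
    """Repeat the read and keep the sample with the narrowest window."""
    return min((sandwich_read(read_device_ns, read_host_ns) for _ in range(samples)),
               key=lambda s: s.uncertainty_ns)
```

A caller could compare <code>uncertainty_ns</code> against the 500 ns bound and retry or discard the sample when it is exceeded. Keeping the narrowest of several samples is one common way to filter out reads that were delayed on the bus.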
<span id="pps-in-and-out"></span>
===== PPS in and out =====

The device MUST support both a Pulse Per Second (PPS) input and output signal.

<span id="host-communication"></span>
==== Host Communication ====

Timestamps may be passed to the host in a truncated format consisting of only the N least significant bits. This N-bit counter MUST have a wraparound of 1 second or greater. This allows the host to extend truncated timestamps to the full width by reading the full device clock at least once per wraparound interval.

<span id="receive"></span>
===== Receive =====

The device MAY support selective receive timestamping, where the host can install a packet filter to select a subset of packets to be timestamped. The device MUST support the option to timestamp all packets.

For Receive Segment Coalescing (RSC) packets, the timestamp reported MUST be the timestamp of the first segment. This extends IEEE 802.3 Ethernet timestamp measurement to RSC packets.

<span id="transmit"></span>
===== Transmit =====

Transmit timestamps SHOULD be passed by the device to the host in a transmit completion descriptor field. If the measurement takes place after the completion notification, the device may instead queue a separate second completion, or directly expose an MMIO timestamp register file to the host, if that design can sustain line rate measurement.

The device MAY require the host to explicitly request a timestamp for each packet, e.g., through a descriptor field.

For TCP Segmentation Offload (TSO) packets, measurement happens after segmentation. As with all other timestamps, the timestamp MUST be taken for the first symbol in the message. This corresponds to the first segment.

<span id="applications"></span>
==== Applications ====

NIC hardware timestamping is essential to IEEE 1588 clock synchronization. Applications at hyperscale also include congestion control and distributed applications.

Delay based TCP congestion control takes network RTT as input signal.
Measurement must be more precise than network delay, which in data centers can be tens of microseconds. Hyperscale deployment of advanced congestion control requires a significantly higher measurement rate than PTP clock synchronization does, since RTT estimates are per-connection and measurements are taken on every packet.

NIC hardware timestamps also enable latency measurement of the NIC datapath itself. Incast is a significant concern in hyperscale environments. Concurrent connection establishment can cause queue build-up in a NIC if the host CPU, memory or peripheral bus are out of resources. Latency instrumentation can give an earlier and more informative signal than drops alone.

Finally, distributed systems increasingly rely on high precision clock synchronization to offer strongly consistent scalable storage[ref_id:sundial]. Microsoft FaRMv2[ref_id:farm] and CockroachDB are two examples. Serializability in such databases depends on strict event ordering based on timestamps. Transactions can be committed only after a time uncertainty bound has elapsed. Key to scaling transaction rate is bounding this uncertainty.

<span id="traffic-shaping"></span>
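
Returning to the truncated timestamp format described under Host Communication, the following Python sketch shows one way a host might size the N-bit counter and extend a truncated timestamp to full width. The function names are hypothetical. The sketch assumes that values are counter ticks, that the full device clock was read at or after the moment the packet was timestamped, and that less than one wraparound interval separates the two.

```python
def min_counter_bits(wrap_ns: int, tick_ns: int) -> int:
    """Smallest N such that an N-bit counter of tick_ns ticks
    does not wrap sooner than wrap_ns."""
    ticks = -(-wrap_ns // tick_ns)  # ceiling division
    return (ticks - 1).bit_length()


def extend_timestamp(truncated: int, full_now: int, bits: int) -> int:
    """Rebuild a full-width timestamp from its `bits` least significant bits.

    Assumption: the timestamp was taken at or before `full_now` and less
    than 2**bits ticks earlier.
    """
    mask = (1 << bits) - 1
    if not 0 <= truncated <= mask:
        raise ValueError("truncated timestamp does not fit in the given width")
    # Age of the timestamp relative to the full clock read, modulo the
    # counter width. This handles a wrap between the two readings.
    age = (full_now - truncated) & mask
    return full_now - age


# A 1 ns tick needs at least 30 bits for a 1 second wraparound,
# a 10 ns tick needs at least 27 bits.
bits_1ns = min_counter_bits(1_000_000_000, 1)
bits_10ns = min_counter_bits(1_000_000_000, 10)

# The counter wrapped between the packet timestamp and the full clock read.
extended = extend_timestamp(0xFFFF_FFF0, 0x5_0000_0010, 32)
```

The extension is only unambiguous while the stated assumption holds, which is why the host needs a full clock reading at least once per wraparound interval.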