Establishing a Collaboration Framework to Fine-Tune Ethernet for AI and HPC
Today, the Open Compute Project Foundation (OCP), the nonprofit organization bringing hyperscale innovations to all, and the Ultra Ethernet Consortium (UEC) announce a new collaboration to improve Ethernet performance for next-generation AI clusters and HPC. UEC plans to develop enhancements to Ethernet, and the OCP has an active Community developing sustainable large-scale computational infrastructure for AI and HPC with Ethernet. Together, the OCP and UEC expect to collaborate on integrating enhanced Ethernet into the next generation of OCP Community-delivered AI clusters, providing the low-latency connectivity needed for backend AI cluster fabrics.
“The uptake for AI to assist humans in the workplace is expected to be widespread, including automating data center operations, and has created a data center IT equipment investment perfect storm. Capex spend forecasts have been adjusted upwards by major analyst firms, driven by the planned deployments of hyperscale data center operators of large-scale AI clusters. By collaborating, the UEC and the OCP Community system specifications can have a greater influence on the development of sustainable, large-scale AI clusters addressing key memory size and connectivity bandwidth challenges posed by large language models,” said George Tchaparian, CEO at the Open Compute Project Foundation.
Key aspects of the collaboration will involve aligning work within OCP and UEC, focusing efforts on shared objectives and ensuring OCP's integration of UEC's Ethernet enhancements is smooth and effective. Overall, the alliance will leverage the expertise of both organizations to advance Ethernet performance for AI workloads. Areas that have been identified for initial exploration on potential collaborations include: OCP Switch Abstraction Interface (SAI), OCP Caliptra Workstream, OCP Networking Project, OCP NIC Workstream, OCP Time Appliance Project, and OCP Future Technologies Initiative.
“AI and HPC workloads present new network challenges such as the need for greater scale, more bandwidth, multi-pathing and faster reaction to congestion. UEC was formed to develop specifications for Ethernet that will bring improvements to better meet the requirements of these workloads as they are experiencing tremendous growth which will change the nature of network traffic, potentially as significant as the switch from voice to video. Collaborating with the OCP Community will fast-track the integration of the UEC-inspired Ethernet enhancements into complete systems that can serve the market,” said J Metz, Steering Committee Chair at the Ultra Ethernet Consortium.
“We are seeing rapidly increasing interest in the rollout of Gen AI workloads and AI-assisted applications throughout the industry. This will create challenges for cloud and data center networking in terms of being able to provide the interconnect bandwidth needed for training and inferencing requirements within these Gen AI clusters. The partnership between the OCP and UEC creates a collaborative community involving both organizations that can develop required Ethernet enhancements and work to embed them in systems that improve AI cluster performance. Taking a fresh look at Ethernet in the context of Gen AI workloads and deployment of large-scale AI clusters has the potential to push the industry forward,” said Rohit Mehra, Group Vice President, Network and Telecommunications at IDC.
About the Open Compute Project Foundation
The Open Compute Project (OCP) is a collaborative Community of hyperscale data center operators, telecom, colocation providers and enterprise IT users, working with the product and solution vendor ecosystem to develop open innovations deployable from the cloud to the edge. The OCP Foundation is responsible for fostering and serving the OCP Community to meet the market and shape the future, taking hyperscale-led innovations to everyone. Meeting the market is accomplished through addressing challenging market obstacles with open specifications, designs and emerging market programs that showcase OCP-recognized IT equipment and data center facility best practices. Shaping the future includes investing in strategic initiatives and programs that prepare the IT ecosystem for major technology changes, such as AI & ML, optics, advanced cooling techniques, composable memory and silicon. OCP Community-developed open innovations strive to benefit all, optimized through the lens of impact, efficiency, scale and sustainability. Learn more at www.opencompute.org.
About the Ultra Ethernet Consortium
The Ultra Ethernet Consortium (UEC), part of the Linux Foundation's Joint Development Foundation, is redefining Ethernet for the demands of Artificial Intelligence (AI) and High-Performance Computing (HPC). UEC brings together companies for industry-wide cooperation to develop Ethernet specifications and source code that empower AI and HPC environments with next-level performance, scalability, and interoperability. UEC is paving the way for seamless data exchange and computation in the digital realm. Learn more at: ultraethernet.org
Dirk Van Slyke
Open Compute Project Foundation
Ultra Ethernet Consortium
The Linux Foundation