Open Compute Project: 2020 Area of Focus Roadmap


 

The Open Compute Project (OCP) Foundation staff is often asked about our roadmap. This post answers that question and offers better insight into our areas of focus. As most know, OCP is an open source community, and all technical content comes from its membership. Part of our community consists of engineers, developers and builders; another part consists of constituents who consume the hardware and solutions developed within OCP. The Foundation has started work to identify and prioritize technologies and projects that solve the industry’s pain points and to further engage our community of engineers, developers and builders. Beginning last year, the board of directors, technical advisors (e.g. the Incubation Committee) and volunteer leadership identified more than 15 opportunities where an open source community can accelerate solutions for key verticals such as hyperscale, large enterprise, telco and edge. This list has become our Technologies Planning Map and a north star for partnership, collaboration, project creation and investment. Last month, with new technical leaders on board, we reconfirmed our top priorities. They are:

 

 

 

Common Platforms

 

The cloud industry has been on a transformation journey for a decade now. The disaggregation of switch gear and server hardware has enabled the adoption of open source and whitebox hardware solutions. We have seen tremendous breakthroughs in efficiency and scale from the larger infrastructure builders who are innovating in the industry. The transformation has also highlighted a need for “operationally similar” compute platforms where the physical, logical and programming interfaces remain consistent while supporting the latest technologies. For example, memory behavior (NUMA or non-NUMA) and cache coherency should remain consistent across multiple technology generations. Common peripheral interfaces and physical form factors are a similar example.

 

A common platform approach provides scalable, efficient designs for all adopters, and is especially well-suited for large enterprise and telecom operators (those providing private cloud services) that benefit from leveraging the technology and products developed by the world’s biggest service providers. The common platform concept would ideally support modularity, standardization of hardware management and security, and open system firmware.

 

In 2019, the Open Accelerator Infrastructure (OAI) project began defining and enabling common building blocks that deliver scalable AI hardware for learning and inference. The Edge project, also launched in 2019, has defined and enabled common enclosures and sleds for latency-sensitive applications deployed outside of the data center.

 

Looking ahead, a common platform should also define manageability, security and system initialization, and should embrace emerging interfaces that extend the useful life of the base design. Besides OCP, several organizations are working to define and enable new interfaces that would add value to a common platform. Whenever possible, OCP will partner with these organizations to define, promote and enable those interfaces. The Foundation staff will also explore opportunities to co-host events and launch new projects within the Community.

 

 

Modularity

 

The bulk of server demand will continue to be for general-purpose devices, and while GPU and AI-oriented workloads represent a high-growth segment of the data center, they are not deployed in every server. Much has been written claiming that Moore’s Law has come to an end because of limits on IC process feature size and clock speeds. Still, performance improvements from packaging and instruction sets, as well as public cloud availability, will deliver the requisite performance for general-purpose workloads. For certain workloads, however, domain-specific architectures and deep learning architectures are delivering 10x to 1000x performance improvements over general-purpose processors or software solutions.

 

Modularity and standardization of interfaces protect hardware and software investments, allow late configuration decisions, and accelerate adoption and deployment of the latest and best hardware technology. Hardware solutions such as accelerators should be implemented as modules, allowing a common platform to provide host services (electrical interconnect, mechanical, cooling, power, protocol).
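
The split between a module and the host services it relies on can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, fields and sample values below are hypothetical and are not taken from any OCP specification. The idea it shows is that a platform slot advertises the five host services listed above, a module declares what it needs, and a late configuration decision becomes a simple compatibility check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostServices:
    """What a platform slot offers a module (hypothetical fields)."""
    interconnect: str      # electrical interconnect label
    form_factor: str       # mechanical envelope label
    cooling_watts: int     # heat the slot can remove
    power_watts: int       # power the slot can deliver
    protocols: frozenset   # protocols the host can speak


@dataclass(frozen=True)
class ModuleRequirements:
    """What a module needs from its host (hypothetical fields)."""
    interconnect: str
    form_factor: str
    thermal_watts: int
    power_watts: int
    protocol: str


def unmet_requirements(slot, module):
    """Return the names of host services the slot cannot provide."""
    problems = []
    if module.interconnect != slot.interconnect:
        problems.append("electrical interconnect")
    if module.form_factor != slot.form_factor:
        problems.append("mechanical")
    if module.thermal_watts > slot.cooling_watts:
        problems.append("cooling")
    if module.power_watts > slot.power_watts:
        problems.append("power")
    if module.protocol not in slot.protocols:
        problems.append("protocol")
    return problems


# Illustrative values only; none of these come from a real specification.
slot = HostServices("link-A", "mezz-A", 300, 350, frozenset({"proto-1", "proto-2"}))
fits = ModuleRequirements("link-A", "mezz-A", 250, 300, "proto-1")
too_hot = ModuleRequirements("link-A", "mezz-A", 400, 300, "proto-3")

print(unmet_requirements(slot, fits))
print(unmet_requirements(slot, too_hot))
```

An empty list means the module can be dropped into the slot without changes to the platform, which is the investment protection modularity is meant to deliver.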

 

The best example of the value of modularity is the OCP Mezzanine card. Beginning in 2012 with the original specification contributed by Facebook, the OCP Community has embraced modularity, and today the card is the de facto standard for IO options. In 2019, more than a dozen companies co-authored the 3.0 NIC specification, and at the 2019 Regional Summit in Amsterdam suppliers demonstrated 200 Gbps data transfers. Also in 2019, the Open Accelerator Module (OAM) project and the Open Domain Specific Architecture (ODSA) project both embraced modularity as the best way to enable their respective technologies. Several contributions have also been made that develop disaggregated server architectures, such as the Facebook Big Sur GPU Expansion and the Project Olympus FX-16 Flash Expansion.

 

Looking ahead, we expect to embrace modularity at the device, subassembly, platform and rack levels. The proliferation of storage device form factors also indicates the need for better collaboration with modularity in mind. Ideal candidates for modularity include power shelves, telecom IO and inference hardware for edge appliances. Modularity can be applied at the rack level with new rack standards (e.g. Open Rack release 3.0) and with advanced cooling solutions (e.g. cold plate and immersion).

 

 

Open Firmware Solutions and Platform Security

 

The security threat landscape scales with the growth of programmable devices throughout the network, both across and within devices. Proprietary software deployed on attached devices creates opportunities for malicious actors and exposes the supply chain’s chain of custody to compromise, from component manufacturing through deployment and decommissioning.

 

Hardware that provides a root of trust is accompanied by common APIs. Completely open source firmware stacks significantly reduce both opportunities for malicious activity and inadvertent security risks.

 

By March 2021, all OCP platforms will require the availability of open source repositories that enable any user to build and deploy Open System Firmware onto an OCP-recognized product.

 

 

Hardware Management 

 

Hyperscale providers have their own firmware and software implementations for remotely managing their IT equipment. They can evolve the solution as their needs change and as they find useful telemetry data to gather from the equipment. Most private cloud providers (e.g. large enterprise and telecom) rely on the availability of the IPMI interface or on toolsets provided by the equipment OEM. The gap between hyperscale and non-hyperscale capabilities has grown disproportionately. An interoperable systems management framework available from all suppliers is needed. Fortunately, the framework definition is already in place thanks to the efforts of DMTF.org. DMTF’s Redfish® is a standard designed to deliver simple and secure management for converged, hybrid IT and the Software Defined Data Center.

 

Looking ahead, OCP is positioned to prescribe that all OCP designs and specifications comply with baseline profiles of the Redfish standard, with the desired outcome of enabling interoperability and manageability across all OCP-recognized products. Open source repositories for the BMC firmware as well as the RMC firmware will be a future requirement.
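
To make the idea of a baseline profile concrete, here is a minimal Python sketch of a conformance check. It rests on assumptions: the profile dictionary is a simplified, hypothetical stand-in for a real Redfish interoperability profile, and the sample resource is hand-written rather than fetched from a BMC. The sketch only shows the principle that a profile names the properties every supplier must expose, so a tool can report what a given product is missing.

```python
# Hypothetical, simplified baseline profile: Redfish resource type mapped to
# the properties that must be present. Real profiles are richer than this.
BASELINE_PROFILE = {
    "ComputerSystem": ["PowerState", "SerialNumber", "Status"],
    "Chassis": ["ChassisType", "Status"],
}


def resource_type(resource):
    """Extract the schema name from an @odata.type value such as
    '#ComputerSystem.v1_5_0.ComputerSystem'."""
    odata_type = resource.get("@odata.type", "")
    return odata_type.lstrip("#").split(".")[0]


def missing_properties(resource, profile=BASELINE_PROFILE):
    """Return the required properties that the resource does not expose."""
    required = profile.get(resource_type(resource), [])
    return [name for name in required if name not in resource]


# Hand-written sample payload standing in for a response from a BMC.
sample_system = {
    "@odata.type": "#ComputerSystem.v1_5_0.ComputerSystem",
    "@odata.id": "/redfish/v1/Systems/1",
    "PowerState": "On",
    "Status": {"State": "Enabled", "Health": "OK"},
}

print(missing_properties(sample_system))  # ['SerialNumber']
```

In practice the payload would come from the product's Redfish service rather than a literal, and the profile would be a published document. The comparison step is the same either way.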

 

 

Integrated Solutions

 

As stated above, the cloud industry has been on a transformation journey for a decade. Disaggregated switch gear and server hardware have enabled the adoption of open source and whitebox hardware solutions running any software stack. This model has worked for companies that have the resources to build and support a software stack, but most companies look to their supply chain and systems integrators to resource this effort.

 

OCP Integrated Solutions (https://www.opencompute.org/solutions) enable companies to adopt proven solution stacks running on OCP Accepted™ and OCP Inspired™ hardware, thereby providing converged infrastructure solutions for the enterprise. These managed offerings showcase a combination of software products and support different rack form factors. All OCP Integrated Solutions have proven support offerings, including third-party product support, to ensure efficient, reliable and scalable workload deployments.

 

Looking ahead, OCP will focus on enabling and promoting solutions and on growing the number of solution providers, as well as the sources of maintenance and support for those solutions. This re-aggregation work will also be accomplished through collaborative efforts with organizations such as the Open Networking Foundation (ONF), Telecom Infra Project (TIP), Open Networking User Group (ONUG) and other user-driven organizations.

 

 

This is not a complete list: another dozen opportunities have already been identified. You can expect to see more activity and calls to collaborate on network optics, new approaches to system initialization, service models for cloud hardware, actions to reduce embodied carbon and more. I also encourage everyone to attend the best event for cloud hardware developers and operators, the OCP Global Summit.