OCP Diagnostic Framework 1.0 Now Available

OCP is proud to announce the initial release of the OCP Diagnostic Framework (OCPDiag). This release lays the foundation for the deliverables planned by the Test and Validation Initiative team, whose goal is to deliver a rich set of diagnostics for design validation, manufacturing, and data center test and repair systems for OCP hardware. This portable framework will support collaboration with the larger OCP Community, increase adoption, and reduce the go-to-market lead time for next-generation data center hardware.

The initiative arose from the absence of a portable, unified test specification that would support running hardware diagnostics in heterogeneous environments such as validation labs, manufacturing, and data centers.

The Test and Validation team is establishing the components necessary for an OCP certification program, which would make an open specification and a suite of diagnostic tests available to adopters of OCP hardware and further increase the value proposition of open system designs.

Overview

OCPDiag is an open-source hardware diagnostics framework. It provides a set of core libraries with multi-language support for creating portable diagnostics that can integrate into a number of different test executives and environments.

OCPDiag is not a test executive. The framework provides test executives with a single integration point for all OCP-compatible hardware diagnostics. The goal is to eliminate the wrapping effort required to port diagnostics to different execution environments and result schemas, and to reduce the need for supporting infrastructure for large-scale test execution in manufacturing.

For diagnostics that run on multi-node systems, OCPDiag offers configuration options for targeting different endpoints and backends.

OCPDiag diagnostics can execute on the device under test (DUT) or from a central control/monitoring server. After an OCPDiag diagnostic has been built and installed, a test executive or a user can initiate the test and gather its results.
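As a rough illustration of that last step, the sketch below shows how a test executive might launch a diagnostic and gather its results as they arrive. It assumes the diagnostic prints one JSON object per line on standard output, in line with the streaming result format described later in this post. The function `run_diagnostic`, the stand-in command `FAKE_DIAGNOSTIC`, and the `step` and `status` fields are hypothetical names chosen for this example and are not part of the OCPDiag API or its result schema.

```python
import json
import subprocess
import sys


def run_diagnostic(command):
    """Launch a diagnostic and yield each result record as soon as it is printed."""
    proc = subprocess.Popen(command, stdout=subprocess.PIPE, text=True)
    try:
        for line in proc.stdout:
            line = line.strip()
            if line:  # ignore blank lines between records
                yield json.loads(line)
    finally:
        proc.stdout.close()
        proc.wait()


# Hypothetical stand-in for an installed diagnostic binary: a small child
# process that prints two made-up records, one JSON object per line.
FAKE_DIAGNOSTIC = [
    sys.executable,
    "-c",
    "import json\n"
    "print(json.dumps({'step': 'memory', 'status': 'PASS'}), flush=True)\n"
    "print(json.dumps({'step': 'storage', 'status': 'FAIL'}), flush=True)\n",
]


def failed_steps(records):
    """Return the names of the steps whose (assumed) status is not PASS."""
    return [r["step"] for r in records if r["status"] != "PASS"]


if __name__ == "__main__":
    for step in failed_steps(run_diagnostic(FAKE_DIAGNOSTIC)):
        print("failed:", step)
```

Because records are read line by line, an executive built this way can react to a failure while a long-running test is still in progress instead of waiting for the process to exit.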

The OCPDiag framework provides four primary components to speed up diagnostic development and ensure portability with other environments:

  • Parameter and Configuration Management: A flexible parameter management and configuration library that provides a mechanism for configuring tests to execute in many different use cases and environments. A test or diagnostic can be configured from the command line or from a configuration file using hierarchical parameters.

  • Common Result Specification: A result specification for diagnostic output that relies on streaming JSONL to provide continuous updates for long-running tests. The specification includes all the primitives necessary to return the diagnosis of hardware health for each unique component in the system, as well as for the overall system itself. The multi-step result model provides elements for all the information needed to diagnose the hardware in a structured format, including parametric measurements, time-series data, logging, etc. The result output is the ‘highest fidelity’ view of the information provided by the diagnostic, and it is the only output integration point required for test frameworks that wish to leverage OCP diagnostics.

  • Hardware Interface: An optional hardware abstraction layer for diagnostics that need to execute in multiple operating environments (such as Linux or Windows). It abstracts away details about how hardware is monitored or statistics are gathered.

  • DUT Communication Interface: An optional communication interface that can be used to abstract away differences in the security or networking environment for different diagnostic use-cases. For instance, the communication interface could be used to leverage SSH-based diagnostic communication for a manufacturing scenario, whereas the same diagnostic could leverage a secure, auditable RPC framework for data center hardware health testing based on the security and auditing requirements of the user.
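To make the first two components more concrete, here is a minimal sketch of a toy diagnostic that layers command-line overrides on top of configuration-file values and then streams its results as JSONL. This is not the OCPDiag library API or the actual result schema: `merge_params`, `parse_flag`, `emit`, `run`, the dotted flag syntax, and every field name in the emitted records are assumptions made for illustration.

```python
import json
import sys
import time


def merge_params(base, overrides):
    """Recursively overlay `overrides` on `base` without mutating either."""
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_params(merged[key], value)
        else:
            merged[key] = value
    return merged


def parse_flag(flag):
    """Turn a dotted flag such as 'memory.passes=2' into a nested dict."""
    path, _, raw = flag.partition("=")
    value = json.loads(raw)  # flag values are written as JSON literals
    for key in reversed(path.split(".")):
        value = {key: value}
    return value


def emit(stream, seq, payload):
    """Write one result record as a single JSON line and flush immediately."""
    record = {"seq": seq, "timestamp": time.time(), **payload}
    stream.write(json.dumps(record) + "\n")
    stream.flush()


def run(params, stream=sys.stdout):
    """Toy diagnostic: one measurement record per configured pass, then a verdict."""
    seq = 0
    for i in range(params["memory"]["passes"]):
        emit(stream, seq, {"step": "memory", "measurement": {"pass": i, "errors": 0}})
        seq += 1
    emit(stream, seq, {"step": "memory", "diagnosis": "PASS"})


if __name__ == "__main__":
    # Values that would normally come from a configuration file (hypothetical).
    file_config = {"memory": {"passes": 1, "pattern": "walking-ones"}}
    # A command-line override replaces one leaf and leaves the rest intact.
    params = merge_params(file_config, parse_flag("memory.passes=2"))
    run(params)
```

Flushing after every record is what lets a consumer see measurements from a long-running test before the final diagnosis is written.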

In the coming weeks, we plan to release a compliance validation tool for testing new implementations of the specification, and some user-focused tools for easily reviewing the diagnostic results of a test.

The OCP diagnostic framework has been integrated into the internal manufacturing and data center test systems used at large hyperscalers. Contact the Test and Validation working group if you would like to know more about this area.

In addition to proprietary hyperscale systems, we have also provided integration with two popular open-source test execution frameworks.

  • OpenTAP is a popular open-source framework focused on test engineering and used for manufacturing and hardware validation. It emphasizes operator-initiated testing and includes user interfaces and user-friendly tools for test sequencing.

  • ConTest is a popular open-source, developer-focused framework used for large-scale firmware, software, and hardware testing, with a focus on continuous integration use cases.

You can find OCPDiag on this GitHub page. Check out the initiative’s Wiki and join the Test and Validation mailing list to stay informed or begin participating in the OCP Test and Validation initiative.