Hardware Management/FMFM

From OpenCompute
* [https://docs.google.com/document/d/1LtEmytHLozJ8sBAdshO5KYtQgE-PwB6os4NLVrmXETM/edit?usp=sharing Link to Fleetscale Memory Fault Management (FMFM) Framework Requirements Google Drive]


===FMFM Weekly Call Recordings===


* [https://www.youtube.com/watch?v=IzsxR2O3ioU Jan 30, 2024]

Revision as of 15:20, 13 February 2024

Welcome to the OCP Fleetscale Memory Fault Management (FMFM) wiki.

Fleetscale Memory Fault Management is a workstream within the Hardware Management Project.

Hardware Management Project

Leadership

Scope

The FMFM workstream focuses on standardizing Fleetscale Memory Fault Management.

  • Proposed topics:
  1. Standardize a vendor-agnostic architecture for memory error handling
    1. Modularization of inputs from different hardware vendors
    2. APIs and connections between modules from different vendors
    3. Define the output of each module (failure cause, health information, RAS actions, etc.)
  2. Standardize memory error telemetry
    1. Format content for better fleet-scale RAS management
    2. Troubleshooting, FRU replacement policies, etc.
  3. Coordinate with the broader OCP group to make sure there is a path for this general architecture

Get Involved

The subproject meets biweekly on Tuesdays from 7:10 to 8:00 am PST.

Mailing List

Participate in the discussion:

Documents

FMFM Weekly Call Recordings