Hardware Management/FMFM: Difference between revisions

From OpenCompute


===Fleetscale Memory Fault Management Events===
:- [https://opencompute-org.zoom.us/rec/share/PPU3bkjPRlqsZe8T7K07sqna5R14xSgxrRzn7PrXEUKbOCRH1B6f-12onVe6Gd9p.dZ4x2-95LtD2kOb5?pwd=vytXCp2vY80mI4TZC6bUSBbnsTPFeGuk Jan 16, 2024]
:- [https://opencompute-org.zoom.us/rec/share/7NmO1ubfR58v8Ys8jAaf3RsgoDtdN1BQwCwyCnZ1I4xFh10xbkGBdlxuDglucHHH.WFw1eRxQ_0kK-rpp?pwd=ZlFymeW8GOkVO_0xo7zmcJ58HtGQY__I Jan 2, 2024]
:- [https://opencompute-org.zoom.us/rec/share/GApcoWf05x5U4On1itrQA8jQ5hsv6zeh6xdqp-i-anh0xinll3j2RkscBwf_sP8V.HiLsevH6NH3ZK_a8?pwd=I9p-4Vs5XX0H_h7R-RRKVjch6S6HSM07 Dec 5, 2023]
:- [https://opencompute-org.zoom.us/rec/share/oKnLt_CgYo48yGfBwjQAUAuag4rK4muB7Lh6PGFGay6oF9g7eviGPiFMpWASMNwn.gUbO3b8ojdSdoCsQ?pwd=RRJvLmNg9oA3P9A06fTN-7_iGilzozO7 Nov 21, 2023]
:- [https://opencompute-org.zoom.us/rec/share/5_qSlej9J4sY5wCX_teCBO7UpQwvdsUXDL5Rzx09u-qwbtIn51XTMw0jwoY-UNGj.Uv6H-KsLBdyH1fHG?pwd=GACQB_DL1Diq7BenwRg48u1KaVbHAge1 Nov 7, 2023]
:- [https://opencompute-org.zoom.us/rec/share/ZlSTJbvNoxT7ndWzYfmqWF4xrKTXaZn0gy0L5ro07CdmrpdV1iDLslFgrhxkaCql.vqZYsBF01IXGlMOZ?pwd=fkNIkT5-1NQH7DDOnnqZ8Y10aHfejAI6 Oct 24, 2023]
:- [https://opencompute-org.zoom.us/rec/share/__bvxQL0qigsxksSANTHv3iUmulv8885k1pLU80UVEHcwg_efBuQRrraCIKOWlyw.WB5JQ9WBt53Gp5fQ?pwd=MlpIrwUtH3sthl7Epaz4fxw9Nj9U0IKn Oct 10, 2023]
:- [https://opencompute-org.zoom.us/rec/share/5p9Vu5Q_T98Pz5G6q_0TaRkxNbgU4LmlfatJtkN2Vr5Ko98akpaP7BEbqau7Tj-i.HNGH4M5XF50hcY-W?pwd=sX-B1zHuOIP0LTty2jYcJLxMobzwblx- Sep 26, 2023]
:- [https://opencompute-org.zoom.us/rec/share/c2L7pv9YOi_HZixQL52UIRfFwPxv0i-9-7ApXVcwVoWb0E0T7WIitwYGPK5AEAnR.eIDYGJ7iflwhgE_d?pwd=NA3mPG9p-StXSPxRKqidyTMAolvU-HuW Sep 12, 2023]
:- [https://opencompute-org.zoom.us/rec/share/UUPwWe15sj0IZ1AKz_S84msLpFzV7n2-q6-QwcM03TKdLIhlcSS80DdyuL5ACoPq.RH_DRaCyupWhuwf-?pwd=G7BGOudlMU9ivmPoQqWr0deRncxmpLmk Aug 29, 2023]

Revision as of 20:48, 23 January 2024

Welcome to the OCP Fleetscale Memory Fault Management (FMFM) wiki.

Fleetscale Memory Fault Management is a workstream within the Hardware Management Project.

Hardware Management Project


Leadership

Scope

The FMFM workstream aims to standardize memory fault management at fleet scale.

  • Proposed topics:
  1. Standardize a vendor-agnostic architecture for memory error handling
    1. Modularize inputs from different hardware vendors
    2. Define the APIs and connections between modules from different vendors
    3. Define the output of each module (failure cause, health information, RAS actions, etc.)
  2. Standardize memory error telemetry
    1. Format telemetry content for better fleet-scale RAS management
    2. Support troubleshooting, FRU replacement policies, etc.
  3. Coordinate with the broader OCP group to make sure there is a path to adoption for this general architecture
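
The modular architecture in topic 1 can be pictured with a minimal Python sketch. It is illustrative only and does not describe any interface the workstream has defined: every name in it (MemoryErrorEvent, ModuleOutput, FaultModule, ThresholdModule, the field names, the threshold value, and the action strings) is a hypothetical placeholder. It shows vendor-neutral input records, a common module interface, and a module output carrying the failure cause, health information, and RAS actions named above.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class MemoryErrorEvent:
    """Hypothetical vendor-neutral input record (field names are assumptions)."""
    vendor: str
    dimm_slot: str
    corrected: bool
    count: int = 1


@dataclass
class ModuleOutput:
    """Hypothetical module output: failure cause, health, and RAS actions."""
    failure_cause: str
    health: str
    ras_actions: List[str] = field(default_factory=list)


class FaultModule(Protocol):
    """Assumed common interface that modules from any vendor would implement."""
    def analyze(self, events: List[MemoryErrorEvent]) -> ModuleOutput: ...


class ThresholdModule:
    """Toy module: reacts to any uncorrected error or to many corrected errors."""

    def __init__(self, ce_threshold: int = 100) -> None:
        # Illustrative threshold, not a recommended value.
        self.ce_threshold = ce_threshold

    def analyze(self, events: List[MemoryErrorEvent]) -> ModuleOutput:
        if any(not e.corrected for e in events):
            return ModuleOutput("uncorrected_error", "critical", ["replace_fru"])
        total = sum(e.count for e in events if e.corrected)
        if total >= self.ce_threshold:
            return ModuleOutput("corrected_error_storm", "degraded", ["page_offline"])
        return ModuleOutput("none", "ok")
```

Because every module returns the same output type, a fleet-level consumer could handle results from different vendors uniformly, which is the point of standardizing the module outputs.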

Get Involved

The subproject meets biweekly on Tuesdays from 7 to 9 a.m. PST.

- Link to the FMFM Calendar
- Link to the Meeting
- You can also dial in by phone: United States +1 (646) 749-3112, access code 454-746-381


Mailing List

Participate in the discussion:

- FMFM on OCP Groups.io: FMFM Group Link
- Subscribe to mailing list
- Post to mailing list


Review and provide Feedback

Documents

Link to Fleetscale Memory Fault Management (FMFM) Workstream Proposal on Google Drive

Fleetscale Memory Fault Management Events

- Jan 16, 2024
- Jan 2, 2024
- Dec 5, 2023
- Nov 21, 2023
- Nov 7, 2023
- Oct 24, 2023
- Oct 10, 2023
- Sep 26, 2023
- Sep 12, 2023
- Aug 29, 2023