Hardware Management/FMFM

==Welcome to the OCP Fleetscale Memory Fault Management (FMFM) WIKI==
Fleetscale Memory Fault Management is a Workstream within the Hardware Management Project.


[https://www.opencompute.org/wiki/Hardware_Management Hardware Management Project]


==Leadership==


* [mailto:shen.zhou@intel.com Shen Zhou]
* [mailto:acwalton@google.com Drew Walton]
* [mailto:yogesh.varmau@intel.com Yogesh Varma]


==Scope==
 
The FMFM workstream focuses on standardizing Fleetscale Memory Fault Management.
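*Proposed topics:
<ol>
<li>Standardize a vendor-agnostic architecture for memory error handling
<ol>
<li>Modularization of inputs from different hardware vendors</li>
<li>APIs and connections between modules from different vendors</li>
<li>Define the output of each module (failure cause, health information, RAS actions, etc.)</li>
</ol>
</li>
<li>Standardize memory error telemetry
<ol>
<li>Format content for better fleet-scale RAS management</li>
<li>Troubleshooting, FRU replacement policies, etc.</li>
</ol>
</li>
<li>Coordinate with the broader OCP group to make sure there is a path for this general architecture</li>
</ol>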
==Get Involved==


===Subproject Meets Biweekly on Tuesday from 7:10-8 am PST ===


* [https://ocp-all.groups.io/g/OCP-HWMgt-FMFM/calendar Link to the FMFM Calendar]
* [https://global.gotomeeting.com/join/454746381 Link to the Meeting]
* You can also dial in by phone: United States: +1 (646) 749-3112, Access Code: 454-746-381


===Mailing List ===


Participate in the discussion:


* FMFM on OCP Groups.io: [https://ocp-all.groups.io/g/OCP-HWMgt-FMFM FMFM Group Link]
* [mailto:OCP-HWMgt-FMFM+subscribe@OCP-All.groups.io Subscribe to mailing list]
* [mailto:OCP-HWMgt-FMFM@OCP-All.groups.io Post to mailing list]


===Documents===


* [https://docs.google.com/document/d/1xmDmlXKMluo4WzuhGYWwChA8rdvjli96Ev2HO7tABM8/edit#heading=h.s2v3a52onncc Link to Fleetscale Memory Fault Management (FMFM) Workstream Proposal]
* [https://docs.google.com/document/d/1LtEmytHLozJ8sBAdshO5KYtQgE-PwB6os4NLVrmXETM/edit?usp=sharing Link to Fleetscale Memory Fault Management (FMFM) Framework Requirements]


===Past Presentation Recordings===
* [https://www.youtube.com/watch?v=ZeqgPE9IC_o&list=PLAG-eekRQBShiPHyTkmsO_VbkHtmsmnba&index=3 Link to FMFM Talk at 2023 OCP Global Summit]
 


===FMFM Biweekly Call Recordings===


* [https://www.youtube.com/watch?v=otOS3iJZBQg May 07, 2024]
* [https://www.youtube.com/watch?v=SQw1UNmVu-A Apr 23, 2024]
* [https://www.youtube.com/watch?v=pSO5I1uWC9A Apr 09, 2024]
* [https://www.youtube.com/watch?v=D71VSdtm7Sg Mar 26, 2024]
* [https://www.youtube.com/watch?v=_HC81xZJpQ0 Mar 12, 2024]
* [https://www.youtube.com/watch?v=vziNL9RSrOQ Feb 27, 2024]
* [https://www.youtube.com/watch?v=tr8AXCgMF-8 Feb 13, 2024]
* [https://www.youtube.com/watch?v=IzsxR2O3ioU Jan 30, 2024]
* [https://www.youtube.com/watch?v=sk3rIiS_o5U Jan 16, 2024]
* [https://www.youtube.com/watch?v=GNeNMnr87Qg Jan 2, 2024]
* [https://www.youtube.com/watch?v=vm4c8FhY6N8 Dec 5, 2023]
* [https://www.youtube.com/watch?v=Z1Pwi5TpwRA Nov 21, 2023]
* [https://www.youtube.com/watch?v=hr1pq6WfrcA Nov 7, 2023]
* [https://www.youtube.com/watch?v=BV_WZxRmGUw Oct 24, 2023]
* [https://www.youtube.com/watch?v=XtLSclRThIM Oct 10, 2023]
* [https://www.youtube.com/watch?v=5aQh4Ke__b8 Sep 26, 2023]
* [https://www.youtube.com/watch?v=1K0MtdNSd_Q Sep 12, 2023]
* [https://www.youtube.com/watch?v=-DfMNSUB6e4 Aug 29, 2023]

Latest revision as of 19:33, 7 May 2024

Scope

The FMFM workstream focuses on standardizing Fleetscale Memory Fault Management.

  • Proposed topics:
  1. Standardize a vendor-agnostic architecture for memory error handling
    1. Modularization of inputs from different hardware vendors
    2. APIs and connections between modules from different vendors
    3. Define the output of each module (failure cause, health information, RAS actions, etc.)
  2. Standardize memory error telemetry
    1. Format content for better fleet-scale RAS management
    2. Troubleshooting, FRU replacement policies, etc.
  3. Coordinate with the broader OCP group to make sure there is a path for this general architecture
