Hardware Management/Hardware Fault Management
Welcome
- Welcome to the OCP Hardware Fault Management Sub-project. This is a sub-project of the OCP Hardware Management Project.
- This project is open to the public, and we welcome everyone who would like to be involved.
- Disclaimer: Please do not submit any confidential information to the Project Community. All presentation materials, proposals, meeting minutes and/or supporting documents are published by OCP and are open to the public in accordance with OCP's Bylaws and IP Policy, which can be found on the OCP Policies page. If you have any questions, please contact OCP.
Project Overview
The goal of the Hardware Fault Management subproject is to address the pain points that large-scale data centers face in managing hardware faults effectively.
The subproject's objective is to define a hardware fault management solution that:
- Minimizes performance degradation
- Reduces No Trouble Found (NTF) outcomes in RMA
- Scales across a heterogeneous fleet
Activities of the sub-project include:
- Develop a running list of pain points related to hardware fault handling in the fleet
- Develop best practices for hardware fault management solutions that address these pain points
- Standardize the hardware error reporting format
- Define hardware error classification
- Develop the OCP Fault Management Infrastructure Proposal
- Standardize system behavior under hardware failures (future topic)
- Provide reference and guidance on system hardware failure management (future topic)
Sub-Project Leadership
- Yogesh Varma (Intel)
- Rama Bhimanadhuni (Microsoft)
Get Involved
The subproject meets every Friday from 2:00 to 3:00 pm PT.
Join the HWFM meeting from your computer, tablet, or smartphone: https://opencompute-org.zoom.us/j/88981755966?pwd=UlN1ZzU3NzV5MU00UjE5Vi8xZUF6Zz09
Mailing List
Participate in the discussion on the mailing list: OCP-HWFaultMgt@OCP-All.groups.io (Mailing List Info)
Review and provide Feedback
- Provide feedback and comments on working documents: Link to Work-in-progress Documents Folder
Presentations & Other Documents
OCP Presentation Template: please contact Michael Schill for a copy.
- OCP2023 Global Summit: HWFM Infrastructure Framework Requirements Presentation
- OCP2023 Global Summit: OCP HW Fault Management Sub-project Update
- OCP2022 Global Summit: OCP HW Fault Management Sub-project Talk
- OCP2021 Global Summit: OCP HW Fault Management Sub-project Status Update
- OCP2020 Tech Week Lightning Talk: (FF 2 Hr) OCP HW Fault Management Sub-project Update - HWFM, FMFM, RAS API
- OCP2020 Global Virtual Summit presentation: Overview of HW Fault Management Sub-project
- OCP2019 Regional Summit: Platform Hardware Error Handling Requirements for Large Scale Data Center
- OCP2019 Global Summit: Minimizing the performance penalty associated with SMM based hardware error handling solution
- DMTF/Redfish Hardware Error Reporting Standardization Proposal: Hardware Error Reporting Standardization for Cloud-Scale and Edge Infrastructure, presented at OCP2021
Past Meeting Recordings
- Nov 15, 2024
- Nov 08, 2024
- Oct 28, 2024
- Oct 25, 2024
- Oct 04, 2024
- Sep 27, 2024
- Sep 20, 2024
- Aug 30, 2024
- Aug 23, 2024
- Aug 16, 2024
- Aug 09, 2024
- Jul 26, 2024
- Jul 19, 2024
- Jul 12, 2024
- Jun 28, 2024
- May 31, 2024
- May 17, 2024
- May 03, 2024
- Apr 26, 2024
- Apr 19, 2024
- Apr 12, 2024
- Apr 05, 2024
- Mar 22, 2024
- Mar 15, 2024
- Mar 08, 2024
- Mar 01, 2024
- Feb 23, 2024
- Feb 16, 2024
- Feb 9, 2024
- Feb 2, 2024
- Jan 26, 2024
- Jan 19, 2024
- Jan 12, 2024
- Jan 5, 2024
- Dec 15, 2023
- Dec 8, 2023
- Dec 1, 2023
- Nov 17, 2023
- Nov 10, 2023
- Nov 3, 2023
- Oct 27, 2023
- Oct 6, 2023
- Sep 29, 2023
- Sep 22, 2023
- Sep 15, 2023
- Sep 8, 2023
- Aug 25, 2023
- August 18, 2023
- August 11, 2023
- August 4, 2023
- July 28, 2023
- July 14, 2023
- July 7, 2023
- June 30, 2023
- May 19, 2023
- May 12, 2023
- April 14, 2023
- March 31, 2023
- March 24, 2023
- March 3, 2023
- February 24, 2023
- February 10, 2023
- February 3, 2023
- Jan 27, 2023
- Jan 20, 2023
- Jan 13, 2023
- January 6, 2023 - No Recording
- December 23, 2022 - No Recording
- December 16th, 2022
- December 9th, 2022
- December 2nd, 2022
- November 18th, 2022
- November 4th, 2022
- October 14th, 2022
- October 7th, 2022
- September 30th, 2022
- September 23rd, 2022
- August 26th, 2022
- August 19th, 2022
- August 12th, 2022
- August 5th, 2022
- July 29th, 2022
- July 15th, 2022
- June 24th, 2022
- June 10th, 2022
- May 6th, 2022
- April 22nd, 2022 - No Recording
- April 8th, 2022
- February 18th, 2022
- February 11th, 2022
- February 4th, 2022
- January 28th, 2022
- January 21st, 2022
- October 8th, 2021
- September 17th, 2021
- August 20th, 2021
- August 13th, 2021
- July 30th, 2021
- June 25th, 2021
- April 30th, 2021
- March 26th, 2021
- February 26th, 2021
- January 29th, 2021
- September 25th, 2020
- August 28th, 2020
- July 31st, 2020
- June 26th, 2020
Calendar
- Participate in HW Fault Management sub-project meetings - OCP Hardware Management Calendar