Many of you learned about the OCP Device Manager during OCP TECH WEEK. This post summarizes the technical aspects of Device Management from a high-level perspective. Device Manager collects device data and notifications from each device and makes that data available to consumers on a predetermined output bus. Consumers can then use the data for purposes such as:
- Inventory management
- Status monitoring
- Device updates
The diagram below gives a very high-level view of the overall architecture and the interactions between its components.
① : The Management Controller notifies Device Manager to add the IP address of a device to be monitored to its device list. (Device Manager can also use step ⑤ to refresh all of its data.) Device Manager can also subscribe to events from the device by sending an HTTP POST to the URL of the "Subscriptions" resource collection in the Event Service. From this point on, Device Manager can also periodically query status parameters from PSME (Pooled System Management Engine) using the HTTP GET method. The default polling interval is five seconds.
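Step ① mentions subscribing through the Event Service. As a minimal sketch, assuming the conventional Redfish collection path `/redfish/v1/EventService/Subscriptions` and a PSME that accepts the `Destination`, `Protocol`, `Context`, and `EventTypes` properties, the subscription request could be built with Python's standard library as shown below. The function name and the example addresses are hypothetical. A production client would discover the Subscriptions URL from the EventService resource and add authentication.

```python
import json
import urllib.request


def build_subscription_request(service_root: str, destination: str,
                               context: str = "device-manager") -> urllib.request.Request:
    """Build (but do not send) a Redfish event subscription request.

    Hypothetical helper. The hard-coded collection path is the conventional
    Redfish location; a real client should read it from the EventService.
    """
    body = {
        "Destination": destination,  # URL where PSME should POST alerts
        "Protocol": "Redfish",
        "Context": context,          # opaque string echoed back in events
        "EventTypes": ["Alert"],     # assumption: subscribe to alerts only
    }
    url = service_root.rstrip("/") + "/redfish/v1/EventService/Subscriptions"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_subscription_request("http://192.0.2.10/", "http://192.0.2.1:8080/events")
```

Sending it is then a matter of calling `urllib.request.urlopen(req)`; a Redfish service typically answers with `201 Created` and a `Location` header pointing at the new subscription. Newer Redfish schema versions also offer other event filters, so which properties a given PSME honors is an assumption to verify.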
② : Device Manager acts as an alert receiver. The Redfish service interface communicates with Device Manager through RESTful APIs. If PSME detects a change in one of the monitored hardware states, it sends an alert for the subscribed event to Device Manager. A given issue is reported only once, until the state of the hardware changes again.
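The rule that an issue is reported only once until the hardware state changes is an edge-triggered filter. The text places this behavior on the PSME side; the hypothetical helper below shows the same logic, which a receiver could also apply as a defensive guard. The class name and state values such as `"Critical"` and `"OK"` are illustrative assumptions.

```python
from typing import Dict, Hashable, Tuple


class AlertDeduplicator:
    """Report an alert only when a resource's state differs from the last one seen.

    Hypothetical helper, not part of any published OCP API.
    Keys are (device, resource) pairs; values are the last reported state.
    """

    def __init__(self) -> None:
        self._last_state: Dict[Tuple[str, str], Hashable] = {}

    def should_report(self, device: str, resource: str, state: Hashable) -> bool:
        key = (device, resource)
        if key in self._last_state and self._last_state[key] == state:
            return False  # same issue already reported: suppress it
        self._last_state[key] = state
        return True  # first sighting or a genuine state change


dedup = AlertDeduplicator()
```

Because the key includes both the device and the resource, the same fault on two different switches is still reported once per switch.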
③ : Device Manager periodically retrieves data from the devices using the HTTP GET method.
④ : The switch returns an HTTP response to Device Manager.
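Steps ③ and ④ describe a periodic GET/response cycle. The following is a minimal sketch, assuming unauthenticated HTTP endpoints that return JSON and using the five-second default from step ①. `poll_devices` and `http_get_json` are hypothetical names, and the `fetch` parameter exists only so the loop can be exercised without real devices.

```python
import json
import time
import urllib.request
from typing import Callable, Dict, List

DEFAULT_POLL_INTERVAL_S = 5.0  # default polling interval stated in the text


def http_get_json(url: str, timeout: float = 3.0) -> dict:
    """Issue an HTTP GET and decode the JSON body (no auth or TLS options)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def poll_devices(urls: List[str],
                 fetch: Callable[[str], dict] = http_get_json,
                 interval_s: float = DEFAULT_POLL_INTERVAL_S,
                 rounds: int = 1) -> List[Dict[str, dict]]:
    """Poll every URL once per round, sleeping interval_s between rounds.

    Returns one {url: response} snapshot per round. A failed request is
    recorded as {"error": "..."} so one unreachable device does not stop
    the loop for the others.
    """
    snapshots: List[Dict[str, dict]] = []
    for i in range(rounds):
        snapshot: Dict[str, dict] = {}
        for url in urls:
            try:
                snapshot[url] = fetch(url)
            except Exception as exc:  # network errors, bad JSON, etc.
                snapshot[url] = {"error": str(exc)}
        snapshots.append(snapshot)
        if i < rounds - 1:
            time.sleep(interval_s)  # wait before the next polling round
    return snapshots
```

In a long-running service the loop would run indefinitely (or on a scheduler) rather than for a fixed number of rounds; the `rounds` limit just keeps the sketch testable.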
⑤ : The Management Controller provides Device Manager with the list of active devices that need to be monitored. When a device becomes available, Device Manager uses its IP address to connect to PSME (or OpenBMC) over HTTP.
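Keeping the monitored set in line with the Management Controller's active list amounts to a set difference. The sketch below is an assumption about how that reconciliation could look: `reconcile_devices`, `redfish_root`, and the default scheme and port are hypothetical, and only `/redfish/v1` (the standard Redfish service root path) comes from the Redfish specification.

```python
from typing import Iterable, Set, Tuple


def reconcile_devices(current: Set[str], active: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Compare the monitored set with the Management Controller's active list.

    Returns (to_add, to_remove): IPs that should start being monitored,
    and IPs that are no longer in the active list.
    """
    active_set = set(active)
    return active_set - current, current - active_set


def redfish_root(ip: str, scheme: str = "http") -> str:
    """Service root URL for a PSME/OpenBMC endpoint on the default port.

    Assumes an IPv4 address; an IPv6 literal would need square brackets.
    """
    return f"{scheme}://{ip}/redfish/v1"


to_add, to_remove = reconcile_devices({"192.0.2.1", "192.0.2.2"}, ["192.0.2.2", "192.0.2.3"])
```

Devices in `to_add` would then be connected via `redfish_root(ip)`, and devices in `to_remove` would have their polling and subscriptions torn down.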
⑥ : Device Manager publishes device alarms and status to the output bus.
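The post does not name the output bus technology or its message format. As a sketch under that uncertainty, the hypothetical `DeviceEvent` schema and in-memory `OutputBus` below show the publish side. In a deployment the bus would presumably be a message broker, and the field names here are illustrative only.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Callable, List


@dataclass
class DeviceEvent:
    """Hypothetical message schema for the output bus."""
    device_ip: str
    kind: str  # "alarm" or "status"
    payload: dict
    timestamp: float = field(default_factory=time.time)

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


class OutputBus:
    """In-memory stand-in for the real output bus."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, handler: Callable[[str], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event: DeviceEvent) -> None:
        message = event.to_json()  # serialize once, deliver to every consumer
        for handler in self._subscribers:
            handler(message)


bus = OutputBus()
received: List[str] = []
bus.subscribe(received.append)
bus.publish(DeviceEvent("192.0.2.10", "alarm", {"Health": "Critical"}, timestamp=0.0))
```

Serializing to JSON before delivery keeps consumers (inventory, monitoring, update tooling) decoupled from Device Manager's internal types.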
⑦ : If present, a BMC can also be used to retrieve device status and manage the target device. However, a BMC may not provide the full functionality offered by PSME, such as periodic data collection or alarms.
⑧ : PSME inspects hardware status at regular intervals through the Network OS’s software stack to detect hardware failures and status changes.
The OCP Foundation and its community are excited to welcome this software contribution into the OCP software family. The contribution is expected to become a new workstream under the Hardware Management project, and the source code will be contributed to the OCP GitHub. Please stay tuned for more information.