Advantech HDD PMQ
HDD Failure Prediction Model
1. Get the HDD raw data
2. Get useful HDD information
3. Flow of training the mathematical model
A logistic regression algorithm classifies the samples: false samples are shown in red and true samples in blue.
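The document does not publish the training code, the features, or the data set behind the model. As an illustration only, the following Python sketch trains a plain logistic regression with batch gradient descent and outputs a failure probability. The two features (SMART 5 and SMART 197 scaled by the "max" values from the data format example), the toy samples, the learning rate, and the epoch count are all assumptions. This is not Advantech's actual training setup.

```python
import math

# Toy, made-up feature vectors (NOT Advantech data): each row is
# [SMART 5 / 20, SMART 197 / 10], i.e. raw counts scaled by the "max"
# values used in the HDD PMQ data format example. Label 1 = failed disk.
SAMPLES = [[0.0, 0.0], [0.05, 0.0], [0.1, 0.1],
           [0.6, 0.4], [0.8, 0.5], [0.9, 0.9]]
LABELS = [0, 0, 0, 1, 1, 1]


def sigmoid(z: float) -> float:
    """Logistic function mapping a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Probability that feature vector x belongs to the failing class."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logistic_regression(samples, labels, lr=0.5, epochs=5000):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss."""
    n, n_features = len(samples), len(samples[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(samples, labels):
            err = predict_proba(w, b, x) - y  # d(log-loss)/d(logit)
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * gw / n for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


if __name__ == "__main__":
    w, b = train_logistic_regression(SAMPLES, LABELS)
    for x, y in zip(SAMPLES, LABELS):
        print(x, y, round(predict_proba(w, b, x), 3))
```

A probability at or above a decision threshold would mark a disk as likely to fail. The example data format later carries such a value as "hddpredict" with its own "threshold".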
4. Benchmark for PMQ accuracy
5. Alert and Suggestion of HDD PMQ
| No. | Event Message | Suggestion (Action) Message | Condition |
| 1 | Over temperature | Lower system temperature (<40°C) | smart5 >= 10 || smart197 >= 2 |
| 2 | Disk aging. | Backup data to new disk. | smart9 >= 26280 |
| 3 | Disk read/writes error frequently. | Backup data to new disk. | smart187 >= 1 |
| 4 | Power failure | Check power source | smart192 >= 190 |
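The alert conditions in this table can be checked against raw SMART values roughly as follows. This is a hypothetical Python sketch: the conditions and messages come from the table, and the event/action IDs (e2 to e5, a1 to a3) come from the "event" and "action" sections of the HDD PMQ data format example. Treating a missing SMART attribute as 0 is an assumption.

```python
# Hypothetical encoding of the alert table. SMART attributes are keyed by ID.
# Tuple layout: (event ID, event message, action ID, suggestion, condition).
ALERT_RULES = [
    ("e2", "Over temperature.", "a1", "Lower system temperature (<40°C)",
     lambda s: s.get(5, 0) >= 10 or s.get(197, 0) >= 2),
    ("e3", "Disk aging.", "a2", "Backup data to new disk.",
     lambda s: s.get(9, 0) >= 26280),
    ("e4", "Disk read/writes error frequently.", "a2", "Backup data to new disk.",
     lambda s: s.get(187, 0) >= 1),
    ("e5", "Power failure.", "a3", "Check power source",
     lambda s: s.get(192, 0) >= 190),
]


def evaluate_alerts(smart):
    """Return (event ID, message, action ID, suggestion) for every rule that fires.

    Assumption: a SMART attribute missing from `smart` is treated as 0.
    """
    return [(eid, msg, aid, suggestion)
            for eid, msg, aid, suggestion, condition in ALERT_RULES
            if condition(smart)]


if __name__ == "__main__":
    # Raw values of the WDC disk in the data format example: no rule fires.
    print(evaluate_alerts({5: 0, 9: 128, 187: 0, 192: 10, 197: 0, 198: 2}))
```

Several rules can fire at once. For example, a disk with SMART 5 at 12 and SMART 9 at 30000 would raise both the over-temperature and the disk-aging events.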
Visualization and Maintenance with WISE-PaaS/RMM
If the Advantech PMQ service is running on an edge device, a PMQ icon (in the red frame in the picture below) appears in the WISE-PaaS/RMM device manager. Clicking this icon pops up the PMQ dialog, which shows "Predictive Good/Failure".
Install / Upgrade
Install HDD_PMQ Agent in Windows System
Execute "Agent_HDD_PMQ.exe" to install the HDD_PMQ Agent, and follow the steps to finish the installation.
Install HDD_PMQ Agent in RMM 3.3 Agent Plus version in Windows System
The RMM 3.3 Agent Plus installer provides three installation types: Typical, Custom, and Complete.
The RMM 3.3 Agent Plus includes three program features:
1. MQTT Broker for the internal MQTT bus.
2. RMM 3.3 Agent to communicate between the WISE-PaaS/RMM Server and the MQTT bus.
3. HDD_PMQ Agent to retrieve HDD SMART data, calculate the PMQ prediction, and report the results to the MQTT bus.
In the remaining steps, the installer launches the MQTT Broker, RMM 3.3 Agent, and HDD_PMQ Agent installers in sequence.
The details of the RMM 3.3 Agent installation are skipped here to focus on the RMM 3.3 Agent Plus installation.
Update HDD_PMQ Agent
Upload the Agent_HDD_PMQ-V1.0.xxx...zip OTA upgrade package to the OTA Server.
Select the HDD_PMQ update package to upgrade the target devices.
Data Format / Event / Action
Advantech defines six data categories for the PMQ service: info, data, predict, event, action, and param. They use a general JSON format that suits any PMQ solution, so customers can design their own PMQ data format by following these JSON rules, which makes it straightforward to integrate with Advantech EIS and WISE-PaaS/RMM.
Syntax for PMQ Data & Predictive Result
info:
type : PMQ (fixed value; required)
name : name of this service (required)
description : description of this service (required)
version : version of this service (required)
confidence level : confidence level of the predictive algorithm (required)
update : receives update commands (optional)
*User-Defined : users can define their own tag : value pairs
data: raw data of the PMQ service
- *User-Defined (as a JSON object)
predict: prediction result
- Failure rate: failure rate of the prediction result, from 0 to 100 % (required)
Please remap your predicted failure rate to the normalized ranges below.
Level: Good (Green): 0 ~ 54%, Warning (Yellow): 55 ~ 66%, Bad (Red): 67 ~ 100%
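The level ranges above can be applied to a normalized failure rate as sketched below. Two choices here are assumptions: non-integer rates between two ranges (for example 54.5 %) are assigned to the lower level, and `remap_probability` is a hypothetical piecewise-linear remap that puts a model's decision threshold exactly at 67 %, the start of "Bad". It is not necessarily how HDD_PMQ derives its "Failure rate" values.

```python
def failure_level(rate: float) -> str:
    """Map a normalized failure rate (0-100 %) to the PMQ level name."""
    if not 0 <= rate <= 100:
        raise ValueError(f"failure rate {rate} is outside 0-100 %")
    if rate < 55:        # Good (Green): 0 ~ 54 %
        return "Good"
    if rate < 67:        # Warning (Yellow): 55 ~ 66 %
        return "Warning"
    return "Bad"         # Bad (Red): 67 ~ 100 %


def remap_probability(p: float, threshold: float) -> float:
    """Hypothetical remap of a model probability p in [0, 1] to 0-100 %.

    [0, threshold) maps linearly onto [0, 67) and [threshold, 1] onto
    [67, 100], so the decision threshold lands on the start of "Bad".
    """
    if not 0 <= p <= 1 or not 0 < threshold < 1:
        raise ValueError("p must be in [0, 1] and threshold in (0, 1)")
    if p < threshold:
        return p / threshold * 67.0
    return 67.0 + (p - threshold) / (1.0 - threshold) * 33.0
```

For instance, with the threshold 0.385 used by "hddpredict" in the data format example, a probability of 0.15 would remap to about 26 % (Good) under this assumed mapping.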
event: event of the PMQ service
- {"n":"e1","sv":"Hard disk long-term operation in more than 40°C or vibration environment.","actionlist":"a1", "asm":"r"}
action: action of the PMQ service
- {"n":"a1", "sv":"Please reduce the ambient temperature to 40 °C or less or operation at stable environment.", "asm":"r"}
- {"n":"ActionLog", "sv":"reboot + backup"}
param: parameters of the PMQ service
- {"n":"predict period", "v":60, "asm":"r", "u":"sec", "min":10, "max":86400}
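A service payload following this syntax could be assembled as sketched below. The function names `make_element` and `build_pmq_service` are hypothetical, and choosing the value tag (v, bv, or sv) from the Python type is an assumption. The nesting (service name, then the six categories, each with a "bn" and an "e" or "list" array) mirrors the HDD PMQ example.

```python
import json


def make_element(name, value, asm="r", **extra):
    """Build one entry of an "e" array using the predefined tags.

    The value tag is picked from the Python type: bool -> "bv",
    int/float -> "v", anything else -> "sv" (str() applied).
    """
    element = {"n": name}
    if isinstance(value, bool):          # check bool first: bool is a subclass of int
        element["bv"] = value
    elif isinstance(value, (int, float)):
        element["v"] = value
    else:
        element["sv"] = str(value)
    element.update(extra)                # e.g. u="%", min=0, max=100
    element["asm"] = asm
    return element


def build_pmq_service(name, description, version, confidence, failure_rates):
    """Assemble the six categories for one PMQ service.

    failure_rates maps a device name (used as "bn") to its failure rate in %.
    """
    return {
        name: {
            "info": {"bn": "info", "e": [
                make_element("type", "PMQ"),
                make_element("name", name),
                make_element("description", description),
                make_element("version", version),
                make_element("confidence level", confidence, u="%"),
            ]},
            "data": {"bn": "data", "list": []},
            "predict": {"bn": "predict", "list": [
                {"bn": device, "e": [make_element("Failure rate", rate, min=0, max=100)]}
                for device, rate in failure_rates.items()
            ]},
            "event": {"bn": "event", "e": []},
            "action": {"bn": "action", "e": []},
            "param": {"bn": "param", "e": [
                make_element("predict period", 60, u="sec", min=10, max=86400),
            ]},
        },
        "bn": name,
    }


if __name__ == "__main__":
    payload = build_pmq_service("HDD_PMQ", "This service is HDD PMQ Service",
                                "1.0.2", 83.12, {"WDC WD3200BUCT-63TWBY0": 20})
    print(json.dumps(payload, indent=2))
```

The "data", "event", and "action" arrays are left empty here. A real service would fill them with its own user-defined raw data and its event/action entries.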
Predefined tags:
n: Name of resource
bn: Base Name
v: value
bv: bool value
sv: string value
u: Unit of resource
list: Attribute with a value of type array
threshold: value of the resource at which its alert condition takes effect
min: Minimum value of the resource
max: Maximum value of the resource
msg: description of the resource
asm: access mode, read ("r") or read / write ("rw")
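Reading individual elements that use these tags could look like the sketch below. The helper names are hypothetical. Treating "threshold" as "at or above triggers" is an assumption, chosen because the alert conditions (for example smart9 >= 26280) use the same numbers as the example's threshold tags.

```python
VALUE_TAGS = ("v", "bv", "sv")  # numeric, bool and string value tags


def element_value(element):
    """Return the value of an "e" entry, whichever value tag it uses."""
    for tag in VALUE_TAGS:
        if tag in element:
            return element[tag]
    raise KeyError(f"element {element.get('n')!r} has no v/bv/sv tag")


def is_writable(element):
    """True when asm is "rw" (the examples use "r" and "rw")."""
    return element.get("asm") == "rw"


def in_range(element):
    """Check a numeric value against the optional min/max tags."""
    if "v" not in element:
        return True
    low = element.get("min", float("-inf"))
    high = element.get("max", float("inf"))
    return low <= element["v"] <= high


def reached_threshold(element):
    """True when a numeric value is at or above its threshold (assumed semantics)."""
    return ("v" in element and "threshold" in element
            and element["v"] >= element["threshold"])


def elements_by_name(elements):
    """Turn an "e" array into a {n: value} dictionary."""
    return {e["n"]: element_value(e) for e in elements}
```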
Example of HDD PMQ Data Format
{ "HDD_PMQ": { "info":{ "e":[{"n":"type", "sv":"PMQ", "asm":"r"}, {"n":"name", "sv":"HDD_PMQ", "asm":"r"}, {"n":"description", "sv":"This service is HDD PMQ Service", "asm":"r"}, {"n":"version", "sv":"1.0.2", "asm":"r"}, {"n":"confidence level", "v":83.12, "asm":"r", "u":"%"}, {"n":"update", "sv":"", "asm":"rw"}, {"n":"eventNotify", "bv":true, "asm":"r"}], "bn":"info" }, "data":{ "list":[{"bn":"WDC WD3200BUCT-63TWBY0","e":[{"n":"Smart 5","v":0,"min":0, "max":20, "threshold":10, "u":"count", "msg":"Reallocated Sector Count", "asm":"r"}, {"n":"Smart 9", "v":128, "min":0, "max":35000, "threshold":26280, "u":"hr", "msg":"Power-On Hours", "asm":"r"}, {"n":"Smart 187", "v":0, "min":0, "max":5, "threshold":1, "u":"count", "msg":"Reported Uncorrectable Errors", "asm":"r"}, {"n":"Smart 192", "v":10, "min":0, "max":400, "threshold":190, "u":"number", "msg":"Power-off Retract Count", "asm":"r"}, {"n":"Smart 197", "v":0, "min":0, "max":10, "threshold":2, "u":"count", "msg":"Current Pending Sector Count", "asm":"r"}, {"n":"Smart 198", "v":2, "min":0, "max":40, "threshold":10, "u":"count", "msg":"Uncorrectable Sector Count", "asm":"r"}]}, {"bn":"ST3500320AS0","e":[{"n":"Smart 5","v":1,"min":0, "max":20, "threshold":10, "u":"count", "msg":"Reallocated Sector Count", "asm":"r"}, {"n":"Smart 9", "v":8832, "min":0, "max":35000, "threshold":26280, "u":"hr", "msg":"Power-On Hours", "asm":"r"}, {"n":"Smart 187", "v":0, "min":0, "max":5, "threshold":1, "u":"count", "msg":"Reported Uncorrectable Errors", "asm":"r"}, {"n":"Smart 192", "v":100, "min":0, "max":400, "threshold":190, "u":"number", "msg":"Power-off Retract Count", "asm":"r"}, {"n":"Smart 197", "v":0, "min":0, "max":10, "threshold":2, "u":"count", "msg":"Current Pending Sector Count", "asm":"r"}, {"n":"Smart 198", "v":5, "min":0, "max":40, "threshold":10, "u":"count", "msg":"Uncorrectable Sector Count", "asm":"r"}]}], "bn":"data" }, "predict":{ "list":[{"bn":"WDC WD3200BUCT-63TWBY0","e":[{"n":"Failure rate","v":20,"min":0, "max":100,"asm":"r"}, {"n":"hddpredict", "v":0.15, "min":0, "max":1, "threshold":0.385, "asm":"r"}]}, {"bn":"ST3500320AS0","e":[{"n":"Failure rate","v":40,"min":0, "max":100,"asm":"r"}, {"n":"hddpredict", "v":0.25, "min":0, "max":1, "threshold":0.385, "asm":"r"}]}], "bn":"predict" }, "event":{ "e":[{"n":"e1","sv":"HDD back to Normal", "actionlist":"", "asm":"r"}, {"n":"e2","sv":"Over temperature.","actionlist":"a1", "asm":"r"}, {"n":"e3","sv":"Disk aging.","actionlist":"a2", "asm":"r"}, {"n":"e4","sv":"Disk read/writes error frequently.","actionlist":"a2", "asm":"r"}, {"n":"e5","sv":"Power failure.","actionlist":"a3", "asm":"r"}], "bn":"event" }, "action": { "e":[{"n":"a1", "bv":false, "msg":"Lower system temperature ( < 40 Celsius ).", "asm":"r"}, {"n":"a2", "bv":false, "msg":"Backup data to new disk", "asm":"r"}, {"n":"a3", "bv":false, "msg":"Check power source.", "asm":"r"}, {"n":"ActionLog", "sv":"", "asm":"r"}], "bn":"action" }, "param": { "e":[{"n":"report interval", "v":60, "min":10, "max":3600, "asm":"rw", "u":"sec"}, {"n":"enable report", "bv":true, "asm":"rw"}], "bn":"param" }, "opTS":{"$date":1494554251000} }, "bn":"HDD_PMQ" }
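A consumer of this format could pull out each disk's failure rate and any SMART attributes at or above their thresholds roughly as follows. The sketch parses a trimmed excerpt of the example (two SMART attributes per disk) to stay short. The function name `summarize` and the "at or above" reading of "threshold" are assumptions.

```python
import json

# Trimmed excerpt of the HDD PMQ example (two SMART attributes per disk).
EXAMPLE = """{ "HDD_PMQ": {
  "data": { "list": [
    {"bn": "WDC WD3200BUCT-63TWBY0", "e": [
      {"n": "Smart 5", "v": 0, "min": 0, "max": 20, "threshold": 10, "u": "count", "asm": "r"},
      {"n": "Smart 9", "v": 128, "min": 0, "max": 35000, "threshold": 26280, "u": "hr", "asm": "r"}]},
    {"bn": "ST3500320AS0", "e": [
      {"n": "Smart 5", "v": 1, "min": 0, "max": 20, "threshold": 10, "u": "count", "asm": "r"},
      {"n": "Smart 9", "v": 8832, "min": 0, "max": 35000, "threshold": 26280, "u": "hr", "asm": "r"}]}],
    "bn": "data" },
  "predict": { "list": [
    {"bn": "WDC WD3200BUCT-63TWBY0", "e": [{"n": "Failure rate", "v": 20, "min": 0, "max": 100, "asm": "r"}]},
    {"bn": "ST3500320AS0", "e": [{"n": "Failure rate", "v": 40, "min": 0, "max": 100, "asm": "r"}]}],
    "bn": "predict" } },
  "bn": "HDD_PMQ" }"""


def summarize(doc, service="HDD_PMQ"):
    """Return ({disk: failure rate}, {disk: [SMART names at/above threshold]})."""
    svc = doc[service]
    rates = {
        disk["bn"]: e["v"]
        for disk in svc["predict"]["list"]
        for e in disk["e"]
        if e["n"] == "Failure rate"
    }
    over_threshold = {
        disk["bn"]: [e["n"] for e in disk["e"]
                     if "threshold" in e and e["v"] >= e["threshold"]]
        for disk in svc["data"]["list"]
    }
    return rates, over_threshold


if __name__ == "__main__":
    rates, over = summarize(json.loads(EXAMPLE))
    print(rates)
    print(over)
```

In the example data no SMART value reaches its threshold, so every disk's list is empty and both disks stay in the Good range (20 % and 40 %).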
Syntax for EventNotify
PMQ severity : 4 (Warning) for predicted-failure events; the "back to Normal" event uses 5 (Informational)
severity: Severity_Emergency = 0, Severity_Alert = 1, Severity_Critical = 2, Severity_Error = 3, Severity_Warning = 4, Severity_Informational = 5, Severity_Debug = 6
subtype: predict
Example : Predict Fail event
{ "susiCommData": { "commCmd": 2059, "requestID": 2001, "agentID": "AAAAA", "handlerName": "general", "sendTS": 1453356274, "eventnotify": { "subtype": "predict", "msg": "Over temperature.", "severity": 4, "handler": "HDD_PMQ", "extMsg": { "n": "WDC WD3200BUCT-63TWBY0", "eventID":"e2" } } } }
Example: Predict back to Good event
{ "susiCommData": { "commCmd": 2059, "requestID": 2001, "agentID": "AAAAA", "handlerName": "general", "sendTS": 1453356274, "eventnotify": { "subtype": "predict", "msg": "HDD back to Normal", "severity": 5, "handler": "HDD_PMQ", "extMsg": { "n": "WDC WD3200BUCT-63TWBY0", "eventID":"e1" } } } }
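Messages shaped like these two examples could be produced by a small builder such as the one below. The constants commCmd 2059, handlerName "general", and handler "HDD_PMQ" are copied from the examples. The function name is hypothetical. Defaulting requestID to the example value 2001 and filling sendTS with the current Unix time in seconds are assumptions, not documented behavior.

```python
import json
import time
from enum import IntEnum


class Severity(IntEnum):
    """Severity values listed in the EventNotify syntax."""
    EMERGENCY = 0
    ALERT = 1
    CRITICAL = 2
    ERROR = 3
    WARNING = 4
    INFORMATIONAL = 5
    DEBUG = 6


def build_event_notify(agent_id, disk, event_id, msg, severity,
                       request_id=2001, send_ts=None):
    """Build a predict-subtype eventnotify message shaped like the examples.

    request_id defaults to the example value and send_ts to the current
    Unix time in seconds; both defaults are assumptions.
    """
    return {"susiCommData": {
        "commCmd": 2059,
        "requestID": request_id,
        "agentID": agent_id,
        "handlerName": "general",
        "sendTS": int(time.time()) if send_ts is None else send_ts,
        "eventnotify": {
            "subtype": "predict",
            "msg": msg,
            "severity": int(severity),
            "handler": "HDD_PMQ",
            "extMsg": {"n": disk, "eventID": event_id},
        },
    }}


if __name__ == "__main__":
    fail = build_event_notify("AAAAA", "WDC WD3200BUCT-63TWBY0", "e2",
                              "Over temperature.", Severity.WARNING)
    print(json.dumps(fail))
```

Passing the same arguments as the Predict Fail example, with send_ts=1453356274, reproduces that example message exactly.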