Advantech HDD PMQ

Revision as of 03:11, 8 February 2018 by Scott68.chang (talk | contribs) (Created page with "== HDD Failure Prediction Model == 1. Get the HDD raw data 2. Get useful HDD information RTENOTITLE 3. Flow of training mathemat...")

HDD Failure Prediction Model

1. Get the HDD raw data

2. Get useful HDD information


3. Flow of training the mathematical model


A logistic regression algorithm classifies the samples. False samples are shown in red and true samples in blue.
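As a rough illustration of the classification step, the sketch below trains a logistic regression model with plain gradient descent. It is not Advantech's training pipeline: the feature layout, the learning rate, the epoch count, and the tiny data set are all assumptions made up for this example, and the data is synthetic rather than measured from real drives.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(w, b, x):
    """Probability that sample x belongs to class 1 (failed drive)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train(samples, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n = len(samples)
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(samples, labels):
            err = predict_proba(w, b, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Synthetic toy data (hypothetical): two scaled SMART-like counters per drive,
# label 1 = failed drive, label 0 = healthy drive.
X = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [0.8, 0.9], [0.9, 0.7], [1.0, 1.0]]
y = [0, 0, 0, 1, 1, 1]

w, b = train(X, y)
healthy_score = predict_proba(w, b, [0.0, 0.0])
failing_score = predict_proba(w, b, [1.0, 1.0])
```

The model output is a probability between 0 and 1. A decision threshold on that probability then separates the two classes.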


4. Benchmark for PMQ accuracy 


5. Alert and Suggestion of HDD PMQ 

No.   Event Message                        Suggestion (Action) Message        Condition
1     Over temperature                     Lower system temperature (<40°C)   smart5 >= 10 || smart197 >= 2
2     Disk aging.                          Backup data to new disk.           smart9 >= 26280
3     Disk read/writes error frequently.   Backup data to new disk.           smart187 >= 1
4     Power failure                        Check power source                 smart192 >= 190
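The alert conditions above can be sketched as a small rule table. The function name `evaluate` and the choice of a dict keyed by SMART attribute ID are assumptions for illustration; only the messages and thresholds come from the table.

```python
# Each rule: (event message, suggested action, condition on raw SMART values).
# A missing attribute is treated as 0, which is an assumption of this sketch.
RULES = [
    ("Over temperature", "Lower system temperature (<40°C)",
     lambda s: s.get(5, 0) >= 10 or s.get(197, 0) >= 2),
    ("Disk aging.", "Backup data to new disk.",
     lambda s: s.get(9, 0) >= 26280),
    ("Disk read/writes error frequently.", "Backup data to new disk.",
     lambda s: s.get(187, 0) >= 1),
    ("Power failure", "Check power source",
     lambda s: s.get(192, 0) >= 190),
]

def evaluate(smart):
    """Return the (event, action) pairs whose condition holds for one disk."""
    return [(event, action) for event, action, cond in RULES if cond(smart)]

# Example: a disk with 26280 power-on hours and one uncorrectable error.
alerts = evaluate({5: 0, 9: 26280, 187: 1, 192: 10, 197: 0})
```

Keeping the thresholds in one table makes it easy to compare the code against the documented conditions.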

Visualization and Maintenance with WISE-PaaS/RMM

If the Advantech PMQ service is running on an edge device, a PMQ icon (marked with a red frame in the picture below) appears in the WISE-PaaS/RMM device manager. Click this icon to open the PMQ dialog, which shows either "Predictive Good" or "Predictive Failure".


Predictive Good

Predictive Failure

Install / Upgrade

Install HDD_PMQ Agent in Windows System

Execute "Agent_HDD_PMQ.exe" to install the HDD_PMQ Agent, and follow the steps to finish the installation.






Install HDD_PMQ Agent in RMM 3.3 Agent Plus version in Windows System


The RMM 3.3 Agent Plus installer provides three installation types: Typical, Custom, and Complete.


The RMM 3.3 Agent Plus includes three program features:

1. MQTT Broker for internal MQTT bus.

2. RMM 3.3 Agent to communicate between WISE-PaaS/RMM Server and MQTT bus.

3. HDD_PMQ Agent to retrieve HDD SMART data, calculate the PMQ result, and report it to the MQTT bus.



In the remaining steps, the installer launches the MQTT Broker, RMM 3.3 Agent, and HDD_PMQ Agent installers in sequence.



The details of the RMM 3.3 Agent installation are skipped here to focus on the RMM 3.3 Agent Plus installation.



Update HDD_PMQ Agent

Upload OTA upgrade package to OTA Server.


Select the HDD_PMQ update package to upgrade the target devices.



Data Format / Event / Action

    Advantech defines 6 data categories for the PMQ service, using a general JSON format that suits any PMQ solution. Customers can design their own PMQ data format by following these rules in JSON, which makes it easy to integrate with Advantech EIS and WISE-PaaS/RMM.

Syntax for PMQ Data & Predictive Result

    type             : PMQ ( fixed, must )
    name             : name of this service ( must )
    description      : description of this service ( must )
    version          : version of this service ( must )
    confidence level : confidence level of the predictive algorithm ( must )
    update           : receive update cmd ( option )
    *User-Defined    : users can define their own tag : value pairs

    data: raw data of the PMQ service

        - *User-Defined ( in JSON Object )

    predict: prediction result

        - Failure rate: failure rate of the prediction result, in 0 ~ 100 % ( must )

          Please remap your predicted failure rate to the normalized range below.

          Level: Good ( Green ): 0 ~ 54%, Warning ( Yellow ): 55 ~ 66%, Bad ( Red ): 67 ~ 100%

    event: event of the PMQ service

        - {"n":"e1","sv":"Hard disk operated long-term above 40°C or in a vibrating environment.","actionlist":"a1", "asm":"r"}

    action: action of the PMQ service

        - {"n":"a1", "sv":"Please reduce the ambient temperature to 40 °C or less, or operate in a stable environment.", "asm":"r"}
        - {"n":"ActionLog", "sv":"reboot + backup"}

    param: parameters of the PMQ service

        - {"n":"predict period", "v":60, "asm":"r", "u":"sec", "min":10, "max":86400}
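The normalized failure-rate levels can be sketched as a small lookup function. The function name `failure_level` is hypothetical, and the handling of fractional rates between the documented integer ranges (for example 54.5) is an assumption: each level here starts at its documented lower bound.

```python
def failure_level(rate_percent: float) -> str:
    """Map a normalized failure rate (0 ~ 100 %) to its display level."""
    if not 0 <= rate_percent <= 100:
        raise ValueError("failure rate must be within 0 ~ 100 %")
    if rate_percent < 55:
        return "Good"      # Green: 0 ~ 54 %
    if rate_percent < 67:
        return "Warning"   # Yellow: 55 ~ 66 %
    return "Bad"           # Red: 67 ~ 100 %

levels = [failure_level(r) for r in (20, 55, 67)]
```

A prediction algorithm with a different native scale would first remap its output to 0 ~ 100 % and then apply this lookup.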

Predefined Tags:

    n: name of the resource

    bn: base name

    v: value

    bv: bool value

    sv: string value

    u: unit of the resource

    list: attribute with a value of type array

    threshold: a level, rate, or amount at which something comes into effect

    min: minimum value of the resource

    max: maximum value of the resource

    msg: description of the resource

    asm: read / write


Example of HDD PMQ Data Format

   "HDD_PMQ": {
       "e": [{"n":"type", "sv":"PMQ", "asm":"r"},
             {"n":"name", "sv":"HDD_PMQ", "asm":"r"},
             {"n":"description", "sv":"This service is HDD PMQ Service", "asm":"r"},
             {"n":"version", "sv":"1.0.2", "asm":"r"},
             {"n":"confidence level", "v":83.12, "asm":"r", "u":"%"},
             {"n":"update", "sv":"", "asm":"rw"},
             {"n":"eventNotify", "bv":true, "asm":"r"}],

       "data": {
           "list": [{"bn":"WDC WD3200BUCT-63TWBY0",
                     "e": [{"n":"Smart 5", "v":0, "min":0, "max":20, "threshold":10, "u":"count", "msg":"Reallocated Sector Count", "asm":"r"},
                           {"n":"Smart 9", "v":128, "min":0, "max":35000, "threshold":26280, "u":"hr", "msg":"Power-On Hours", "asm":"r"},
                           {"n":"Smart 187", "v":0, "min":0, "max":5, "threshold":1, "u":"count", "msg":"Reported Uncorrectable Errors", "asm":"r"},
                           {"n":"Smart 192", "v":10, "min":0, "max":400, "threshold":190, "u":"number", "msg":"Power-off Retract Count", "asm":"r"},
                           {"n":"Smart 197", "v":0, "min":0, "max":10, "threshold":2, "u":"count", "msg":"Current Pending Sector Count", "asm":"r"},
                           {"n":"Smart 198", "v":2, "min":0, "max":40, "threshold":10, "u":"count", "msg":"Uncorrectable Sector Count", "asm":"r"}]},
                    {"bn":"ST3500320AS0",
                     "e": [{"n":"Smart 5", "v":1, "min":0, "max":20, "threshold":10, "u":"count", "msg":"Reallocated Sector Count", "asm":"r"},
                           {"n":"Smart 9", "v":8832, "min":0, "max":35000, "threshold":26280, "u":"hr", "msg":"Power-On Hours", "asm":"r"},
                           {"n":"Smart 187", "v":0, "min":0, "max":5, "threshold":1, "u":"count", "msg":"Reported Uncorrectable Errors", "asm":"r"},
                           {"n":"Smart 192", "v":100, "min":0, "max":400, "threshold":190, "u":"number", "msg":"Power-off Retract Count", "asm":"r"},
                           {"n":"Smart 197", "v":0, "min":0, "max":10, "threshold":2, "u":"count", "msg":"Current Pending Sector Count", "asm":"r"},
                           {"n":"Smart 198", "v":5, "min":0, "max":40, "threshold":10, "u":"count", "msg":"Uncorrectable Sector Count", "asm":"r"}]}]},

       "predict": {
           "list": [{"bn":"WDC WD3200BUCT-63TWBY0",
                     "e": [{"n":"Failure rate", "v":20, "min":0, "max":100, "asm":"r"},
                           {"n":"hddpredict", "v":0.15, "min":0, "max":1, "threshold":0.385, "asm":"r"}]},
                    {"bn":"ST3500320AS0",
                     "e": [{"n":"Failure rate", "v":40, "min":0, "max":100, "asm":"r"},
                           {"n":"hddpredict", "v":0.25, "min":0, "max":1, "threshold":0.385, "asm":"r"}]}]},

       "event": {
           "e": [{"n":"e1", "sv":"HDD back to Normal", "actionlist":"", "asm":"r"},
                 {"n":"e2", "sv":"Over temperature.", "actionlist":"a1", "asm":"r"},
                 {"n":"e3", "sv":"Disk aging.", "actionlist":"a2", "asm":"r"},
                 {"n":"e4", "sv":"Disk read/writes error frequently.", "actionlist":"a2", "asm":"r"},
                 {"n":"e5", "sv":"Power failure.", "actionlist":"a3", "asm":"r"}]},

       "action": {
           "e": [{"n":"a1", "bv":false, "msg":"Lower system temperature ( < 40 Celsius ).", "asm":"r"},
                 {"n":"a2", "bv":false, "msg":"Backup data to new disk", "asm":"r"},
                 {"n":"a3", "bv":false, "msg":"Check power source.", "asm":"r"},
                 {"n":"ActionLog", "sv":"", "asm":"r"}]},

       "param": {
           "e": [{"n":"report interval", "v":60, "min":10, "max":3600, "asm":"rw", "u":"sec"},
                 {"n":"enable report", "bv":true, "asm":"rw"}]}
   }
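A consumer of this format might scan the per-disk entries and report every resource whose value has reached its threshold. The sketch below is one possible way to do that. The function name `over_threshold` is hypothetical, the entries are trimmed to the tags the check needs, and the third disk ("EXAMPLE-DISK") is invented so that the check has something to report.

```python
import json

# Per-disk entries in the "bn" / "e" layout shown above.
# The first two disks reuse values from the example; the last one is hypothetical.
disks = json.loads("""
[
  {"bn": "WDC WD3200BUCT-63TWBY0",
   "e": [{"n": "Smart 5", "v": 0, "threshold": 10},
         {"n": "Smart 9", "v": 128, "threshold": 26280}]},
  {"bn": "ST3500320AS0",
   "e": [{"n": "Smart 5", "v": 1, "threshold": 10},
         {"n": "Smart 9", "v": 8832, "threshold": 26280}]},
  {"bn": "EXAMPLE-DISK",
   "e": [{"n": "Smart 187", "v": 1, "threshold": 1},
         {"n": "Smart 192", "v": 10, "threshold": 190}]}
]
""")

def over_threshold(disk_list):
    """Return (disk, resource, value) for every resource with v >= threshold."""
    hits = []
    for disk in disk_list:
        for res in disk.get("e", []):
            # Resources without a threshold tag are skipped.
            if "threshold" in res and res.get("v", 0) >= res["threshold"]:
                hits.append((disk["bn"], res["n"], res["v"]))
    return hits

hits = over_threshold(disks)
```

The comparison uses `>=` to match the alert conditions of the HDD PMQ events.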



Syntax for EventNotify

PMQ severity : 4 Warning

      Severity_Emergency = 0, 
      Severity_Alert = 1, 
      Severity_Critical = 2, 
      Severity_Error = 3, 
      Severity_Warning = 4, 
      Severity_Informational = 5, 
      Severity_Debug = 6, 

subtype: predict
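The severity codes can be modeled as an integer enumeration. In the sketch below the numeric values come from the list above, while the Python names and the helper `severity_for` are hypothetical. The helper mirrors the two event examples on this page, where a predictive alert is sent as Warning and the "HDD back to Normal" message is sent as Informational.

```python
from enum import IntEnum

class Severity(IntEnum):
    EMERGENCY = 0
    ALERT = 1
    CRITICAL = 2
    ERROR = 3
    WARNING = 4
    INFORMATIONAL = 5
    DEBUG = 6

def severity_for(msg: str) -> Severity:
    """Pick the severity for a PMQ event message (assumed rule, see lead-in)."""
    if msg == "HDD back to Normal":
        return Severity.INFORMATIONAL
    return Severity.WARNING

codes = [int(severity_for(m)) for m in ("Over temperature.", "HDD back to Normal")]
```

Because `IntEnum` members compare equal to plain integers, the value can be written directly into the "severity" field of an event message.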

Example : Predict Fail event

    "susiCommData": {
        "commCmd": 2059,
        "requestID": 2001,
        "agentID": "AAAAA",
        "handlerName": "general",
        "sendTS": 1453356274,
        "eventnotify": {
            "subtype": "predict",
            "msg": "Over temperature.",
            "severity": 4,
            "handler": "HDD_PMQ",
            "extMsg": {
                "n": "WDC WD3200BUCT-63TWBY0",

Example: Predict back to Good event

    "susiCommData": {
        "commCmd": 2059,
        "requestID": 2001,
        "agentID": "AAAAA",
        "handlerName": "general",
        "sendTS": 1453356274,
        "eventnotify": {
            "subtype": "predict",
            "msg": "HDD back to Normal",
            "severity": 5,
            "handler": "HDD_PMQ",
            "extMsg": {
                "n": "WDC WD3200BUCT-63TWBY0",