ambari-user mailing list archives

From Ganesh Viswanathan <gan...@gmail.com>
Subject Ambari Metrics Collector Process alert - CRITICAL threshold rule
Date Fri, 28 Oct 2016 17:37:00 GMT
Hello,

The Ambari "Metrics Collector Process" alert defines its CRITICAL threshold
differently from its OK and WARNING thresholds. What is the reason for
this?

In my tests, CRITICAL behaves like a "point-in-time" alert, and the value of
that field does not appear to be used. When the Metrics Collector process is
killed or restarts, the alert fires within one minute or less, even when I
set the threshold value to 600 s. This means the alert description, "This
alert is triggered if the Metrics Collector cannot be confirmed to be up and
listening on the configured port for number of seconds equal to threshold.",
is NOT VALID for the CRITICAL threshold. Is that true, and what is the reason
for this discrepancy? Has anyone else received false pages because of this,
and what is the fix?

"ok": {
  "text": "TCP OK - {0:.3f}s response on port {1}"
},
"warning": {
  "text": "TCP OK - {0:.3f}s response on port {1}",
  "value": 1.5
},
"critical": {
  "text": "Connection failed: {0} to {1}:{2}",
  "value": 5.0
}

Ref:
https://github.com/apache/ambari/blob/2ad42074f1633c5c6f56cf979bdaa49440457566/ambari-server/src/main/resources/common-services/AMBARI_METRICS/0.1.0/alerts.json#L102
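
One reading that would match the behaviour described above, and the "{0:.3f}s response" wording in the message templates, is that the threshold values are response-time limits in seconds for a single TCP connection attempt rather than a duration of unavailability. The sketch below illustrates that reading only. It is not Ambari's actual code, and the names classify and check_port are hypothetical.

```python
import socket
import time


def classify(connected, elapsed_s, warning_s=1.5, critical_s=5.0):
    """Map one connection attempt to an alert state.

    Assumption: the threshold values are response-time limits in seconds.
    """
    if not connected:
        # A failed connection is CRITICAL on the first check;
        # critical_s is never consulted on this path.
        return "CRITICAL"
    if elapsed_s >= critical_s:
        return "CRITICAL"
    if elapsed_s >= warning_s:
        return "WARNING"
    return "OK"


def check_port(host, port, warning_s=1.5, critical_s=5.0):
    """Attempt one TCP connection and classify the outcome."""
    start = time.time()
    try:
        # Stop waiting once the critical limit is reached.
        socket.create_connection((host, port), timeout=critical_s).close()
        connected = True
    except OSError:
        # Covers refused connections and timeouts alike.
        connected = False
    return classify(connected, time.time() - start, warning_s, critical_s)
```

Under this assumption, raising the critical value to 600 would only lengthen the tolerated response time. A killed collector would still be reported as CRITICAL on the next check, which would be consistent with the alert firing in under a minute.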

Thanks,
Ganesh
