ambari-dev mailing list archives

From "Yurii Shylov" <yurii.shy...@gmail.com>
Subject Review Request 30566: HDFS, YARN, and HBase Slave Health Alert Definitions
Date Tue, 03 Feb 2015 17:58:03 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30566/
-----------------------------------------------------------

Review request for Ambari, Jonathan Robie and Srimanth Gunturi.


Bugs: AMBARI-9458
    https://issues.apache.org/jira/browse/AMBARI-9458


Repository: ambari


Description
-------

When a slave component, such as a DataNode, encounters a catastrophic problem like a heap
allocation error and can no longer perform its work, the NameNode marks this DataNode as
being unhealthy.

The current alert definitions only check whether the DataNode process is alive, which it
technically still is. We need to add new alert definitions for:

- HDFS/DataNode (runs on NameNode, query is to NameNode JMX)
- YARN/NodeManager (runs on ResourceManager, query is to ResourceManager JMX)
- HBase/RegionServer (runs on HBase Master, queries HBase Master JMX)

These definitions will check for slaves that are in a bad state. Depending on the JMX structures
that need to be queried, they can be either METRIC or SCRIPT style alert definitions.
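
As a rough illustration of the SCRIPT style check for the HDFS/DataNode case, the sketch below
evaluates a NameNode JMX response that has already been fetched as JSON text. It is not the code
under review. The function name, the result codes, and the choice of the FSNamesystemState bean
with its NumDeadDataNodes and NumStaleDataNodes attributes are assumptions made for this example.

```python
import json

# Hypothetical result codes for this sketch; a real alert framework defines its own.
OK, WARNING, CRITICAL, UNKNOWN = "OK", "WARNING", "CRITICAL", "UNKNOWN"

# Assumed bean name; adjust to whichever JMX bean carries the slave counts.
DEFAULT_BEAN = "Hadoop:service=NameNode,name=FSNamesystemState"


def evaluate_datanode_health(jmx_text, bean_name=DEFAULT_BEAN):
    """Return (state, message) from the JSON text of a NameNode /jmx response."""
    try:
        payload = json.loads(jmx_text)
    except ValueError:
        return UNKNOWN, "JMX response is not valid JSON"

    beans = payload.get("beans", []) if isinstance(payload, dict) else []
    bean = next((b for b in beans if b.get("name") == bean_name), None)
    if bean is None:
        return UNKNOWN, "Bean %s not found in JMX response" % bean_name

    # Missing attributes are treated as zero in this sketch.
    dead = int(bean.get("NumDeadDataNodes", 0))
    stale = int(bean.get("NumStaleDataNodes", 0))

    if dead > 0:
        return CRITICAL, "%d dead DataNode(s) reported by the NameNode" % dead
    if stale > 0:
        return WARNING, "%d stale DataNode(s) reported by the NameNode" % stale
    return OK, "All DataNodes are reported healthy"
```

The same shape would apply to the ResourceManager and HBase Master checks, with a different bean
and different attribute names. Keeping the evaluation separate from the HTTP fetch makes it
possible to test the thresholds without a running cluster.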


Diffs
-----

  ambari-server/src/main/resources/common-services/HBASE/0.96.0.2.0/alerts.json fa911e1 
  ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/alerts.json b8a20ac 
  ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/alerts.json dc4fafd 
  ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanagers_summary.py PRE-CREATION 

Diff: https://reviews.apache.org/r/30566/diff/


Testing
-------

In progress


Thanks,

Yurii Shylov

