ambari-dev mailing list archives

From "Jonathan Hurley" <jhur...@hortonworks.com>
Subject Re: Review Request 30566: HDFS, YARN, and HBase Slave Health Alert Definitions
Date Wed, 04 Feb 2015 19:27:02 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30566/#review71013
-----------------------------------------------------------



ambari-server/src/main/resources/common-services/HBASE/0.96.0.2.0/alerts.json
<https://reviews.apache.org/r/30566/#comment116515>

    RegionServer(s)



ambari-server/src/main/resources/common-services/HBASE/0.96.0.2.0/alerts.json
<https://reviews.apache.org/r/30566/#comment116516>

    Warning not needed since it has the same value as Critical.
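
A minimal sketch of why an equal Warning threshold is redundant, assuming thresholds are evaluated most-severe-first with a greater-or-equal comparison. The `classify` function is hypothetical, written for illustration only; it is not Ambari code and not part of the patch under review.

```python
def classify(value, warning, critical):
    """Return an alert state for a numeric value.

    Assumption: the most severe threshold is checked first,
    and a threshold is tripped when value >= threshold.
    """
    if value >= critical:
        return "CRITICAL"
    if value >= warning:
        return "WARNING"
    return "OK"


# With warning == critical, every value that reaches the warning
# threshold has already reached the critical one, so WARNING
# can never be returned.
states = {classify(v, warning=1, critical=1) for v in range(10)}
print(sorted(states))
```

Under that assumption the Warning entry only adds noise to the definition, which is the point of the comment above.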



ambari-server/src/main/resources/common-services/HBASE/0.96.0.2.0/alerts.json
<https://reviews.apache.org/r/30566/#comment116518>

    RegionServer(s)



ambari-server/src/main/resources/common-services/HBASE/0.96.0.2.0/alerts.json
<https://reviews.apache.org/r/30566/#comment116517>

    RegionServer(s)



ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/alerts.json
<https://reviews.apache.org/r/30566/#comment116519>

    datanode_health_summary
    DataNode Health Summary



ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/alerts.json
<https://reviews.apache.org/r/30566/#comment116520>

    There is a space in the https address



ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/alerts.json
<https://reviews.apache.org/r/30566/#comment116521>

    DataNode(s)



ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/alerts.json
<https://reviews.apache.org/r/30566/#comment116523>

    No need for Warning since Critical is the same value



ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/alerts.json
<https://reviews.apache.org/r/30566/#comment116524>

    DataNode(s)



ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/alerts.json
<https://reviews.apache.org/r/30566/#comment116525>

    nodemanager_health_summary
    NodeManager Health Summary



ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanagers_summary.py
<https://reviews.apache.org/r/30566/#comment116527>

    NodeManager



ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanagers_summary.py
<https://reviews.apache.org/r/30566/#comment116528>

    All NodeManagers are healthy


- Jonathan Hurley


On Feb. 3, 2015, 12:58 p.m., Yurii Shylov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30566/
> -----------------------------------------------------------
> 
> (Updated Feb. 3, 2015, 12:58 p.m.)
> 
> 
> Review request for Ambari, Jonathan Robie and Srimanth Gunturi.
> 
> 
> Bugs: AMBARI-9458
>     https://issues.apache.org/jira/browse/AMBARI-9458
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> When a slave component, such as a DataNode, encounters a catastrophic problem, such as
> a heap allocation error, and can no longer perform its work, the NameNode marks that
> DataNode as unhealthy.
> 
> The current alert definitions only check whether the DataNode process is alive, which
> it technically still is. We need to add new alert definitions for:
> 
> - HDFS/DataNode (runs on NameNode, query is to NameNode JMX)
> - YARN/NodeManager (runs on ResourceManager, query is to ResourceManager JMX)
> - HBase/RegionServer (runs on HBase Master, queries HBase Master JMX)
> 
> These definitions will check for slaves that are in a bad state. Depending on the JMX
> structures that need to be queried, they can be either METRIC or SCRIPT style alert definitions.
> 
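
A minimal sketch of the SCRIPT-style check described above: parse a JMX JSON payload from the master and report unhealthy slaves. The function name, the sample payload, the returned state strings, and the bean and attribute names (`ClusterMetrics`, `NumUnhealthyNMs`) are assumptions made for illustration. This is not the `alert_nodemanagers_summary.py` from the patch, and it does not use Ambari's alert script interface.

```python
import json

# Hypothetical JMX response body; bean and attribute names are assumed.
SAMPLE_JMX = json.dumps({
    "beans": [{
        "name": "Hadoop:service=ResourceManager,name=ClusterMetrics",
        "NumActiveNMs": 3,
        "NumUnhealthyNMs": 1,
    }]
})


def summarize_nodemanagers(jmx_text):
    """Return (state, message) from a JMX JSON string.

    The process-alive check cannot see this condition: the slave
    process is still running, but the master reports it unhealthy.
    """
    beans = json.loads(jmx_text).get("beans", [])
    if not beans:
        return ("UNKNOWN", "No JMX beans returned")

    unhealthy = int(beans[0].get("NumUnhealthyNMs", 0))
    if unhealthy == 0:
        return ("OK", "All NodeManagers are healthy")
    return ("CRITICAL", "{0} NodeManager(s) unhealthy".format(unhealthy))


print(summarize_nodemanagers(SAMPLE_JMX))
```

A real definition would fetch the payload from the master's JMX endpoint rather than from a constant; the sketch keeps the parsing and the state decision separate from any network call so that they can be exercised in isolation.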
> 
> Diffs
> -----
> 
>   ambari-server/src/main/resources/common-services/HBASE/0.96.0.2.0/alerts.json fa911e1
>   ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/alerts.json b8a20ac
>   ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/alerts.json dc4fafd
>   ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanagers_summary.py PRE-CREATION
> 
> Diff: https://reviews.apache.org/r/30566/diff/
> 
> 
> Testing
> -------
> 
> In progress
> 
> 
> Thanks,
> 
> Yurii Shylov
> 
>

