Date: Sat, 7 Feb 2015 02:01:36 +0000 (UTC)
From: "Hudson (JIRA)"
To: dev@ambari.apache.org
Reply-To: dev@ambari.apache.org
Subject: [jira] [Commented] (AMBARI-9458) HDFS, YARN, and HBase Slave Health Alert Definitions

    [ https://issues.apache.org/jira/browse/AMBARI-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310439#comment-14310439 ]

Hudson commented on AMBARI-9458:
--------------------------------

FAILURE: Integrated in Ambari-trunk-Commit #1712 (See [https://builds.apache.org/job/Ambari-trunk-Commit/1712/])
AMBARI-9458 - HDFS, YARN, and HBase Slave Health Alert Definitions (Yurii Shylov via jonathanhurley) (jhurley: http://git-wip-us.apache.org/repos/asf?p=ambari.git&a=commit&h=405b3762c5bde6a929f7b22732fa39b42bd24291)
* ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanagers_summary.py
* ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/alerts.json
* ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/alerts.json
* ambari-server/src/main/resources/common-services/HBASE/0.96.0.2.0/alerts.json


> HDFS, YARN, and HBase Slave Health Alert Definitions
> ----------------------------------------------------
>
>                 Key: AMBARI-9458
>                 URL: https://issues.apache.org/jira/browse/AMBARI-9458
>             Project: Ambari
>          Issue Type: Task
>          Components: ambari-server
>    Affects Versions: 2.0.0
>            Reporter: Yurii Shylov
>            Assignee: Yurii Shylov
>             Fix For: 2.0.0
>
>         Attachments: AMBARI-9458.patch
>
>
> When a slave component, such as a DataNode, encounters a catastrophic problem like a heap allocation error and can no longer perform its work, the NameNode marks this DataNode as unhealthy.
> The current alert definitions only check that the DataNode process is alive, which it technically still is. We need to add new alert definitions for:
> - HDFS/DataNode (runs on NameNode, queries NameNode JMX)
> - YARN/NodeManager (runs on ResourceManager, queries ResourceManager JMX)
> - HBase/RegionServer (runs on HBase Master, queries HBase Master JMX)
> These definitions will check for slaves that are in a bad state. Depending on the JMX structures that need to be queried, these can be either METRIC or SCRIPT style alert definitions.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
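
For illustration only, the check described in the issue (ask the master's JMX how many slaves it considers unhealthy, then map that count to an alert state) might look roughly like the sketch below. This is not the code from AMBARI-9458.patch. The function names, the state labels, the thresholds, and the bean and attribute names in the sample payload are all assumptions chosen for the example. The payload is a hand-written dictionary in the {"beans": [...]} shape that Hadoop's /jmx servlet returns, so no cluster is needed to run it.

```python
# Hypothetical sketch of a slave-health check; none of these names come
# from the Ambari source tree.

OK, WARNING, CRITICAL, UNKNOWN = "OK", "WARNING", "CRITICAL", "UNKNOWN"


def find_bean(jmx_payload, bean_name):
    """Return the first bean whose "name" equals bean_name, or None."""
    for bean in jmx_payload.get("beans", []):
        if bean.get("name") == bean_name:
            return bean
    return None


def slave_health(jmx_payload, bean_name, unhealthy_attr, warn_at=1, crit_at=2):
    """Map the master's count of unhealthy slaves to (state, message).

    warn_at and crit_at are illustrative thresholds, not Ambari defaults.
    """
    bean = find_bean(jmx_payload, bean_name)
    if bean is None or unhealthy_attr not in bean:
        return (UNKNOWN, "attribute %s not found in %s" % (unhealthy_attr, bean_name))

    count = int(bean[unhealthy_attr])
    message = "%d unhealthy slave(s) reported by the master" % count
    if count >= crit_at:
        return (CRITICAL, message)
    if count >= warn_at:
        return (WARNING, message)
    return (OK, message)


if __name__ == "__main__":
    # Assumed bean and attribute names, used here only as sample data.
    sample = {
        "beans": [
            {
                "name": "Hadoop:service=NameNode,name=FSNamesystemState",
                "NumLiveDataNodes": 4,
                "NumDeadDataNodes": 1,
            }
        ]
    }
    print(slave_health(
        sample,
        "Hadoop:service=NameNode,name=FSNamesystemState",
        "NumDeadDataNodes",
    ))
```

Because the count comes from the master rather than from the slave's own process, a DataNode that is alive but no longer doing work would still be counted, which is the gap the issue describes.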