From: "Aaron T. Myers (Commented) (JIRA)"
To: common-issues@hadoop.apache.org
Date: Sun, 18 Mar 2012 02:52:39 +0000 (UTC)
Subject: [jira] [Commented] (HADOOP-7788) HA: Simple HealthMonitor class to watch an HAService

    [ https://issues.apache.org/jira/browse/HADOOP-7788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232132#comment-13232132 ]

Aaron T. Myers commented on HADOOP-7788:
----------------------------------------

Largely looks good, Todd. A few small comments:

# I don't understand why it's necessary/desirable to set "{{shouldRun = false}}" in the case of an InterruptedException in {{run()}}. When would this method be interrupted except in the case of a call to {{shutdown()}}, which itself sets {{shouldRun}} to false?
# "{{LOG.warn("Transport-level exception trying to monitor health of " + addrToMonitor + ": " + t.getLocalizedMessage());}}" - let's log the full exception stack trace here.
# Seems odd to me that the main method of this class creates a HealthMonitor and calls {{run()}}, instead of calling {{start()}}/{{join()}}.

+1 once these are addressed.

> HA: Simple HealthMonitor class to watch an HAService
> ----------------------------------------------------
>
>                 Key: HADOOP-7788
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7788
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ha
>    Affects Versions: 0.24.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hadoop-7788.txt, hdfs-2524.txt
>
>
> This is a utility class which will be part of the FailoverController. The class starts a daemon thread which periodically monitors an HAService, calling its monitorHealth function.
> It then generates callbacks into another class when the health status changes (e.g., the RPC fails or the service returns a HealthCheckFailedException).
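
For illustration only, the behaviour described in the issue and the three review points above can be sketched in plain Java. This is not the attached patch: the class name `HealthMonitorSketch`, the `HealthCheck` and `Callback` interfaces, the polling interval parameter, and the use of `java.util.logging` are all assumptions made for the example. Only the names `shouldRun`, `shutdown()`, `start()` and `join()` come from the comment.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class HealthMonitorSketch {
    /** Hypothetical stand-in for the monitored service's health call. */
    interface HealthCheck {
        void monitorHealth() throws Exception;
    }

    /** Hypothetical callback, invoked only when the observed state changes. */
    interface Callback {
        void stateChanged(boolean healthy);
    }

    private static final Logger LOG = Logger.getLogger("HealthMonitorSketch");

    private final HealthCheck target;
    private final Callback callback;
    private final long intervalMs;
    private final Thread daemon;
    private volatile boolean shouldRun = true;
    private Boolean lastHealthy; // read and written only by the monitor thread

    HealthMonitorSketch(HealthCheck target, Callback callback, long intervalMs) {
        this.target = target;
        this.callback = callback;
        this.intervalMs = intervalMs;
        this.daemon = new Thread(this::loop, "health-monitor");
        this.daemon.setDaemon(true);
    }

    void start() { daemon.start(); }

    void join() throws InterruptedException { daemon.join(); }

    /** The only place that clears shouldRun; the interrupt just wakes the sleep. */
    void shutdown() {
        shouldRun = false;
        daemon.interrupt();
    }

    private void loop() {
        while (shouldRun) {
            boolean healthy;
            try {
                target.monitorHealth();
                healthy = true;
            } catch (Exception e) {
                // Passing the throwable logs the full stack trace, not just the message.
                LOG.log(Level.WARNING, "Exception trying to monitor health", e);
                healthy = false;
            }
            if (lastHealthy == null || lastHealthy != healthy) {
                lastHealthy = healthy;
                callback.stateChanged(healthy);
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException ie) {
                // Nothing to set here: in this sketch an interrupt only comes from
                // shutdown(), which has already cleared shouldRun.
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        HealthMonitorSketch monitor = new HealthMonitorSketch(
            () -> { /* always healthy in this demo */ },
            healthy -> System.out.println("healthy=" + healthy),
            10);
        monitor.start();      // run on the daemon thread rather than calling the loop directly
        Thread.sleep(50);
        monitor.shutdown();
        monitor.join();       // wait for the monitor thread to exit
    }
}
```

In this sketch the interrupt handler is deliberately empty, which is one way to read the first review point: the loop condition alone ends the thread once `shutdown()` has run. A real monitor may have other reasons to treat an interrupt as a stop request.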