Return-Path:
Delivered-To: apmail-hadoop-hbase-dev-archive@minotaur.apache.org
Received: (qmail 75067 invoked from network); 12 Jan 2010 15:42:15 -0000
Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 12 Jan 2010 15:42:15 -0000
Received: (qmail 55596 invoked by uid 500); 12 Jan 2010 15:42:15 -0000
Delivered-To: apmail-hadoop-hbase-dev-archive@hadoop.apache.org
Received: (qmail 55541 invoked by uid 500); 12 Jan 2010 15:42:15 -0000
Mailing-List: contact hbase-dev-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: hbase-dev@hadoop.apache.org
Delivered-To: mailing list hbase-dev@hadoop.apache.org
Received: (qmail 55529 invoked by uid 99); 12 Jan 2010 15:42:15 -0000
Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 12 Jan 2010 15:42:15 +0000
X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED
X-Spam-Check-By: apache.org
Received: from [140.211.11.140] (HELO brutus.apache.org) (140.211.11.140) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 12 Jan 2010 15:42:14 +0000
Received: from brutus.apache.org (localhost [127.0.0.1]) by brutus.apache.org (Postfix) with ESMTP id 73FFA234C052 for ; Tue, 12 Jan 2010 07:41:54 -0800 (PST)
Message-ID: <121746110.183241263310914473.JavaMail.jira@brutus.apache.org>
Date: Tue, 12 Jan 2010 15:41:54 +0000 (UTC)
From: "Ferdy (JIRA)"
To: hbase-dev@hadoop.apache.org
Subject: [jira] Updated: (HBASE-2117) Simple check on the master overview page if the number of currently running regionservers is unchanged.
In-Reply-To: <675025919.182991263310434465.JavaMail.jira@brutus.apache.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

     [ https://issues.apache.org/jira/browse/HBASE-2117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ferdy updated HBASE-2117:
-------------------------

    Status: Patch Available  (was: Open)

Set status to 'Patch available'.

> Simple check on the master overview page if the number of currently running regionservers is unchanged.
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-2117
>                 URL: https://issues.apache.org/jira/browse/HBASE-2117
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: master, regionserver
>    Affects Versions: 0.20.2
>            Reporter: Ferdy
>         Attachments: HBASE-2117.patch
>
>
> Occasionally, some of our regionservers just stop working. The regionserver logs show some sort of termination, and the affected regionserver is simply removed from the master page. Besides the actual problem of the termination, what I was missing was some sort of warning (from either running client code or the master page) that some regionservers are having trouble.
> It seems like the Master is ok with the fact that a regionserver suddenly decides to stop. The result is that clients depending on the data in HBase will be presented with an incomplete data set, at least until the failing regions are re-assigned. In order to have this monitored, I decided to create a patch that exposes an extra piece of information on the master page. An 'OK:' is presented if the current number of regionservers is unchanged since the start of the processes. An 'ERROR:' is shown whenever the current number is not the same. The master page reads the 'regionservers' file once and remembers the number of slaves so that it can be used in the check. (So later changes to this file are not supported.)
> Perhaps this is not the right way of doing things. Please let me know if there are any existing solutions for these issues.
> I will attach a patch right away.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
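
For illustration only, the check described in the issue could be sketched roughly as follows. This is not the code from HBASE-2117.patch; the class name RegionServerCountCheck, its methods, and the exact wording of the status text are hypothetical. The sketch also assumes the 'regionservers' file lists one host per line and that blank lines and lines starting with '#' should be ignored. The expected count is computed once and never refreshed, which matches the limitation noted in the description.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of HBase and not taken from the attached patch.
final class RegionServerCountCheck {
    // Number of regionservers expected, fixed when the check is created.
    private final int expected;

    RegionServerCountCheck(int expected) {
        this.expected = expected;
    }

    // Counts host entries; skipping blank lines and '#' comments is an assumption.
    static int countHosts(List<String> lines) {
        int hosts = 0;
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty() && !trimmed.startsWith("#")) {
                hosts++;
            }
        }
        return hosts;
    }

    // Reads the file once; later edits to the file are not picked up.
    static RegionServerCountCheck fromFile(Path regionserversFile) throws IOException {
        return new RegionServerCountCheck(countHosts(Files.readAllLines(regionserversFile)));
    }

    // Returns an 'OK:' line when the live count matches, otherwise an 'ERROR:' line.
    String status(int live) {
        String prefix = (live == expected) ? "OK: " : "ERROR: ";
        return prefix + live + " of " + expected + " regionservers running";
    }
}
```

A caller would create the check once at master startup and then pass in the current number of live regionservers each time the overview page is rendered.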