Message-ID: <17794083.1191961010607.JavaMail.jira@brutus>
Date: Tue, 9 Oct 2007 13:16:50 -0700 (PDT)
From: "stack (JIRA)"
To: hadoop-dev@lucene.apache.org
Subject: [jira] Updated: (HADOOP-2017) [hbase] TestRegionServerAbort failure in patch build #903 and nightly #266
In-Reply-To: <25718086.1191959571066.JavaMail.jira@brutus>

     [ https://issues.apache.org/jira/browse/HADOOP-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-2017:
--------------------------

    Attachment: trsa.patch

A patch w/ more
logging and thread dumping to better show what is going on, and a mechanism that notices moved regions sooner.

{code}
HADOOP-2017 TestRegionServerAbort failure in patch build #903 and nightly #266
Notice moved META regions sooner. Also added more logging and thread
dumping once a minute when the test starts to take too long, so we can see
where we are hung (if we are hung).

M src/contrib/hbase/src/test/org/apache/hadoop/hbase/TestHStoreFile.java
    Inherit from HBaseTestCase.
M src/contrib/hbase/src/test/org/apache/hadoop/hbase/HBaseClusterTestCase.java
    (threadDumpingJoin): Added.
M src/contrib/hbase/src/test/org/apache/hadoop/hbase/TestRegionServerAbort.java
    Run verification in its own thread so we can concurrently thread dump
    if the test is going on too long.
M src/contrib/hbase/src/test/org/apache/hadoop/hbase/DFSAbort.java
    Moved join up into parent class.
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/Chore.java
    Remove unused import.
M src/contrib/hbase/src/java/org/apache/hadoop/hbase/HMaster.java
    (MetaRegion.toString): Added.
    Added logging around assignment checking and log split.
    (MetaRegion.compareTo): Add consideration of server address.
    (numberOfMetaRegions, metaRegionsToScan, onlineMetaRegions): Put
    declaration and assignment together and made final.
    (scanOneMetaRegion): If the region is no longer in onlineMetaRegions,
    give up trying to scan.
    (unassignRootRegion): Added (Not yet finished).
{code}

> [hbase] TestRegionServerAbort failure in patch build #903 and nightly #266
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-2017
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2017
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: stack
>            Priority: Minor
>             Fix For: 0.15.0
>
>         Attachments: trsa.patch
>
>
> In patch build #903, the metascanner keeps trying to go to the downed server even though onlineMetaRegions has been updated w/ the new location, and then the metascanner just goes away (or hangs).
> In nightly build #266, it's a similar scenario, only the remaining region servers decide to shut down because they haven't been able to reach the master in 7 seconds.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
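
For anyone reading the patch notes without the attachment at hand: the "thread dump once a minute while the test runs too long" idea behind threadDumpingJoin could look roughly like the sketch below. Only the method name comes from the patch summary. The class name, the signature, the interval parameter, and the threadDump helper are assumptions made for illustration, not the code in trsa.patch.

```java
import java.util.Map;

public class ThreadDumpingJoinSketch {

    /** Hypothetical helper: formats the stack of every live thread in this JVM. */
    public static String threadDump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName())
              .append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    /**
     * Waits for t to finish. Each time intervalMs elapses and t is still
     * alive, prints a thread dump to stderr. Returns the number of dumps.
     * intervalMs must be positive, since Thread.join(0) waits forever.
     */
    public static int threadDumpingJoin(Thread t, long intervalMs)
            throws InterruptedException {
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("intervalMs must be > 0");
        }
        int dumps = 0;
        while (t.isAlive()) {
            t.join(intervalMs);
            if (t.isAlive()) {
                System.err.println("Still waiting on " + t.getName() + ":");
                System.err.println(threadDump());
                dumps++;
            }
        }
        return dumps;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for the verification thread mentioned in the patch notes.
        Thread verifier = new Thread(() -> {
            try {
                Thread.sleep(250);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        }, "verifier");
        verifier.start();
        // A short interval keeps the demo quick; the patch notes say once a minute.
        int dumps = threadDumpingJoin(verifier, 100);
        System.out.println("dumps=" + dumps);
    }
}
```

Running the verification in its own thread, as the TestRegionServerAbort change describes, is what lets the test thread sit in a loop like this and report where things are stuck instead of blocking silently.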