Mailing-List: contact hadoop-dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hadoop-dev@lucene.apache.org
Message-ID: <17794083.1191961010607.JavaMail.jira@brutus>
Date: Tue, 9 Oct 2007 13:16:50 -0700 (PDT)
From: "stack (JIRA)" <jira@apache.org>
To: hadoop-dev@lucene.apache.org
Subject: [jira] Updated: (HADOOP-2017) [hbase] TestRegionServerAbort failure
 in patch build #903 and nightly #266
In-Reply-To: <25718086.1191959571066.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/HADOOP-2017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-2017:
--------------------------

    Attachment: trsa.patch

A patch with more logging and thread dumping to better show what is going on, plus a mechanism that notices moved regions sooner.

{code}
HADOOP-2017 TestRegionServerAbort failure in patch build #903 and nightly #266

Notice moved META regions sooner. Also added more logging and
thread dumping once a minute when the test starts to take too long,
so we can see where we are hung (if we are hung).

M  src/contrib/hbase/src/test/org/apache/hadoop/hbase/TestHStoreFile.java
    Inherit from HBaseTestCase.
M  src/contrib/hbase/src/test/org/apache/hadoop/hbase/HBaseClusterTestCase.java
    (threadDumpingJoin): Added.
M  src/contrib/hbase/src/test/org/apache/hadoop/hbase/TestRegionServerAbort.java
    Run verification in its own thread so can concurrently thread dump if
    test is going on too long.
M  src/contrib/hbase/src/test/org/apache/hadoop/hbase/DFSAbort.java
    Moved join up into parent class.
M  src/contrib/hbase/src/java/org/apache/hadoop/hbase/Chore.java
    Remove unused import.
M  src/contrib/hbase/src/java/org/apache/hadoop/hbase/HMaster.java
    (MetaRegion.toString): Added.
    Added logging around assignment checking and log split.
    (MetaRegion.compareTo): Add consideration of server address.
    (numberOfMetaRegions, metaRegionsToScan, onlineMetaRegions):
      Put declaration and assignment together and made final.
    (scanOneMetaRegion): If the region is no longer in onlineMetaRegions,
    give up trying to scan.
    (unassignRootRegion): Added (Not yet finished).
{code}
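
To illustrate the threadDumpingJoin idea listed above, here is a minimal sketch of a join that periodically dumps all threads while the joined thread is still alive. It is not the code in trsa.patch: the class name, the method signature, the return value, and the use of ThreadMXBean are assumptions made for illustration only.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical sketch; names and signatures are not taken from the patch.
class ThreadDumpingJoinSketch {

    /** Builds a plain-text dump of every live thread and its stack. */
    static String dumpThreads() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : bean.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    /**
     * Waits for t to finish. Each time intervalMs elapses and t is still
     * alive, writes a thread dump to stderr. Returns the number of dumps.
     */
    static int threadDumpingJoin(Thread t, long intervalMs) {
        if (intervalMs <= 0) {
            // join(0) would block forever and never dump.
            throw new IllegalArgumentException("intervalMs must be positive");
        }
        int dumps = 0;
        long start = System.currentTimeMillis();
        while (t.isAlive()) {
            try {
                t.join(intervalMs);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                break;
            }
            if (t.isAlive()) {
                dumps++;
                long waited = System.currentTimeMillis() - start;
                System.err.println("Still waiting on " + t.getName()
                    + " after " + waited + "ms; thread dump follows");
                System.err.println(dumpThreads());
            }
        }
        return dumps;
    }
}
```

A test would run its verification in a separate thread and call threadDumpingJoin(verifier, 60 * 1000L) to get one dump per minute while the verifier is stuck.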

> [hbase] TestRegionServerAbort failure in patch build #903 and nightly #266
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-2017
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2017
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: stack
>            Priority: Minor
>             Fix For: 0.15.0
>
>         Attachments: trsa.patch
>
>
> In patch build #903, the metascanner keeps trying to go to the downed server even though onlineMetaRegions has been updated with the new location, and then the metascanner just goes away (or hangs).
> In nightly build #266, it's a similar scenario, only the remaining region servers decide to shut down because they haven't been able to reach the master in 7 seconds.
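
The failure described above, together with the MetaRegion.compareTo and scanOneMetaRegion changes in the patch summary, can be sketched roughly as follows. This is only an illustration under assumptions: the MetaRegion fields, the String-keyed map, and the stillOnline and scanWithRetries helpers are hypothetical and are not HMaster's actual code.

```java
import java.util.SortedMap;
import java.util.function.Predicate;

// Hypothetical sketch; types and helpers are not taken from HMaster.
class MetaScanSketch {

    /** Simplified stand-in for a META region entry held by the master. */
    static final class MetaRegion implements Comparable<MetaRegion> {
        final String regionName;
        final String startKey;
        final String server;

        MetaRegion(String regionName, String startKey, String server) {
            this.regionName = regionName;
            this.startKey = startKey;
            this.server = server;
        }

        /** Compares by region name first, then by server address. */
        @Override
        public int compareTo(MetaRegion other) {
            int c = regionName.compareTo(other.regionName);
            return c != 0 ? c : server.compareTo(other.server);
        }

        @Override
        public String toString() {
            return "regionname: " + regionName + ", startKey: <" + startKey
                + ">, server: " + server;
        }
    }

    /** True only if the map still holds this region on the same server. */
    static boolean stillOnline(SortedMap<String, MetaRegion> online,
                               MetaRegion region) {
        MetaRegion current = online.get(region.startKey);
        return current != null && current.compareTo(region) == 0;
    }

    /**
     * Retries a scan up to maxAttempts times, but gives up as soon as the
     * region is no longer listed at the address being scanned. Real code
     * would also need to synchronize access to the shared map.
     */
    static boolean scanWithRetries(SortedMap<String, MetaRegion> online,
                                   MetaRegion region, int maxAttempts,
                                   Predicate<MetaRegion> scanAttempt) {
        for (int i = 0; i < maxAttempts; i++) {
            if (!stillOnline(online, region)) {
                return false; // region moved or went offline; stop retrying
            }
            if (scanAttempt.test(region)) {
                return true;
            }
        }
        return false;
    }
}
```

Because the comparison includes the server address, a region that has been reassigned to another server no longer compares equal to the stale entry, so the retry loop stops going back to the downed server.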
