hbase-issues mailing list archives

From "Ted Yu (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4492) TestRollingRestart fails intermittently
Date Tue, 27 Sep 2011 04:07:12 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115184#comment-13115184 ]

Ted Yu commented on HBASE-4492:
-------------------------------

From the output of build 19 above:
{code}
2011-09-25 09:35:06,876 INFO  [RS_CLOSE_REGION-hemera.apache.org,33646,1316943298695-0] regionserver.HRegion(738): Closed tableRestart,aaaaa,1316943285423.b4692b784743bbe7c57312d8b2f8539d.
2011-09-25 09:35:06,876 DEBUG [RS_CLOSE_REGION-hemera.apache.org,33646,1316943298695-0] handler.CloseRegionHandler(142): Closed region tableRestart,aaaaa,1316943285423.b4692b784743bbe7c57312d8b2f8539d.
...
2011-09-25 09:35:14,609 DEBUG [Thread-1] zookeeper.ZKAssign(892): ZK RIT -> 70236052
2011-09-25 09:35:14,609 DEBUG [Thread-1] zookeeper.ZKAssign(892): ZK RIT -> 1028785192
...
2011-09-25 09:35:14,710 DEBUG [Thread-1] master.TestRollingRestart(325): 

TRR: Expected to find 22 but only found 3

2011-09-25 09:35:14,711 DEBUG [Thread-1] master.TestRollingRestart(325): 

TRR: Missing region: tableRestart,aaaaa,1316943285423.b4692b784743bbe7c57312d8b2f8539d.
{code}
blockUntilNoRIT() has these calls:
{code}
    ZKAssign.blockUntilNoRIT(zkw);
    master.assignmentManager.waitUntilNoRegionsInTransition(60000);
{code}
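We can see that master.assignmentManager.waitUntilNoRegionsInTransition() waited at most about 100 ms (the last ZK RIT entry is logged at 09:35:14,609 and the region count check at 09:35:14,710), far short of the 60-second limit.
Should we wait longer? I think the NoRIT criterion alone isn't enough: the wait apparently returned while region aaaaa, closed at 09:35:06, was still not reassigned.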
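One possible additional criterion is to poll until the expected number of regions is actually online. The sketch below is only an illustration using the plain JDK: RegionCountWaiter and waitForRegionCount are hypothetical names, not HBase APIs, and the supplier stands in for however the test counts online regions.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

/** Hypothetical helper (not part of HBase): waits for a region count, not just for "no RIT". */
final class RegionCountWaiter {
    private RegionCountWaiter() {}

    /**
     * Polls onlineRegionCount until it reports at least `expected` regions
     * or until timeoutMs has elapsed. Returns true only if the expected
     * count was observed; returns false on timeout or interruption.
     */
    static boolean waitForRegionCount(IntSupplier onlineRegionCount,
                                      int expected,
                                      long timeoutMs,
                                      long pollIntervalMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            if (onlineRegionCount.getAsInt() >= expected) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out before enough regions came online
            }
            try {
                Thread.sleep(pollIntervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status for the caller
                return false;
            }
        }
    }
}
```

In the test this would run after the two existing calls, with expected set to 22 and the supplier counting regions across the live region servers, so the assertion only fires once reassignment has had a bounded chance to finish.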
                
> TestRollingRestart fails intermittently
> ---------------------------------------
>
>                 Key: HBASE-4492
>                 URL: https://issues.apache.org/jira/browse/HBASE-4492
>             Project: HBase
>          Issue Type: Test
>            Reporter: Ted Yu
>            Assignee: Jonathan Gray
>
> I got the following when running test suite on TRUNK:
> {code}
> testBasicRollingRestart(org.apache.hadoop.hbase.master.TestRollingRestart)  Time elapsed: 300.28 sec  <<< ERROR!
> java.lang.Exception: test timed out after 300000 milliseconds
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.hadoop.hbase.master.TestRollingRestart.waitForRSShutdownToStartAndFinish(TestRollingRestart.java:313)
>         at org.apache.hadoop.hbase.master.TestRollingRestart.testBasicRollingRestart(TestRollingRestart.java:210)
> {code}
> I then ran TestRollingRestart#testBasicRollingRestart manually, which wiped out the test output file for the failed test.
> A similar failure can be found on Jenkins:
> https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/19/testReport/junit/org.apache.hadoop.hbase.master/TestRollingRestart/testBasicRollingRestart/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
