hbase-issues mailing list archives

From "chunhui shen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7299) TestMultiParallel fails intermittently in trunk builds
Date Mon, 31 Dec 2012 07:48:14 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13541312#comment-13541312 ]

chunhui shen commented on HBASE-7299:
-------------------------------------

[~ted_yu]
I have looked at the log again, and I think the failure is caused by how the regions were balanced across the servers.

First, look at the order in which the tests ran:
{code}
2012-12-31 03:11:48,688 INFO  [pool-1-thread-1] hbase.ResourceChecker(147): before: client.TestMultiParallel#testActiveThreadsCount

2012-12-31 03:11:49,247 INFO  [pool-1-thread-1] hbase.ResourceChecker(147): before: client.TestMultiParallel#testBatchWithGet

2012-12-31 03:11:50,151 INFO  [pool-1-thread-1] hbase.ResourceChecker(147): before: client.TestMultiParallel#testBadFam

2012-12-31 03:11:50,169 INFO  [pool-1-thread-1] hbase.ResourceChecker(147): before: client.TestMultiParallel#testFlushCommitsNoAbort

2012-12-31 03:11:50,825 INFO  [pool-1-thread-1] hbase.ResourceChecker(147): before: client.TestMultiParallel#testFlushCommitsWithAbort

{code}

Therefore, we only need to look at what happened before 2012-12-31 03:11:50,825.


Next, I grepped the logs for every opened region:
{code}
2012-12-31 03:11:46,309 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-0]
handler.OpenRegionHandler(149): Opened multi_test_table,,1356923505778.5e876dba9be19501a1eb65bf3a169e52.
on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:47,164 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-2]
handler.OpenRegionHandler(149): Opened multi_test_table,bbb,1356923506859.7c3f09396e7314de6f5a757b010b6497.
on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:47,202 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-0]
handler.OpenRegionHandler(149): Opened multi_test_table,ccc,1356923506862.2a80b82e2d6c3152e3f12bc91e1cc621.
on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:47,303 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,45800,1356923500558-1]
handler.OpenRegionHandler(149): Opened multi_test_table,fff,1356923506868.63ffa8986cd30ff5314b4c2a70cf846a.
on server:asf001.sp2.ygridcore.net,45800,1356923500558
2012-12-31 03:11:47,329 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-2]
handler.OpenRegionHandler(149): Opened multi_test_table,ddd,1356923506864.744510f09d963e39dd9c0b6e3119dc10.
on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:47,370 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,45800,1356923500558-2]
handler.OpenRegionHandler(149): Opened multi_test_table,iii,1356923506875.d09ca7b9b80b6cde560772598a240d0e.
on server:asf001.sp2.ygridcore.net,45800,1356923500558
2012-12-31 03:11:47,400 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-0]
handler.OpenRegionHandler(149): Opened multi_test_table,eee,1356923506866.6a1697e740f121d009c3085e0cccd18d.
on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:47,439 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,45800,1356923500558-0]
handler.OpenRegionHandler(149): Opened multi_test_table,jjj,1356923506878.f25b9086263fb7a4f983524c708503b6.
on server:asf001.sp2.ygridcore.net,45800,1356923500558
2012-12-31 03:11:47,465 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-1]
handler.OpenRegionHandler(149): Opened multi_test_table,,1356923506856.2db538d9e2005dba4e28746d51cf3831.
on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:47,482 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-2]
handler.OpenRegionHandler(149): Opened multi_test_table,ggg,1356923506871.7adeba3045bdbb0f4e499b221d2ffc87.
on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:47,598 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,45800,1356923500558-1]
handler.OpenRegionHandler(149): Opened multi_test_table,nnn,1356923506888.9cc1e013ebfba7da8e00e4963c2d111a.
on server:asf001.sp2.ygridcore.net,45800,1356923500558
2012-12-31 03:11:47,603 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-1]
handler.OpenRegionHandler(149): Opened multi_test_table,kkk,1356923506880.a2a3e39af3fa95eb1a3979998b075bb6.
on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:47,634 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,45800,1356923500558-2]
handler.OpenRegionHandler(149): Opened multi_test_table,ppp,1356923506893.915969809cfe733d325591b7c27bd088.
on server:asf001.sp2.ygridcore.net,45800,1356923500558
2012-12-31 03:11:47,643 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-2]
handler.OpenRegionHandler(149): Opened multi_test_table,lll,1356923506883.0e6b1c9b373cecb0c74380b78d1cc492.
on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:47,701 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,45800,1356923500558-0]
handler.OpenRegionHandler(149): Opened multi_test_table,rrr,1356923506899.67925003b24f6408e7ee6ef2360a77f6.
on server:asf001.sp2.ygridcore.net,45800,1356923500558
2012-12-31 03:11:47,717 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-1]
handler.OpenRegionHandler(149): Opened multi_test_table,mmm,1356923506886.d0a07239a287e74e7706e4b9a0c9f491.
on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:47,745 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-2]
handler.OpenRegionHandler(149): Opened multi_test_table,ooo,1356923506891.524c6a4fb529fbb5b86e0865ac0131f5.
on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:47,867 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-2]
handler.OpenRegionHandler(149): Opened multi_test_table,sss,1356923506901.af5693d7dc46541210d7c26cf4e4c1a0.
on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:47,936 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-0]
handler.OpenRegionHandler(149): Opened multi_test_table,hhh,1356923506873.12dff64cde2a448c9d5b7adecfabfaaa.
on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:47,957 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-2]
handler.OpenRegionHandler(149): Opened multi_test_table,ttt,1356923506904.c121cfbfb3e248f820d4729e4452ff14.
on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:48,012 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-2]
handler.OpenRegionHandler(149): Opened multi_test_table,vvv,1356923506908.797a80f1a86a9256a833e4cd48554185.
on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:48,076 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-2]
handler.OpenRegionHandler(149): Opened multi_test_table,www,1356923506911.79129a00e6718ae7ca478e3dde854524.
on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:48,185 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-1]
handler.OpenRegionHandler(149): Opened multi_test_table,qqq,1356923506896.cf9a88d3961133afeaaeabdf5a9cffc3.
on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:48,411 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-0]
handler.OpenRegionHandler(149): Opened multi_test_table,uuu,1356923506906.62d60488f81f0e0edce10369200b1543.
on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:48,556 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-2]
handler.OpenRegionHandler(149): Opened multi_test_table,xxx,1356923506913.bf54cd9fae68060237f700e0c7acc6b4.
on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:48,626 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-1]
handler.OpenRegionHandler(149): Opened multi_test_table,yyy,1356923506915.7b41bd006a2c832842a67b85f1837c68.
on server:asf001.sp2.ygridcore.net,38198,1356923500609
{code}
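
The per-server tally of the entries above can be reproduced with a small sketch like the following. It is only an illustration, not part of HBase: the class name OpenedRegionTally and the method countByServer are hypothetical, and it assumes the log text keeps the "Opened <region> on server:<server>" shape shown above, possibly wrapped across lines by the mailer.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OpenedRegionTally {
    // Matches "Opened <region> on server:<server>". \s+ also matches the
    // line breaks that mail wrapping inserts in the middle of an entry.
    private static final Pattern OPENED =
        Pattern.compile("Opened\\s+(\\S+)\\s+on\\s+server:(\\S+)");

    /** Returns how many "Opened" entries the log contains for each server. */
    public static Map<String, Integer> countByServer(String log) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = OPENED.matcher(log);
        while (m.find()) {
            // group(2) is the server name, e.g. host,port,startcode
            counts.merge(m.group(2), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two entries copied from the log above, wrapped the same way.
        String sample =
            "handler.OpenRegionHandler(149): Opened multi_test_table,bbb,1356923506859.7c3f09396e7314de6f5a757b010b6497.\n"
          + "on server:asf001.sp2.ygridcore.net,38198,1356923500609\n"
          + "handler.OpenRegionHandler(149): Opened multi_test_table,fff,1356923506868.63ffa8986cd30ff5314b4c2a70cf846a.\n"
          + "on server:asf001.sp2.ygridcore.net,45800,1356923500558\n";
        countByServer(sample)
            .forEach((server, n) -> System.out.println(server + " -> " + n));
    }
}
```

Feeding the whole grep output to countByServer instead of the two-entry sample gives one count per server, which is the comparison made below.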

These regions are created by
{code}
@BeforeClass public static void beforeClass() throws Exception {
  ...
  UTIL.createMultiRegions(t, Bytes.toBytes(FAMILY));
  ...
}
{code}

From the above, we can see that server asf001.sp2.ygridcore.net,38198,1356923500609 serves
20 regions, while asf001.sp2.ygridcore.net,45800,1356923500558 serves only 6.
That makes the failure clear:
{code}
for (JVMClusterUtil.RegionServerThread t: liveRSs) {
  int regions = ProtobufUtil.getOnlineRegions(t.getRegionServer()).size();
  Assert.assertTrue("Count of regions=" + regions, regions > 10);
}
{code}
I don't know why we assert that each regionserver has more than 10 regions.
The failed log says "java.lang.AssertionError: Count of regions=7", so there is one more
region on asf001.sp2.ygridcore.net,45800,1356923500558:
{code}
2012-12-31 03:11:44,306 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,45800,1356923500558-0]
handler.OpenRegionHandler(149): Opened -ROOT-,,0.70236052 on server:asf001.sp2.ygridcore.net,45800,1356923500558
{code}
Yes, it's the -ROOT- region.

We can also see the balance logs later:
{code}
2012-12-31 03:11:58,883 INFO  [pool-1-thread-1] master.HMaster(1325): balance hri=multi_test_table,mmm,1356923506886.d0a07239a287e74e7706e4b9a0c9f491.,
src=asf001.sp2.ygridcore.net,38198,1356923500609, dest=asf001.sp2.ygridcore.net,59241,1356923517635
2012-12-31 03:11:58,890 INFO  [pool-1-thread-1] master.HMaster(1325): balance hri=multi_test_table,,1356923505778.5e876dba9be19501a1eb65bf3a169e52.,
src=asf001.sp2.ygridcore.net,38198,1356923500609, dest=asf001.sp2.ygridcore.net,59241,1356923517635
2012-12-31 03:11:58,949 INFO  [pool-1-thread-1] master.HMaster(1325): balance hri=multi_test_table,bbb,1356923506859.7c3f09396e7314de6f5a757b010b6497.,
src=asf001.sp2.ygridcore.net,38198,1356923500609, dest=asf001.sp2.ygridcore.net,59241,1356923517635
2012-12-31 03:11:58,967 INFO  [pool-1-thread-1] master.HMaster(1325): balance hri=multi_test_table,eee,1356923506866.6a1697e740f121d009c3085e0cccd18d.,
src=asf001.sp2.ygridcore.net,38198,1356923500609, dest=asf001.sp2.ygridcore.net,59241,1356923517635
{code}



So I think the cause is that the regions were unbalanced across the servers beforehand, and I don't
think it is necessary to assert that each regionserver has more than 10 regions.
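
To make the arithmetic concrete, here is a minimal sketch that applies the quoted "regions > 10" check to the counts read from the logs above (20 opened regions on one server, 6 table regions plus -ROOT- on the other). The class RegionCountCheck and both method names are hypothetical, and totalAbove is only one possible looser check for comparison, not the patch attached to this issue.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RegionCountCheck {
    /** Mirrors the quoted assertion: every server must hold more than min regions. */
    public static boolean everyServerAbove(Map<String, Integer> regionsPerServer, int min) {
        return regionsPerServer.values().stream().allMatch(n -> n > min);
    }

    /** Hypothetical looser check: only the total across all servers must exceed min. */
    public static boolean totalAbove(Map<String, Integer> regionsPerServer, int min) {
        int total = regionsPerServer.values().stream().mapToInt(Integer::intValue).sum();
        return total > min;
    }

    public static void main(String[] args) {
        // Counts taken from the opened-region logs in this comment; any
        // regions not shown in those logs are not included.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("asf001.sp2.ygridcore.net,38198,1356923500609", 20);
        counts.put("asf001.sp2.ygridcore.net,45800,1356923500558", 7); // 6 table regions + -ROOT-

        System.out.println("per-server check passes: " + everyServerAbove(counts, 10));
        System.out.println("total check passes: " + totalAbove(counts, 10));
    }
}
```

With these counts the per-server check fails on the second server, even though the cluster as a whole serves all of the regions.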

By the way, I found that we abort regionserver 0 in TestMultiParallel#testBatchWithPut, but
we also abort regionserver 0 in TestMultiParallel#testFlushCommitsWithAbort(). That seems
confusing.
                
> TestMultiParallel fails intermittently in trunk builds
> ------------------------------------------------------
>
>                 Key: HBASE-7299
>                 URL: https://issues.apache.org/jira/browse/HBASE-7299
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.96.0
>
>         Attachments: 7299-v4.txt, HBASE-7299.patch, HBASE-7299v2.patch, HBASE-7299v3.patch
>
>
> From trunk build #3598:
> {code}
>  testFlushCommitsNoAbort(org.apache.hadoop.hbase.client.TestMultiParallel): Count of regions=8
> {code}
> It failed in 3595 as well:
> {code}
> java.lang.AssertionError: Server count=2, abort=true expected:<1> but was:<2>
> 	at org.junit.Assert.fail(Assert.java:93)
> 	at org.junit.Assert.failNotEquals(Assert.java:647)
> 	at org.junit.Assert.assertEquals(Assert.java:128)
> 	at org.junit.Assert.assertEquals(Assert.java:472)
> 	at org.apache.hadoop.hbase.client.TestMultiParallel.doTestFlushCommits(TestMultiParallel.java:267)
> 	at org.apache.hadoop.hbase.client.TestMultiParallel.testFlushCommitsWithAbort(TestMultiParallel.java:226)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
