hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jimmy Xiang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-8940) TestRegionMergeTransactionOnCluster#testWholesomeMerge may fail due to race in opening region
Date Sun, 28 Jul 2013 14:09:49 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721935#comment-13721935
] 

Jimmy Xiang edited comment on HBASE-8940 at 7/28/13 2:09 PM:
-------------------------------------------------------------

The above failed test is caused by
{noformat}
2013-07-27 20:08:48,345 INFO  [MASTER_TABLE_OPERATIONS-quirinus:59057-0] handler.DispatchMergingRegionHandler(170):
Cancel merging regions testWholesomeMerge,,1374955684854.8b452e80a9e15d54bd265c344f4ad953.,
testWholesomeMerge,testRow002
0,1374955684854.4a31455f2b0256853c41c52ba65bdc10., because can't move them together after
30003ms
{noformat}

Timeout of moving regions together on same RS is caused by closing region "4a31455f2b0256853c41c52ba65bdc10"
take more time than 30s,

{noformat}
2013-07-27 20:08:18,753 DEBUG [RS_CLOSE_REGION-quirinus:57626-1] handler.CloseRegionHandler(125):
Processing close of 
testWholesomeMerge,testRow0020,1374955684854.4a31455f2b0256853c41c52ba65bdc10.

2013-07-27 20:08:18,755 DEBUG [RS_CLOSE_REGION-quirinus:57626-1] regionserver.HRegion(1493):
Started memstore flush for testWholesomeMerge,testRow0020,1374955684854.4a31455f2b0256853c41c52ba65bdc10.,
current region memstore size 3.4k

2013-07-27 20:09:03,914 INFO  [RS_CLOSE_REGION-quirinus:57626-1] regionserver.HRegion(1637):
Finished memstore flush of ~3.4k/3520, currentsize=0.0/0 for region testWholesomeMerge,testRow0020,1374955684854.4a31455f2b0256853c41c52ba65bdc1
0. in 45159ms, sequenceid=186, compaction requested=false

2013-07-27 20:09:03,956 DEBUG [RS_CLOSE_REGION-quirinus:57626-1] handler.CloseRegionHandler(177):
Closed region testWholesomeMerge,testRow0020,1374955684854.4a31455f2b0256853c41c52ba65bdc10.
{noformat}

>From the above logs, closing region took 45s, it seems flushing is too slow, but can't
get the reason from the current logs, maybe GC or high IO load at that time
                
      was (Author: zjushch):
    The above failed test is caused by
{format}
2013-07-27 20:08:48,345 INFO  [MASTER_TABLE_OPERATIONS-quirinus:59057-0] handler.DispatchMergingRegionHandler(170):
Cancel merging regions testWholesomeMerge,,1374955684854.8b452e80a9e15d54bd265c344f4ad953.,
testWholesomeMerge,testRow002
0,1374955684854.4a31455f2b0256853c41c52ba65bdc10., because can't move them together after
30003ms
{format}

Timeout of moving regions together on same RS is caused by closing region "4a31455f2b0256853c41c52ba65bdc10"
take more time than 30s,

{format}
2013-07-27 20:08:18,753 DEBUG [RS_CLOSE_REGION-quirinus:57626-1] handler.CloseRegionHandler(125):
Processing close of 
testWholesomeMerge,testRow0020,1374955684854.4a31455f2b0256853c41c52ba65bdc10.

2013-07-27 20:08:18,755 DEBUG [RS_CLOSE_REGION-quirinus:57626-1] regionserver.HRegion(1493):
Started memstore flush for testWholesomeMerge,testRow0020,1374955684854.4a31455f2b0256853c41c52ba65bdc10.,
current region memstore size 3.4k

2013-07-27 20:09:03,914 INFO  [RS_CLOSE_REGION-quirinus:57626-1] regionserver.HRegion(1637):
Finished memstore flush of ~3.4k/3520, currentsize=0.0/0 for region testWholesomeMerge,testRow0020,1374955684854.4a31455f2b0256853c41c52ba65bdc1
0. in 45159ms, sequenceid=186, compaction requested=false

2013-07-27 20:09:03,956 DEBUG [RS_CLOSE_REGION-quirinus:57626-1] handler.CloseRegionHandler(177):
Closed region testWholesomeMerge,testRow0020,1374955684854.4a31455f2b0256853c41c52ba65bdc10.
{format}

>From the above logs, closing region took 45s, it seems flushing is too slow, but can't
get the reason from the current logs, maybe GC or high IO load at that time
                  
> TestRegionMergeTransactionOnCluster#testWholesomeMerge may fail due to race in opening
region
> ---------------------------------------------------------------------------------------------
>
>                 Key: HBASE-8940
>                 URL: https://issues.apache.org/jira/browse/HBASE-8940
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: chunhui shen
>             Fix For: 0.95.2
>
>         Attachments: 8940-trunk-v2.patch, 8940-v1.txt, 8940v3.txt
>
>
> From http://54.241.6.143/job/HBase-TRUNK-Hadoop-2/org.apache.hbase$hbase-server/395/testReport/org.apache.hadoop.hbase.regionserver/TestRegionMergeTransactionOnCluster/testWholesomeMerge/
:
> {code}
> 013-07-11 09:33:44,154 INFO  [AM.ZK.Worker-pool-2-thread-2] master.RegionStates(309):
Offlined 3ffefd878a234031675de6b2c70b2ead from ip-10-174-118-204.us-west-1.compute.internal,60498,1373535184820
> 2013-07-11 09:33:44,154 INFO  [AM.ZK.Worker-pool-2-thread-2] master.AssignmentManager$4(1223):
The master has opened testWholesomeMerge,testRow0020,1373535210125.3ffefd878a234031675de6b2c70b2ead.
that was online on ip-10-174-118-204.us-west-1.compute.internal,59210,1373535184884
> 2013-07-11 09:33:44,182 DEBUG [RS_OPEN_REGION-ip-10-174-118-204:59210-1] zookeeper.ZKAssign(862):
regionserver:59210-0x13fcd13a20c0002 Successfully transitioned node 3ffefd878a234031675de6b2c70b2ead
from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED
> 2013-07-11 09:33:44,182 INFO  [MASTER_TABLE_OPERATIONS-ip-10-174-118-204:39405-0] handler.DispatchMergingRegionHandler(154):
Failed send MERGE REGIONS RPC to server ip-10-174-118-204.us-west-1.compute.internal,59210,1373535184884
for region testWholesomeMerge,,1373535210124.efcb10dcfa250e31bfd50dc6c7049f32.,testWholesomeMerge,testRow0020,1373535210125.3ffefd878a234031675de6b2c70b2ead.,
focible=false, org.apache.hadoop.hbase.exceptions.RegionOpeningException: Region is being
opened: 3ffefd878a234031675de6b2c70b2ead
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2566)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3862)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.mergeRegions(HRegionServer.java:3649)
> 	at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:14400)
> 	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2124)
> 	at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1831)
> 2013-07-11 09:33:44,182 DEBUG [RS_OPEN_REGION-ip-10-174-118-204:59210-1] handler.OpenRegionHandler(373):
region transitioned to opened in zookeeper: {ENCODED => 3ffefd878a234031675de6b2c70b2ead,
NAME => 'testWholesomeMerge,testRow0020,1373535210125.3ffefd878a234031675de6b2c70b2ead.',
STARTKEY => 'testRow0020', ENDKEY => 'testRow0040'}, server: ip-10-174-118-204.us-west-1.compute.internal,59210,1373535184884
> 2013-07-11 09:33:44,183 DEBUG [RS_OPEN_REGION-ip-10-174-118-204:59210-1] handler.OpenRegionHandler(186):
Opened testWholesomeMerge,testRow0020,1373535210125.3ffefd878a234031675de6b2c70b2ead. on server:ip-10-174-118-204.us-west-1.compute.internal,59210,1373535184884
> {code}
> We can see that MASTER_TABLE_OPERATIONS thread couldn't get region 3ffefd878a234031675de6b2c70b2ead
because RS_OPEN_REGION thread finished region opening 1 millisecond later.
> One solution is to retry operation when receiving RegionOpeningException

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message