hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9724) Failed region split is not handled correctly by AM
Date Tue, 08 Oct 2013 21:18:42 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789711#comment-13789711
] 

Hadoop QA commented on HBASE-9724:
----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12607398/hbase-9724_v1.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 3 new or modified
tests.

    {color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 1.0 profile.

    {color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 2.0 profile.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of
javac compiler warnings.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version
1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number
of release audit warnings.

    {color:green}+1 lineLengths{color}.  The patch does not introduce lines longer than 100

    {color:red}-1 site{color}.  The patch appears to cause mvn site goal to fail.

    {color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7491//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7491//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7491//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7491//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7491//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7491//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7491//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7491//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7491//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7491//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7491//console

This message is automatically generated.

> Failed region split is not handled correctly by AM
> --------------------------------------------------
>
>                 Key: HBASE-9724
>                 URL: https://issues.apache.org/jira/browse/HBASE-9724
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 0.98.0, 0.96.1
>
>         Attachments: hbase-9724_v1.patch
>
>
> We ran into a case with ITLoadAndVerify with CM which left a region in RIT state, because
the region failed a region SPLIT, but AM did not process this correctly. 
> Region server attempting SPLIT and rollover: 
> {code}
> 2013-10-06 16:42:56,959 INFO  [Priority.RpcServer.handler=0,port=60020] regionserver.HRegionServer:
Splitting IntegrationTestLoadAndVerify,/(k\xCA\x1A\xF2\x86\xB6,1381100194019.9608be62536a61444766da669e67d39e.
> ...
> 2013-10-06 16:43:01,480 INFO  [regionserver60020-splits-1381102351302] regionserver.SplitTransaction:
Starting split of region IntegrationTestLoadAndVerify,/(k\xCA\x1A\xF2\x86\xB6,1381100194019.9608be62536a61444766da669e67d39e.
> 2013-10-06 16:43:01,480 DEBUG [regionserver60020-splits-1381102351302] regionserver.SplitTransaction:
regionserver:60020-0x1418eb9f38301bb Creating ephemeral node for 9608be62536a61444766da669e67d39e
in SPLITTING state
> 2013-10-06 16:43:01,570 DEBUG [regionserver60020-splits-1381102351302] zookeeper.ZKAssign:
regionserver:60020-0x1418eb9f38301bb Transitioning 9608be62536a61444766da669e67d39e from RS_ZK_REGION_SPLITTING
to RS_ZK_REGION_SPLITTING
> 2013-10-06 16:43:01,607 DEBUG [regionserver60020-splits-1381102351302] zookeeper.ZKAssign:
regionserver:60020-0x1418eb9f38301bb Transitioned node 9608be62536a61444766da669e67d39e from
RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLITTING
> 2013-10-06 16:43:01,607 DEBUG [regionserver60020-splits-1381102351302] zookeeper.ZKAssign:
regionserver:60020-0x1418eb9f38301bb Transitioning 9608be62536a61444766da669e67d39e from RS_ZK_REGION_SPLITTING
to RS_ZK_REGION_SPLITTING
> 2013-10-06 16:43:01,629 DEBUG [regionserver60020-splits-1381102351302] zookeeper.ZKAssign:
regionserver:60020-0x1418eb9f38301bb Transitioned node 9608be62536a61444766da669e67d39e from
RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLITTING
> 2013-10-06 16:43:02,047 DEBUG [regionserver60020-splits-1381102351302] regionserver.HRegion:
Closing IntegrationTestLoadAndVerify,/(k\xCA\x1A\xF2\x86\xB6,1381100194019.9608be62536a61444766da669e67d39e.:
disabling compactions & flushes
> 2013-10-06 16:43:02,047 DEBUG [regionserver60020-splits-1381102351302] regionserver.HRegion:
Updates disabled for region IntegrationTestLoadAndVerify,/(k\xCA\x1A\xF2\x86\xB6,1381100194019.9608be62536a61444766da669e67d39e.
> 2013-10-06 16:43:02,047 DEBUG [regionserver60020-splits-1381102351302] regionserver.HRegion:
Started memstore flush for IntegrationTestLoadAndVerify,/(k\xCA\x1A\xF2\x86\xB6,1381100194019.9608be62536a61444766da669e67d39e.,
current region memstore size 4.4 M
> 2013-10-06 16:43:03,945 INFO  [regionserver60020-splits-1381102351302] regionserver.HRegion:
Finished memstore flush of ~4.4 M/4607072, currentsize=0/0 for region IntegrationTestLoadAndVerify,/(k\xCA\x1A\xF2\x86\xB6,1381100194019.9608be62536a61444766da669e67d39e.
in 1898ms, sequenceid=6703616, compaction requested=true
> 2013-10-06 16:43:04,026 INFO  [StoreCloserThread-IntegrationTestLoadAndVerify,/(k\xCA\x1A\xF2\x86\xB6,1381100194019.9608be62536a61444766da669e67d39e.-1]
regionserver.HStore: Closed f1
> 2013-10-06 16:43:04,027 INFO  [regionserver60020-splits-1381102351302] regionserver.HRegion:
Closed IntegrationTestLoadAndVerify,/(k\xCA\x1A\xF2\x86\xB6,1381100194019.9608be62536a61444766da669e67d39e.
> 2013-10-06 16:43:36,938 INFO  [regionserver60020-splits-1381102351302] regionserver.SplitRequest:
Running rollback/cleanup of failed split of IntegrationTestLoadAndVerify,/(k\xCA\x1A\xF2\x86\xB6,1381100194019.9608be62536a61444766da669e67d39e.;
Took too long to split the files and create the references, aborting split
> java.io.IOException: Took too long to split the files and create the references, aborting
split
>   at org.apache.hadoop.hbase.regionserver.SplitTransaction.splitStoreFiles(SplitTransaction.java:584)
>   at org.apache.hadoop.hbase.regionserver.SplitTransaction.createDaughters(SplitTransaction.java:296)
>   at org.apache.hadoop.hbase.regionserver.SplitTransaction.execute(SplitTransaction.java:465)
>   at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:82)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:722)
> 2013-10-06 16:43:37,482 INFO  [StoreOpener-9608be62536a61444766da669e67d39e-1] compactions.CompactionConfiguration:
size [134217728, 9223372036854775807); files [3, 10); ratio 1.200000; off-peak ratio 5.000000;
throttle point 2684354560; delete expired; major period 604800000, major jitter 0.500000
> 2013-10-06 16:43:37,509 DEBUG [StoreOpener-9608be62536a61444766da669e67d39e-1] regionserver.HStore:
loaded hdfs://ambari-unsec-1381055459-hbase-1.cs1cloud.internal:8020/apps/hbase/data/data/default/IntegrationTestLoadAndVerify/9608be62536a61444766da669e67d39e/f1/12cecfd69b7948eb8bbf9bacc3d7fbb0,
isReference=false, isBulkLoadResult=false, seqid=6703616, majorComp
> 2013-10-06 16:43:37,549 DEBUG [StoreOpener-9608be62536a61444766da669e67d39e-1] regionserver.HStore:
loaded hdfs://ambari-unsec-1381055459-hbase-1.cs1cloud.internal:8020/apps/hbase/data/data/default/IntegrationTestLoadAndVerify/9608be62536a61444766da669e67d39e/f1/6b715040df9b424cbd3d01f36b34677b,
isReference=false, isBulkLoadResult=false, seqid=6703217, majorComp
> 2013-10-06 16:43:37,588 DEBUG [StoreOpener-9608be62536a61444766da669e67d39e-1] regionserver.HStore:
loaded hdfs://ambari-unsec-1381055459-hbase-1.cs1cloud.internal:8020/apps/hbase/data/data/default/IntegrationTestLoadAndVerify/9608be62536a61444766da669e67d39e/f1/97cbd5f18ee64e0aaf43f48b1d3fae73,
isReference=false, isBulkLoadResult=false, seqid=6681405, majorComp
> 2013-10-06 16:43:37,630 DEBUG [StoreOpener-9608be62536a61444766da669e67d39e-1] regionserver.HStore:
loaded hdfs://ambari-unsec-1381055459-hbase-1.cs1cloud.internal:8020/apps/hbase/data/data/default/IntegrationTestLoadAndVerify/9608be62536a61444766da669e67d39e/f1/a0d33ad9786d4a6fb6fa83b530cb5acd,
isReference=false, isBulkLoadResult=false, seqid=6699746, majorComp
> 2013-10-06 16:43:37,676 DEBUG [StoreOpener-9608be62536a61444766da669e67d39e-1] regionserver.HStore:
loaded hdfs://ambari-unsec-1381055459-hbase-1.cs1cloud.internal:8020/apps/hbase/data/data/default/IntegrationTestLoadAndVerify/9608be62536a61444766da669e67d39e/f1/c4ad2148624d42e6b8408b9b257a4c70,
isReference=false, isBulkLoadResult=false, seqid=6689091, majorComp
> 2013-10-06 16:43:37,893 INFO  [regionserver60020-splits-1381102351302] regionserver.HRegionFileSystem:
Cleaned up old failed split transaction detritus: hdfs://ambari-unsec-1381055459-hbase-1.cs1cloud.internal:8020/apps/hbase/data/data/default/IntegrationTestLoadAndVerify/9608be62536a61444766da669e67d39e/.splits
> 2013-10-06 16:43:37,937 INFO  [regionserver60020-splits-1381102351302] regionserver.HRegion:
Onlined 9608be62536a61444766da669e67d39e; next sequenceid=6703617
> 2013-10-06 16:43:38,021 DEBUG [regionserver60020-splits-1381102351302] zookeeper.ZKAssign:
regionserver:60020-0x1418eb9f38301bb Deleted unassigned node 9608be62536a61444766da669e67d39e
in expected state RS_ZK_REGION_SPLITTING
> 2013-10-06 16:43:38,021 INFO  [regionserver60020-splits-1381102351302] regionserver.SplitRequest:
Successful rollback of failed split of IntegrationTestLoadAndVerify,/(k\xCA\x1A\xF2\x86\xB6,1381100194019.9608be62536a61444766da669e67d39e.
> {code}
> The RS, 4 mins later will try to do another split, but it will fail to let master know
so, because master still thinks the region is in RIT due to the first split. 
> {code}
> 2013-10-06 16:43:01,606 DEBUG [AM.ZK.Worker-pool2-t202] master.AssignmentManager: Handling
RS_ZK_REGION_SPLITTING, server=ambari-unsec-1381055459-hbase-5.cs1cloud.internal,60020,1381101889371,
region=9608be62536a61444766da669e67d39e, current_state=null
> 2013-10-06 16:43:01,609 INFO  [AM.ZK.Worker-pool2-t202] master.RegionStates: Transitioned
{9608be62536a61444766da669e67d39e state=OPEN, ts=1381102229682, server=ambari-unsec-1381055459-hbase-5.cs1cloud.internal,60020,1381101889371}
to {9608be62536a61444766da669e67d39e state=SPLITTING, ts=1381102981609, server=ambari-unsec-1381055459-hbase-5.cs1cloud.internal,600
> 2013-10-06 16:43:01,634 DEBUG [AM.ZK.Worker-pool2-t203] master.AssignmentManager: Handling
RS_ZK_REGION_SPLITTING, server=ambari-unsec-1381055459-hbase-5.cs1cloud.internal,60020,1381101889371,
region=9608be62536a61444766da669e67d39e, current_state={9608be62536a61444766da669e67d39e state=SPLITTING,
ts=1381102981609, server=ambari-unsec-1381055459-hbase-5.cs1cloud
> 2013-10-06 16:43:38,019 DEBUG [AM.ZK.Worker-pool2-t204] master.AssignmentManager: Znode
IntegrationTestLoadAndVerify,/(k\xCA\x1A\xF2\x86\xB6,1381100194019.9608be62536a61444766da669e67d39e.
deleted, state: {9608be62536a61444766da669e67d39e state=SPLITTING, ts=1381102981634, server=ambari-unsec-1381055459-hbase-5.cs1cloud.internal,60020,1381101889371}
> 2013-10-06 16:44:35,673 DEBUG [AM.ZK.Worker-pool2-t206] master.AssignmentManager: Handling
RS_ZK_REGION_SPLITTING, server=ambari-unsec-1381055459-hbase-5.cs1cloud.internal,60020,1381101889371,
region=ae22b90a3ddeb30c45c5735f7c855014, current_state=null
> 2013-10-06 16:44:35,673 INFO  [AM.ZK.Worker-pool2-t206] master.RegionStates: Transitioned
{ae22b90a3ddeb30c45c5735f7c855014 state=OPEN, ts=1381102362273, server=ambari-unsec-1381055459-hbase-5.cs1cloud.internal,60020,1381101889371}
to {ae22b90a3ddeb30c45c5735f7c855014 state=SPLITTING, ts=1381103075673, server=ambari-unsec-1381055459-hbase-5.cs1cloud.internal,600
> 2013-10-06 16:44:35,725 DEBUG [AM.ZK.Worker-pool2-t207] master.AssignmentManager: Handling
RS_ZK_REGION_SPLITTING, server=ambari-unsec-1381055459-hbase-5.cs1cloud.internal,60020,1381101889371,
region=ae22b90a3ddeb30c45c5735f7c855014, current_state={ae22b90a3ddeb30c45c5735f7c855014 state=SPLITTING,
ts=1381103075673, server=ambari-unsec-1381055459-hbase-5.cs1cloud
> 2013-10-06 16:44:38,707 DEBUG [ambari-unsec-1381055459-hbase-1.cs1cloud.internal,60000,1381101554357-BalancerChore]
master.HMaster: Not running balancer because 2 region(s) in transition: {9608be62536a61444766da669e67d39e={9608be62536a61444766da669e67d39e
state=SPLITTING, ts=1381102981634, server=ambari-unsec-1381055459-hbase-5.cs1cloud.internal,60020,1381101889
> {code}
> It seems that AM.nodeDeleted() does not revert the in-memory state for the region to
be opened. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Mime
View raw message