hbase-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12166) TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
Date Sat, 04 Oct 2014 01:05:36 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158839#comment-14158839 ]

Hudson commented on HBASE-12166:
--------------------------------

FAILURE: Integrated in HBase-1.0 #273 (See [https://builds.apache.org/job/HBase-1.0/273/])
HBASE-12166 TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork (jxiang: rev 12fd6d2a2495533726d84759fd6cc754b5f7eb97)
* hbase-server/src/main/java/org/apache/hadoop/hbase/coordination/ZKSplitLogManagerCoordination.java


> TestDistributedLogSplitting.testMasterStartsUpWithLogReplayWork
> ---------------------------------------------------------------
>
>                 Key: HBASE-12166
>                 URL: https://issues.apache.org/jira/browse/HBASE-12166
>             Project: HBase
>          Issue Type: Bug
>          Components: test, wal
>            Reporter: stack
>            Assignee: Jimmy Xiang
>             Fix For: 2.0.0, 0.99.1
>
>         Attachments: 12166.txt, hbase-12166.patch, hbase-12166_v2.patch, log.txt
>
>
> See https://builds.apache.org/job/PreCommit-HBASE-Build/11204//testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testMasterStartsUpWithLogReplayWork/
> The namespace region gets stuck.  It is never 'recovered' even though we have finished log splitting.  Here is the main exception:
> {code}
> 4941 2014-10-03 02:00:36,862 DEBUG [B.defaultRpcServer.handler=1,queue=0,port=37113] ipc.CallRunner(111): B.defaultRpcServer.handler=1,queue=0,port=37113: callId: 211 service: ClientService methodName: Get
>       size: 99 connection: 67.195.81.144:44526
> 4942 org.apache.hadoop.hbase.exceptions.RegionInRecoveryException: hbase:namespace,,1412301462277.eba5d23de65f2718715eeb22edf7edc2. is recovering
> 4943   at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:6058)
> 4944   at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2086)
> 4945   at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2072)
> 4946   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:5014)
> 4947   at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4988)
> 4948   at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:1690)
> 4949   at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:30418)
> 4950   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2020)
> 4951   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
> 4952   at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
> 4953   at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
> 4954   at java.lang.Thread.run(Thread.java:744)  
> {code}
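> As an aside, the rejection above can be reproduced from a plain client while the region is stuck. A minimal sketch, assuming the 0.99/1.0 client API, a reachable cluster, and an illustrative row key of 'default' (the retry count is lowered only so the failure shows up quickly):
> {code}
> // Illustrative probe only: issue a Get against hbase:namespace and report whether
> // the server is still rejecting it because the region is marked as recovering.
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.TableName;
> import org.apache.hadoop.hbase.client.Connection;
> import org.apache.hadoop.hbase.client.ConnectionFactory;
> import org.apache.hadoop.hbase.client.Get;
> import org.apache.hadoop.hbase.client.Table;
> import org.apache.hadoop.hbase.util.Bytes;
>
> public class NamespaceRegionProbe {
>   public static void main(String[] args) throws IOException {
>     Configuration conf = HBaseConfiguration.create();
>     // Keep client retries low so the server-side rejection surfaces quickly.
>     conf.setInt("hbase.client.retries.number", 2);
>     try (Connection connection = ConnectionFactory.createConnection(conf);
>          Table namespace = connection.getTable(TableName.valueOf("hbase:namespace"))) {
>       namespace.get(new Get(Bytes.toBytes("default")));
>       System.out.println("namespace region is serving reads again");
>     } catch (IOException e) {
>       // While the region is still marked recovering, the get keeps failing here.
>       System.out.println("namespace region still recovering: " + e);
>     }
>   }
> }
> {code}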
> See how we had finished log splitting a long time before:
> {code}
> 2014-10-03 01:57:48,129 INFO  [M_LOG_REPLAY_OPS-asf900:37113-1] master.SplitLogManager(294): finished splitting (more than or equal to) 197337 bytes in 1 log files in [hdfs://localhost:49601/user/jenkins/hbase/WALs/asf900.gq1.ygridcore.net,40732,1412301461887-splitting] in 379ms
> {code}
> If I grep for the deletion of znodes on recovery, which is when we set the recovering flag to false, I see a bunch of regions but not my namespace one:
> 2014-10-03 01:57:47,330 INFO  [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/1588230740 znode deleted. Region: 1588230740 completes recovery.
> 2014-10-03 01:57:48,119 INFO  [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/adfdcf958dd958f0e2ce59072ce2209d znode deleted. Region: adfdcf958dd958f0e2ce59072ce2209d completes recovery.
> 2014-10-03 01:57:48,121 INFO  [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/41d438848305831b61d708a406d5ecde znode deleted. Region: 41d438848305831b61d708a406d5ecde completes recovery.
> 2014-10-03 01:57:48,122 INFO  [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/6a7cada80de2ae5d774fe8cd33bd4cda znode deleted. Region: 6a7cada80de2ae5d774fe8cd33bd4cda completes recovery.
> 2014-10-03 01:57:48,124 INFO  [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/65451bd5b38bd16a31e25b62b3305533 znode deleted. Region: 65451bd5b38bd16a31e25b62b3305533 completes recovery.
> 2014-10-03 01:57:48,125 INFO  [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/07afdc3748894cf2b56e0075272a95a0 znode deleted. Region: 07afdc3748894cf2b56e0075272a95a0 completes recovery.
> 2014-10-03 01:57:48,126 INFO  [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/a4337ad2874ee7e599ca2344fce21583 znode deleted. Region: a4337ad2874ee7e599ca2344fce21583 completes recovery.
> 2014-10-03 01:57:48,128 INFO  [Thread-9216-EventThread] zookeeper.RecoveringRegionWatcher(66): /hbase/recovering-regions/9d91d6eafe260ce33e8d7d23ccd13192 znode deleted. Region: 9d91d6eafe260ce33e8d7d23ccd13192 completes recovery.
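> The leftover regions can also be checked live in zookeeper rather than grepped out of the logs. A minimal sketch, assuming the quorum address from the logs above (localhost:50702) and the plain ZooKeeper client, that lists what is still left under /hbase/recovering-regions (eba5d23de65f2718715eeb22edf7edc2, the namespace region, shows up here if its znode was never deleted):
> {code}
> import java.util.List;
> import org.apache.zookeeper.ZooKeeper;
>
> public class ListRecoveringRegions {
>   public static void main(String[] args) throws Exception {
>     // Session timeout is arbitrary; no watcher is needed for a one-off listing.
>     ZooKeeper zk = new ZooKeeper("localhost:50702", 30000, null);
>     try {
>       // Each child is the encoded name of a region still marked as recovering.
>       List<String> recovering = zk.getChildren("/hbase/recovering-regions", false);
>       for (String region : recovering) {
>         System.out.println("still recovering: " + region);
>       }
>     } finally {
>       zk.close();
>     }
>   }
> }
> {code}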
> This would seem to indicate that we successfully wrote to zk that we are recovering:
> {code}
> 2014-10-03 01:57:47,672 DEBUG [MASTER_SERVER_OPERATIONS-asf900:37113-0] coordination.ZKSplitLogManagerCoordination(647): Mark region eba5d23de65f2718715eeb22edf7edc2 recovering from failed region server asf900.gq1.ygridcore.net,42071,1412301461790
> {code}
> As part of opening the namespace region, we updated our last flushed sequence id successfully:
> {code}
> 2ae5d774fe8cd33bd4cda/family
> 2014-10-03 01:57:47,698 DEBUG [Thread-9216-EventThread] zookeeper.ZooKeeperWatcher(304): master:37113-0x148d3b7d84b020f, quorum=localhost:50702, baseZNode=/hbase Received ZooKeeper Event, type=NodeDataChanged, state=SyncConnected, path=/hbase/recovering-regions/eba5d23de65f2718715eeb22edf7edc2
> 2014-10-03 01:57:47,699 DEBUG [PostOpenDeployTasks:eba5d23de65f2718715eeb22edf7edc2] regionserver.HRegionServer(2947): Update last flushed sequence id of region eba5d23de65f2718715eeb22edf7edc2 for asf900.gq1.ygridcore.net,42071,1412301461790
> {code}
> Not sure why this region's znode is still up in zk.
> Let me dig again later.
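> (For a one-off confirmation of that leftover znode, the exact path from the logs above can be checked directly; a sketch under the same assumptions as the listing above:)
> {code}
> import org.apache.zookeeper.ZooKeeper;
> import org.apache.zookeeper.data.Stat;
>
> public class CheckLeftoverZnode {
>   public static void main(String[] args) throws Exception {
>     ZooKeeper zk = new ZooKeeper("localhost:50702", 30000, null);
>     try {
>       // exists() returns null once the recovering znode has been deleted.
>       Stat stat = zk.exists("/hbase/recovering-regions/eba5d23de65f2718715eeb22edf7edc2", false);
>       System.out.println(stat == null ? "znode gone, region recovered" : "znode still present");
>     } finally {
>       zk.close();
>     }
>   }
> }
> {code}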



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
