hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-9812) Intermittent TestSplitLogManager#testMultipleResubmits failure
Date Wed, 23 Oct 2013 22:52:42 GMT

    [ https://issues.apache.org/jira/browse/HBASE-9812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13803498#comment-13803498
] 

Ted Yu commented on HBASE-9812:
-------------------------------

Looks like there was some issue with zookeeper. Following the above log I saw:
{code}
2013-10-21 11:52:24,809 INFO  [dummy-master,1,1.splitLogManagerTimeoutMonitor] hbase.Chore(93):
dummy-master,1,1.splitLogManagerTimeoutMonitor exiting
2013-10-21 11:52:24,908 DEBUG [pool-1-thread-1-EventThread] zookeeper.ZooKeeperWatcher(310):
split-log-manager-tests145fa180-5cc5-4165-9d67-4073ab3f921b-0x141dadbb9320000 Received ZooKeeper
Event, type=None, state=Disconnected, path=null
2013-10-21 11:52:24,908 DEBUG [pool-1-thread-1-EventThread] zookeeper.ZooKeeperWatcher(392):
split-log-manager-tests145fa180-5cc5-4165-9d67-4073ab3f921b-0x141dadbb9320000 Received Disconnected
from ZooKeeper, ignoring
2013-10-21 11:52:33,628 ERROR [SyncThread:0] server.NIOServerCnxn(180): Unexpected Exception:

java.nio.channels.CancelledKeyException
	at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
	at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
	at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:153)
	at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1076)
	at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:404)
	at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:167)
	at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
2013-10-21 11:52:33,629 INFO  [pool-1-thread-1] zookeeper.MiniZooKeeperCluster(249): Shutdown
MiniZK cluster with all ZK servers
2013-10-21 11:52:33,641 INFO  [pool-1-thread-1] hbase.ResourceChecker(171): after: master.TestSplitLogManager#testMultipleResubmits
Thread=9 (was 8)
Potentially hanging thread: pool-1-thread-1-SendThread(localhost:64465)
	java.lang.Thread.sleep(Native Method)
	org.apache.zookeeper.client.StaticHostProvider.next(StaticHostProvider.java:86)
	org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:937)
	org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:993)
 - Thread LEAK? -, OpenFileDescriptor=143 (was 137) - OpenFileDescriptor LEAK? -, MaxFileDescriptor=4096
(was 4096), SystemLoadAverage=595 (was 572) - SystemLoadAverage LEAK? -, ProcessCount=124
(was 124), AvailableMemoryMB=641 (was 751), ConnectionCount=0 (was 0)
{code}
I wonder if SplitLogManager should utilize RecoverableZooKeeper#setData():
{code}
  public Stat setData(String path, byte[] data, int version)
{code}

> Intermittent TestSplitLogManager#testMultipleResubmits failure
> --------------------------------------------------------------
>
>                 Key: HBASE-9812
>                 URL: https://issues.apache.org/jira/browse/HBASE-9812
>             Project: HBase
>          Issue Type: Test
>            Reporter: Ted Yu
>            Priority: Minor
>
> From https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/801/testReport/org.apache.hadoop.hbase.master/TestSplitLogManager/testMultipleResubmits/
:
> {code}
> junit.framework.AssertionFailedError: Waiting timed out after [9,600] msec
> 	at junit.framework.Assert.fail(Assert.java:57)
> 	at org.apache.hadoop.hbase.Waiter.waitFor(Waiter.java:193)
> 	at org.apache.hadoop.hbase.Waiter.waitFor(Waiter.java:146)
> 	at org.apache.hadoop.hbase.HBaseTestingUtility.waitFor(HBaseTestingUtility.java:3220)
> 	at org.apache.hadoop.hbase.master.TestSplitLogManager.waitForCounter(TestSplitLogManager.java:164)
> 	at org.apache.hadoop.hbase.master.TestSplitLogManager.waitForCounter(TestSplitLogManager.java:157)
> 	at org.apache.hadoop.hbase.master.TestSplitLogManager.testMultipleResubmits(TestSplitLogManager.java:280)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> ...
> 2013-10-21 11:52:15,148 DEBUG [pool-1-thread-1] zookeeper.ZKUtil(430): split-log-manager-tests145fa180-5cc5-4165-9d67-4073ab3f921b-0x141dadbb9320000
Set watcher on znode that does not yet exist, /hbase/splitWAL/foo%2F1
> 2013-10-21 11:52:15,148 DEBUG [pool-1-thread-1] master.TestSplitLogManager(186): waiting
for task node creation
> 2013-10-21 11:52:15,164 DEBUG [pool-1-thread-1-EventThread] zookeeper.ZooKeeperWatcher(310):
split-log-manager-tests145fa180-5cc5-4165-9d67-4073ab3f921b-0x141dadbb9320000 Received ZooKeeper
Event, type=NodeCreated, state=SyncConnected, path=/hbase/splitWAL/foo%2F1
> 2013-10-21 11:52:15,164 DEBUG [pool-1-thread-1-EventThread] regionserver.TestMasterAddressManager$NodeCreationListener(107):
nodeCreated(/hbase/splitWAL/foo%2F1)
> 2013-10-21 11:52:15,164 DEBUG [pool-1-thread-1] master.TestSplitLogManager(188): task
created
> 2013-10-21 11:52:15,164 DEBUG [pool-1-thread-1-EventThread] master.SplitLogManager(711):
put up splitlog task at znode /hbase/splitWAL/foo%2F1
> 2013-10-21 11:52:15,166 DEBUG [pool-1-thread-1-EventThread] master.SplitLogManager(753):
task not yet acquired /hbase/splitWAL/foo%2F1 ver = 0
> 2013-10-21 11:52:15,193 DEBUG [pool-1-thread-1-EventThread] zookeeper.ZooKeeperWatcher(310):
split-log-manager-tests145fa180-5cc5-4165-9d67-4073ab3f921b-0x141dadbb9320000 Received ZooKeeper
Event, type=NodeDataChanged, state=SyncConnected, path=/hbase/splitWAL/foo%2F1
> 2013-10-21 11:52:15,194 INFO  [pool-1-thread-1] hbase.Waiter(174): Waiting up to [3,200]
milli-secs(wait.for.ratio=[1])
> 2013-10-21 11:52:15,194 INFO  [pool-1-thread-1-EventThread] master.SplitLogManager(826):
task /hbase/splitWAL/foo%2F1 acquired by worker1,1,1
> 2013-10-21 11:52:15,204 INFO  [pool-1-thread-1] hbase.Waiter(174): Waiting up to [9,600]
milli-secs(wait.for.ratio=[1])
> 2013-10-21 11:52:15,247 INFO  [dummy-master,1,1.splitLogManagerTimeoutMonitor] master.SplitLogManager$TimeoutMonitor(1408):
total tasks = 1 unassigned = 0 tasks={/hbase/splitWAL/foo%2F1=last_update = 1382356335195
last_version = 1 cur_worker_name = worker1,1,1 status = in_progress incarnation = 0 resubmits
= 0 batch = installed = 1 done = 0 error = 0}
> 2013-10-21 11:52:20,250 INFO  [dummy-master,1,1.splitLogManagerTimeoutMonitor] master.SplitLogManager$TimeoutMonitor(1408):
total tasks = 1 unassigned = 0 tasks={/hbase/splitWAL/foo%2F1=last_update = 1382356335195
last_version = 1 cur_worker_name = worker1,1,1 status = in_progress incarnation = 0 resubmits
= 0 batch = installed = 1 done = 0 error = 0}
> 2013-10-21 11:52:21,251 INFO  [dummy-master,1,1.splitLogManagerTimeoutMonitor] master.SplitLogManager(875):
resubmitting task /hbase/splitWAL/foo%2F1
> 2013-10-21 11:52:24,808 DEBUG [dummy-master,1,1.splitLogManagerTimeoutMonitor] zookeeper.ZooKeeperWatcher(458):
split-log-manager-tests145fa180-5cc5-4165-9d67-4073ab3f921b-0x141dadbb9320000 Received InterruptedException,
doing nothing here
> java.lang.InterruptedException
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:485)
>   at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1309)
>   at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1264)
>   at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:414)
>   at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:881)
>   at org.apache.hadoop.hbase.master.SplitLogManager.resubmit(SplitLogManager.java:880)
>   at org.apache.hadoop.hbase.master.SplitLogManager.access$400(SplitLogManager.java:107)
>   at org.apache.hadoop.hbase.master.SplitLogManager$TimeoutMonitor.chore(SplitLogManager.java:1400)
>   at org.apache.hadoop.hbase.Chore.run(Chore.java:80)
>   at java.lang.Thread.run(Thread.java:662)
> 2013-10-21 11:52:24,809 DEBUG [dummy-master,1,1.splitLogManagerTimeoutMonitor] master.SplitLogManager(881):
failed to resubmit task /hbase/splitWAL/foo%2F1 version changed
> {code}
> Looks like the version change caused some resubmission to fail.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Mime
View raw message