accumulo-notifications mailing list archives

From "Eric Newton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-1861) MetadataSplitIT test failed
Date Thu, 07 Nov 2013 16:25:19 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13816098#comment-13816098 ]

Eric Newton commented on ACCUMULO-1861:
---------------------------------------

The test passes when run on its own, so I increased the timeout to 20 minutes and ran it
in parallel with all the other ITs.
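When a hang only reproduces under parallel load, it can help to capture thread dumps automatically (for example, when a test times out) instead of attaching jstack to each process by hand. A minimal sketch using the standard `java.lang.management` API, assuming the dump is taken from inside the JVM being inspected; `ThreadDumper` is a hypothetical name, and the output only approximates jstack's format:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

final class ThreadDumper {
    /** Builds a jstack-like dump of every live thread in this JVM. */
    static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // Ask for locked monitors/synchronizers only if the JVM supports reporting them.
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(), mx.isSynchronizerUsageSupported());
        for (ThreadInfo info : infos) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            // ThreadInfo.toString() truncates deep stacks, so print every frame ourselves.
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("        at ").append(frame).append('\n');
            }
            if (info.getLockName() != null) {
                sb.append("        - waiting on ").append(info.getLockName()).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

The same call could be wired into a test timeout handler so that every hung run leaves stacks behind.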

It hung again, and I was able to capture some jstacks:

Master:
{noformat}
"batch scanner 2- 3 looking up 1 ranges at hostname:56371" daemon prio=10 tid=0x00007f6788002800 nid=0x90ec runnable [0x00007f67f515d000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
        - locked <0x00000000f4548160> (a sun.nio.ch.Util$2)
        - locked <0x00000000f4548178> (a java.util.Collections$UnmodifiableSet)
        - locked <0x00000000f4547d70> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
        at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
        at java.io.FilterInputStream.read(FilterInputStream.java:116)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
        - locked <0x00000000f3ea0180> (a java.io.BufferedInputStream)
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
        at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.accumulo.core.client.impl.ThriftTransportPool$CachedTTransport.readAll(ThriftTransportPool.java:256)
        at org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:601)
        at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:470)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
        at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:311)
        at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:291)
        at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:650)
        at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:364)
        at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
        at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
        at java.lang.Thread.run(Thread.java:662)
{noformat}
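The master thread above is sitting in a socket read inside `recv_startMultiScan`, waiting for the tablet server's reply. A minimal sketch with plain `java.net` (not the Hadoop `SocketInputStream` or Thrift transport that Accumulo actually uses) of a peer that accepts the connection but never answers: with a read timeout set, the read fails with `SocketTimeoutException` rather than waiting indefinitely. `SilentServerDemo` and the 200 ms timeout are illustrative assumptions; the timeout configured in the real client is not visible in the stack.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class SilentServerDemo {
    /** Reads from a server that accepts but never replies; returns true if the read timed out. */
    static boolean readTimesOut(int timeoutMillis) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) { // server side: accept, then stay silent
            client.setSoTimeout(timeoutMillis);
            InputStream in = client.getInputStream();
            try {
                in.read(); // blocks the way recv_startMultiScan waits for a reply frame
                return false; // a byte or EOF arrived
            } catch (SocketTimeoutException e) {
                return true; // no reply within the timeout
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("timed out: " + readTimesOut(200));
    }
}
```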

This is the thread watching the Metadata tablets, which consumes that batch scanner's results:
{noformat}
"Watching Metadata Tablets" daemon prio=10 tid=0x00007f67dc191000 nid=0x4aaf waiting on condition [0x00007f67f5663000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000000f440f520> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025)
        at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:340)
        at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.hasNext(TabletServerBatchReaderIterator.java:203)
        - locked <0x00000000f440f550> (a java.lang.Object)
        at org.apache.accumulo.server.master.state.MetaDataTableScanner.hasNext(MetaDataTableScanner.java:116)
        at org.apache.accumulo.master.TabletGroupWatcher.run(TabletGroupWatcher.java:143)
{noformat}
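The watcher is not blocked on a lock: it is parked in `ArrayBlockingQueue.poll` inside `TabletServerBatchReaderIterator.hasNext`, waiting for the lookup task (the batch scanner thread above) to hand over results. A simplified model of that hand-off, assuming a bounded queue filled by a producer thread and drained with a timed poll. `ResultQueueModel`, `deliver` and `next` are hypothetical names, not Accumulo's code, and the `maxPolls` cap exists only so the sketch terminates; whether the real `hasNext` re-polls indefinitely is not visible from the stack.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

final class ResultQueueModel {
    private final BlockingQueue<String> results = new ArrayBlockingQueue<>(16);

    /** Producer side: the (hypothetical) lookup task hands over a batch of results. */
    void deliver(String batch) throws InterruptedException {
        results.put(batch);
    }

    /**
     * Consumer side: timed polls, like the watcher parked in ArrayBlockingQueue.poll.
     * Returns null after maxPolls empty polls; an unbounded retry loop would instead
     * sit in TIMED_WAITING for as long as the producer delivers nothing.
     */
    String next(long pollMillis, int maxPolls) throws InterruptedException {
        for (int i = 0; i < maxPolls; i++) {
            String batch = results.poll(pollMillis, TimeUnit.MILLISECONDS);
            if (batch != null) {
                return batch;
            }
        }
        return null;
    }

    public static void main(String[] args) throws InterruptedException {
        ResultQueueModel model = new ResultQueueModel();
        // No producer: every poll times out.
        System.out.println("without producer: " + model.next(50, 3));
        Thread producer = new Thread(() -> {
            try {
                model.deliver("batch-1");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        System.out.println("with producer: " + model.next(100, 10));
        producer.join();
    }
}
```

In this model, a producer stuck in a socket read (as in the master stack) leaves the consumer cycling through timed waits, which is consistent with the TIMED_WAITING (parking) state shown.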

Looking at the tablet server on port 56371:
{noformat}
"ClientPool 1" daemon prio=10 tid=0x00007fc8e4005000 nid=0x4aad in Object.wait() [0x00007fc93de6a000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler.startMultiScan(TabletServer.java:1349)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.accumulo.trace.instrument.thrift.TraceWrap$1.invoke(TraceWrap.java:63)
        at $Proxy11.startMultiScan(Unknown Source)
        at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$startMultiScan.getResult(TabletClientService.java:2252)
        at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$startMultiScan.getResult(TabletClientService.java:1)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
        at org.apache.accumulo.server.util.TServerUtils$TimedProcessor.process(TServerUtils.java:159)
        at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:478)
        at org.apache.accumulo.server.util.TServerUtils$THsHaServer$Invocation.run(TServerUtils.java:214)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
        at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
        at java.lang.Thread.run(Thread.java:662)
{noformat}

The odd thing is that there are no locks in this code, yet the thread is RUNNABLE.

There are many stacks like the one above, which is not surprising since the client connection
is probably timing out.
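With many near-identical stacks, counting threads by their innermost frame gives a quick summary without reading every stack. A minimal in-process sketch with `java.lang.management`, assuming it runs inside the JVM being inspected (adapting it to parse saved jstack output would be the alternative); the `ClientPool` prefix comes from the thread names in these dumps, and `TopFrameHistogram` is a hypothetical name. Run standalone, it finds no `ClientPool` threads and prints nothing.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.Map;
import java.util.TreeMap;

final class TopFrameHistogram {
    /** Counts live threads whose name starts with the prefix, keyed by innermost class.method. */
    static Map<String, Integer> countByTopFrame(String threadNamePrefix) {
        Map<String, Integer> counts = new TreeMap<>();
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            if (!info.getThreadName().startsWith(threadNamePrefix)) {
                continue;
            }
            StackTraceElement[] stack = info.getStackTrace();
            String top = stack.length == 0
                    ? "<no frames>"
                    : stack[0].getClassName() + "." + stack[0].getMethodName();
            counts.merge(top, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        countByTopFrame("ClientPool").forEach((frame, n) -> System.out.println(n + "\t" + frame));
    }
}
```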

A few stacks are stuck here:

{noformat}
"ClientPool 2" daemon prio=10 tid=0x00007fc8e4005800 nid=0x4ab5 in Object.wait() [0x00007fc93dd69000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.accumulo.server.security.AuditedSecurityOperation.canScan(AuditedSecurityOperation.java:147)
        at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler.startScan(TabletServer.java:1181)
        at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.accumulo.trace.instrument.thrift.TraceWrap$1.invoke(TraceWrap.java:63)
        at $Proxy11.startScan(Unknown Source)
        at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$startScan.getResult(TabletClientService.java:2177)
        at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$startScan.getResult(TabletClientService.java:1)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
        at org.apache.accumulo.server.util.TServerUtils$TimedProcessor.process(TServerUtils.java:159)
        at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:478)
        at org.apache.accumulo.server.util.TServerUtils$THsHaServer$Invocation.run(TServerUtils.java:214)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
        at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
        at java.lang.Thread.run(Thread.java:662)
{noformat}

And one here:
{noformat}
"ClientPool 9" daemon prio=10 tid=0x00007fc8e400e800 nid=0x4ad0 in Object.wait() [0x00007fc93d561000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.accumulo.core.client.impl.Translator.<clinit>(Translator.java:113)
        at org.apache.accumulo.server.security.AuditedSecurityOperation.canScan(AuditedSecurityOperation.java:147)
        at org.apache.accumulo.tserver.TabletServer$ThriftClientHandler.startScan(TabletServer.java:1181)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.accumulo.trace.instrument.thrift.TraceWrap$1.invoke(TraceWrap.java:63)
        at $Proxy11.startScan(Unknown Source)
        at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$startScan.getResult(TabletClientService.java:2177)
        at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$startScan.getResult(TabletClientService.java:1)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
        at org.apache.accumulo.server.util.TServerUtils$TimedProcessor.process(TServerUtils.java:159)
        at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:478)
        at org.apache.accumulo.server.util.TServerUtils$THsHaServer$Invocation.run(TServerUtils.java:214)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
        at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
        at java.lang.Thread.run(Thread.java:662)
{noformat}
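One hypothesis worth checking, not established by these stacks alone: "ClientPool 9" is running `Translator.<clinit>` (a static initializer) called from `AuditedSecurityOperation.canScan` line 147, the same line where "ClientPool 2" is stuck. On HotSpot JVMs, a thread waiting for another thread to finish a class's static initializer is commonly shown by jstack as "in Object.wait()" with state RUNNABLE and no lock listed, which matches the puzzling RUNNABLE-without-locks stacks above. The sketch below reproduces that situation in isolation with a hypothetical class `Slow`; it prints whatever state the running JVM reports instead of asserting one, since the reported state may vary between JVM versions.

```java
import java.util.concurrent.CountDownLatch;

final class ClinitWaitDemo {
    static final CountDownLatch insideInit = new CountDownLatch(1);
    static final CountDownLatch finishInit = new CountDownLatch(1);
    static volatile int initCount = 0;

    /** Hypothetical stand-in for a class whose static initializer takes a long time. */
    static final class Slow {
        static {
            initCount++;
            insideInit.countDown();
            try {
                finishInit.await(); // keep initialization in progress until released
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        static void touch() { }
    }

    /** Returns the state reported for a thread blocked waiting on Slow's initialization. */
    static Thread.State observeWaiter() throws InterruptedException {
        Thread initializer = new Thread(() -> Slow.touch(), "initializer");
        initializer.start();
        insideInit.await(); // Slow's static initializer is now running in "initializer"
        Thread waiter = new Thread(() -> Slow.touch(), "waiter");
        waiter.start();
        Thread.sleep(200); // give "waiter" time to block on the pending initialization
        Thread.State observed = waiter.getState();
        finishInit.countDown(); // let initialization finish so both threads can return
        initializer.join();
        waiter.join();
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("waiter state while Slow is initializing: " + observeWaiter());
    }
}
```

Taking a jstack of this demo while the waiter is blocked would show whether the local JVM renders it the same way as the ClientPool threads above.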



> MetadataSplitIT test failed
> ---------------------------
>
>                 Key: ACCUMULO-1861
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1861
>             Project: Accumulo
>          Issue Type: Bug
>          Components: test
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>
> 1.6.0-SNAPSHOT, 61a4298c60c00bc9ae1db4ef02b5dca13f2f3c5b
> Running "mvn verify" ... MetadataSplitIT split failed.  Analysis of the logs shows that
> the master assigned the Root Table, but did not read the Root Table and assign the !METADATA
> table.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
