accumulo-notifications mailing list archives

From "Eric Newton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-2408) metadata table not assigned after root table is loaded
Date Wed, 26 Feb 2014 16:47:20 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913130#comment-13913130 ]

Eric Newton commented on ACCUMULO-2408:
---------------------------------------

Here's the stack of the stuck thread, captured with "jstack -m":

{noformat}
----------------- 41937 -----------------
0x000000351f00b43c      __pthread_cond_wait + 0xcc
0x00007f62d031891d      _ZN13ObjectMonitor4waitElbP6Thread + 0x9bd
0x00007f62d00ce23b      _ZN13instanceKlass15initialize_implE19instanceKlassHandleP6Thread
+ 0x36b
0x00007f62d00ce55a      _ZN13instanceKlass10initializeEP6Thread + 0x6a
0x00007f62d01055f3      _ZN18InterpreterRuntime4_newEP10JavaThreadP19constantPoolOopDesci
+ 0x153
0x00007f62cc1b0181      * org.apache.accumulo.tserver.TabletServer$ThriftClientHandler.startMultiScan(org.apache.accumulo.trace.thrift.TInfo,
org.apache.accumulo.core.security.thrift.TCredentials, java.util.Map, java.util.List, java.util.List,
java.util.Map, java.util.List, boolean) bci:221 line:1355 (Interpreted frame)
0x00007f62cc1924e7      <StubRoutines>
0x00007f62d010d465      _ZN9JavaCalls11call_helperEP9JavaValueP12methodHandleP17JavaCallArgumentsP6Thread
+ 0x365
0x00007f62d010bec8      _ZN9JavaCalls4callEP9JavaValue12methodHandleP17JavaCallArgumentsP6Thread
+ 0x28
0x00007f62d039e20f      _ZN10Reflection6invokeE19instanceKlassHandle12methodHandle6Handleb14objArrayHandle9BasicTypeS3_bP6Thread
+ 0x47f
0x00007f62d039efc0      _ZN10Reflection13invoke_methodEP7oopDesc6Handle14objArrayHandleP6Thread
+ 0x160
0x00007f62d0194af4      JVM_InvokeMethod + 0x224
0x00007f62cc1a4738      * sun.reflect.NativeMethodAccessorImpl.invoke0(java.lang.reflect.Method,
java.lang.Object, java.lang.Object[]) bci:0 (Interpreted frame)
0x00007f62cc198233      * sun.reflect.NativeMethodAccessorImpl.invoke(java.lang.Object, java.lang.Object[])
bci:87 line:57 (Interpreted frame)
0x00007f62cc198233      * sun.reflect.DelegatingMethodAccessorImpl.invoke(java.lang.Object,
java.lang.Object[]) bci:6 line:43 (Interpreted frame)
0x00007f62cc1988e1      * java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[])
bci:57 line:606 (Interpreted frame)
0x00007f62cc198233      * org.apache.accumulo.trace.instrument.thrift.TraceWrap$1.invoke(java.lang.Object,
java.lang.reflect.Method, java.lang.Object[]) bci:64 line:63 (Interpreted frame)
0x00007f62cc1988e1      * com.sun.proxy.$Proxy9.startMultiScan(org.apache.accumulo.trace.thrift.TInfo,
org.apache.accumulo.core.security.thrift.TCredentials, java.util.Map, java.util.List, java.util.List,
java.util.Map, java.util.List, boolean) bci:55 (Interpreted frame)
0x00007f62cc1988e1      * org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$startMultiScan.getResult(org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Iface,
org.apache.accumulo.core.tabletserver.thrift.TabletClientService$startMultiScan_args) bci:42
line:2252 (Interpreted frame)
0x00007f62cc198233      * org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$startMultiScan.getResult(java.lang.Object,
org.apache.thrift.TBase) bci:9 line:2236 (Interpreted frame)
0x00007f62cc198233      * org.apache.thrift.ProcessFunction.process(int, org.apache.thrift.protocol.TProtocol,
org.apache.thrift.protocol.TProtocol, java.lang.Object) bci:86 line:39 (Interpreted frame)
0x00007f62cc198058      * org.apache.thrift.TBaseProcessor.process(org.apache.thrift.protocol.TProtocol,
org.apache.thrift.protocol.TProtocol) bci:126 line:39 (Interpreted frame)
0x00007f62cc1989fe      * org.apache.accumulo.server.util.TServerUtils$TimedProcessor.process(org.apache.thrift.protocol.TProtocol,
org.apache.thrift.protocol.TProtocol) bci:37 line:171 (Interpreted frame)
0x00007f62cc1989fe      * org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke()
bci:49 line:478 (Interpreted frame)
0x00007f62cc198058      * org.apache.accumulo.server.util.TServerUtils$THsHaServer$Invocation.run()
bci:74 line:231 (Interpreted frame)
0x00007f62cc198706      * java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
bci:95 line:1145 (Interpreted frame)
0x00007f62cc198058      * java.util.concurrent.ThreadPoolExecutor$Worker.run() bci:5 line:615
(Interpreted frame)
0x00007f62cc198706      * org.apache.accumulo.trace.instrument.TraceRunnable.run() bci:51
line:47 (Interpreted frame)
0x00007f62cc198706      * org.apache.accumulo.core.util.LoggingRunnable.run() bci:4 line:34
(Interpreted frame)
0x00007f62cc198706      * java.lang.Thread.run() bci:11 line:744 (Interpreted frame)
0x00007f62cc1924e7      <StubRoutines>
0x00007f62d010d465      _ZN9JavaCalls11call_helperEP9JavaValueP12methodHandleP17JavaCallArgumentsP6Thread
+ 0x365
0x00007f62d010bec8      _ZN9JavaCalls4callEP9JavaValue12methodHandleP17JavaCallArgumentsP6Thread
+ 0x28
0x00007f62d010c197      _ZN9JavaCalls12call_virtualEP9JavaValue11KlassHandleP6SymbolS4_P17JavaCallArgumentsP6Thread
+ 0x197
0x00007f62d010c2b7      _ZN9JavaCalls12call_virtualEP9JavaValue6Handle11KlassHandleP6SymbolS5_P6Thread
+ 0x47
0x00007f62d01881c5      _ZL12thread_entryP10JavaThreadP6Thread + 0xe5
0x00007f62d04625ff      _ZN10JavaThread17thread_main_innerEv + 0xdf
0x00007f62d0462705      _ZN10JavaThread3runEv + 0xf5
0x00007f62d032a538      _ZL10java_startP6Thread + 0x108
{noformat}

Reproduced on JDK 7u51.
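
The native frames at the top of the trace (InterpreterRuntime::_new, then instanceKlass::initialize_impl, then ObjectMonitor::wait) suggest that the thread is waiting for another thread to finish initializing a class. As an illustration only, here is a minimal sketch of how two threads can block each other during class initialization. It does not claim to be what happens in ACCUMULO-2408: the class names (ClassInitDeadlockDemo, A, B) and the helper methods are hypothetical and do not come from Accumulo.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical demo; none of these names come from Accumulo.
public class ClassInitDeadlockDemo {

    // Opens once both threads are inside their own static initializer,
    // so neither class can finish initializing before the other is touched.
    static final CountDownLatch bothStarted = new CountDownLatch(2);

    static class A {
        static {
            rendezvous();
            B.touch(); // blocks: B is being initialized by the other thread
        }
        static void touch() {}
    }

    static class B {
        static {
            rendezvous();
            A.touch(); // blocks: A is being initialized by the other thread
        }
        static void touch() {}
    }

    static void rendezvous() {
        bothStarted.countDown();
        try {
            bothStarted.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Starts one thread per class and reports whether both are still
    // alive after waiting up to waitMillis for each of them.
    static boolean deadlocks(long waitMillis) {
        Thread t1 = new Thread(() -> A.touch(), "init-A");
        Thread t2 = new Thread(() -> B.touch(), "init-B");
        // Daemon threads, so the JVM can still exit while they are stuck.
        t1.setDaemon(true);
        t2.setDaemon(true);
        t1.start();
        t2.start();
        try {
            t1.join(waitMillis);
            t2.join(waitMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return t1.isAlive() && t2.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("deadlocked: " + deadlocks(1000));
    }
}
```

Running "jstack -m" against a process stuck like this should show both threads waiting under the class initialization frames, with no Java-level lock reported as held.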


> metadata table not assigned after root table is loaded
> ------------------------------------------------------
>
>                 Key: ACCUMULO-2408
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2408
>             Project: Accumulo
>          Issue Type: Bug
>          Components: master
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>            Priority: Critical
>             Fix For: 1.6.0
>
>
> During a nightly integration test run, BigRootTabletIT failed, timing out after 4 minutes:
> {noformat}
> java.lang.Exception: test timed out after 240000 milliseconds
> 	at sun.misc.Unsafe.park(Native Method)
> 	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
> 	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
> 	at org.apache.accumulo.core.client.admin.TableOperationsImpl.addSplits(TableOperationsImpl.java:437)
> 	at org.apache.accumulo.test.functional.BigRootTabletIT.test(BigRootTabletIT.java:50)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
> {noformat}
> Looking at the logs, the root tablet is assigned successfully:
> {noformat}
> 2014-02-26 05:17:09,414 [state.ZooTabletStateStore] DEBUG: Returning root tablet state: +r<<@(tserver1:9997[1446db2884a0002],null,null)
> 2014-02-26 05:17:09,596 [master.EventCoordinator] INFO : tablet +r<< was loaded on tserver1:9997
> {noformat}
> No other tablets are assigned for the next four minutes.
> The logs are full of "Failed to bin" errors:
> {noformat}
> 2014-02-26 05:19:09,613 [impl.ThriftTransportPool] TRACE: Using existing connection to tserver1:9997
> 2014-02-26 05:19:09,615 [impl.ThriftTransportPool] TRACE: Returned connection tserver1:9997 (120000) ioCount : 562
> 2014-02-26 05:19:09,615 [metadata.MetadataLocationObtainer] TRACE: tid=28 oid=3448  Got 2 results  from +r<< in 0.002 secs
> 2014-02-26 05:19:09,615 [impl.TabletLocatorImpl] TRACE: tid=28 oid=3446  Binned 1 ranges for table !0 to 0 tservers in 0.003 secs
> 2014-02-26 05:19:09,616 [impl.TabletServerBatchReaderIterator] TRACE: Failed to bin 1 ranges, tablet locations were null, retrying in 100ms
> {noformat}
> There is an IOException when trying to do a batch read:
> {noformat}
> 2014-02-26 05:19:09,687 [impl.TabletServerBatchReaderIterator] DEBUG: Server : tserver1:9997 msg : java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.2:52818 remote=tserver1/192.168.1.1:9997]
> 2014-02-26 05:19:09,689 [impl.TabletServerBatchReaderIterator] DEBUG: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.2:52818 remote=tserver1/192.168.1.1:9997]
> java.io.IOException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.2:52818 remote=tserver1/192.168.1.1:9997]
>         at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:713)
>         at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:372)
>         at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
>         at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
>         at java.lang.Thread.run(Thread.java:744)
> Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.2:52818 remote=tserver1/192.168.1.1:9997]
>         at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
>         at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>         at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
>         at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
>         at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>         at org.apache.accumulo.core.client.impl.ThriftTransportPool$CachedTTransport.readAll(ThriftTransportPool.java:270)
>         at org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:601)
>         at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:470)
>         at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
>         at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:311)
>         at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:291)
>         at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:658)
>         ... 7 more
> Caused by: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.2:52818 remote=tserver1/192.168.1.1:9997]
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>         at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
>         ... 18 more
> 2014-02-26 05:19:09,693 [impl.TabletServerBatchReaderIterator] TRACE: Failed to execute multiscans against 1 tablets, retrying...
> {noformat}
> This would appear to be the batch scanner used to read the root table in the master.
> The tablet server hosting the root tablet is being successfully scanned more than 24 times a second, presumably from clients.
> There are no errors in the tserver logs.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
