accumulo-notifications mailing list archives

From "Eric Newton (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ACCUMULO-2408) metadata table not assigned after root table is loaded
Date Wed, 26 Feb 2014 14:47:20 GMT
Eric Newton created ACCUMULO-2408:
-------------------------------------

             Summary: metadata table not assigned after root table is loaded
                 Key: ACCUMULO-2408
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2408
             Project: Accumulo
          Issue Type: Bug
          Components: master
            Reporter: Eric Newton
            Assignee: Eric Newton
            Priority: Critical
             Fix For: 1.6.0


During a nightly integration test run, BigRootTabletIT failed, timing out after 4 minutes:

{noformat}
java.lang.Exception: test timed out after 240000 milliseconds
	at sun.misc.Unsafe.park(Native Method)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
	at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
	at org.apache.accumulo.core.client.admin.TableOperationsImpl.addSplits(TableOperationsImpl.java:437)
	at org.apache.accumulo.test.functional.BigRootTabletIT.test(BigRootTabletIT.java:50)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
{noformat}

Looking at the logs, the root tablet is assigned successfully:

{noformat}
2014-02-26 05:17:09,414 [state.ZooTabletStateStore] DEBUG: Returning root tablet state: +r<<@(tserver1:9997[1446db2884a0002],null,null)
2014-02-26 05:17:09,596 [master.EventCoordinator] INFO : tablet +r<< was loaded on tserver1:9997
{noformat}

No other tablets are assigned for the next four minutes.
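
One way to check this against a full master log is to pull out every "was loaded" event and look at the timestamps. The following is a minimal sketch, assuming the master log lines have the same layout as the EventCoordinator line quoted above; the function name `loaded_events` and the regular expression are illustrative and not part of Accumulo.

```python
import re
from datetime import datetime

# Assumed layout, taken from the EventCoordinator line quoted above:
# 2014-02-26 05:17:09,596 [master.EventCoordinator] INFO : tablet +r<< was loaded on tserver1:9997
LOADED = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \[master\.EventCoordinator\] "
    r"INFO : tablet (\S+) was loaded on (\S+)$"
)


def loaded_events(lines):
    """Return a list of (timestamp, tablet, server) for each 'was loaded' line."""
    events = []
    for line in lines:
        m = LOADED.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
            events.append((ts, m.group(2), m.group(3)))
    return events


sample = [
    "2014-02-26 05:17:09,414 [state.ZooTabletStateStore] DEBUG: Returning root tablet state: "
    "+r<<@(tserver1:9997[1446db2884a0002],null,null)",
    "2014-02-26 05:17:09,596 [master.EventCoordinator] INFO : tablet +r<< was loaded on tserver1:9997",
]
events = loaded_events(sample)
```

If the list for the whole log contains only the `+r<<` entry, no metadata tablet was reported as loaded during the run.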

The logs are full of "Failed to bin" errors:

{noformat}
2014-02-26 05:19:09,613 [impl.ThriftTransportPool] TRACE: Using existing connection to tserver1:9997
2014-02-26 05:19:09,615 [impl.ThriftTransportPool] TRACE: Returned connection tserver1:9997 (120000) ioCount : 562
2014-02-26 05:19:09,615 [metadata.MetadataLocationObtainer] TRACE: tid=28 oid=3448  Got 2 results  from +r<< in 0.002 secs
2014-02-26 05:19:09,615 [impl.TabletLocatorImpl] TRACE: tid=28 oid=3446  Binned 1 ranges for table !0 to 0 tservers in 0.003 secs
2014-02-26 05:19:09,616 [impl.TabletServerBatchReaderIterator] TRACE: Failed to bin 1 ranges, tablet locations were null, retrying in 100ms
{noformat}
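
To see whether these retries continue for the whole four minutes, the messages can be tallied per minute. This is a rough sketch, assuming each "Failed to bin" message starts with the timestamp format shown above; the helper name `failed_to_bin_per_minute` is hypothetical.

```python
import re
from collections import Counter

# Leading timestamp truncated to the minute, e.g. "2014-02-26 05:19"
MINUTE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2},\d{3} ")


def failed_to_bin_per_minute(lines):
    """Count 'Failed to bin' log messages, grouped by minute."""
    counts = Counter()
    for line in lines:
        if "Failed to bin" not in line:
            continue
        m = MINUTE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts


sample = [
    "2014-02-26 05:19:09,615 [impl.TabletLocatorImpl] TRACE: tid=28 oid=3446  "
    "Binned 1 ranges for table !0 to 0 tservers in 0.003 secs",
    "2014-02-26 05:19:09,616 [impl.TabletServerBatchReaderIterator] TRACE: "
    "Failed to bin 1 ranges, tablet locations were null, retrying in 100ms",
]
counts = failed_to_bin_per_minute(sample)
```

A roughly constant count in every minute between the root tablet load and the test timeout would be consistent with the lookup retrying without ever getting a location.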

There is also an IOException from an attempt to do a batch read:

{noformat}
2014-02-26 05:19:09,687 [impl.TabletServerBatchReaderIterator] DEBUG: Server : tserver1:9997 msg : java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.2:52818 remote=tserver1/192.168.1.1:9997]
2014-02-26 05:19:09,689 [impl.TabletServerBatchReaderIterator] DEBUG: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.2:52818 remote=tserver1/192.168.1.1:9997]
java.io.IOException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.2:52818 remote=tserver1/192.168.1.1:9997]
        at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:713)
        at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:372)
        at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
        at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
        at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.2:52818 remote=tserver1/192.168.1.1:9997]
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
        at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.accumulo.core.client.impl.ThriftTransportPool$CachedTTransport.readAll(ThriftTransportPool.java:270)
        at org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:601)
        at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:470)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
        at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:311)
        at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:291)
        at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:658)
        ... 7 more
Caused by: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.2:52818 remote=tserver1/192.168.1.1:9997]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
        ... 18 more
2014-02-26 05:19:09,693 [impl.TabletServerBatchReaderIterator] TRACE: Failed to execute multiscans against 1 tablets, retrying...
{noformat}

This would appear to be the batch scanner used to read the root table in the master.

The tablet server hosting the root tablet is being successfully scanned more than 24 times a second,
presumably by clients.

There are no errors in the tserver logs.





--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
