accumulo-notifications mailing list archives

From "Eric Newton (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ACCUMULO-1740) intermittent integration test failure
Date Wed, 25 Sep 2013 21:51:03 GMT
Eric Newton created ACCUMULO-1740:
-------------------------------------

             Summary: intermittent integration test failure
                 Key: ACCUMULO-1740
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1740
             Project: Accumulo
          Issue Type: Bug
          Components: test
            Reporter: Eric Newton
            Assignee: Eric Newton


Some of the recovery integration tests fail with a very long timeout (10 minutes).

After the tablet servers restart, the WAL is sorted and the root tablet is assigned.
After that, the master does not assign the !METADATA tablets.

I've managed to jstack the master, and it seems to be stuck scanning.  I turned on DEBUG log
messages and I see this:
{noformat}
2013-09-25 17:27:46,340 [impl.TabletServerBatchReaderIterator] DEBUG: Server : rd6ul-14706v.tycho.ncsc.mil:37957 msg : java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.0.0.1:33362 remote=rd6ul-14706v.tycho.ncsc.mil/10.0.0.1:37957]
2013-09-25 17:27:46,340 [impl.TabletServerBatchReaderIterator] DEBUG: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.0.0.1:33362 remote=rd6ul-14706v.tycho.ncsc.mil/10.0.0.1:37957]
java.io.IOException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.0.0.1:33362 remote=rd6ul-14706v.tycho.ncsc.mil/10.0.0.1:37957]
        at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:705)
        at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:364)
        at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
        at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
        at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.0.0.1:33362 remote=rd6ul-14706v.tycho.ncsc.mil/10.0.0.1:37957]
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
        at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.accumulo.core.client.impl.ThriftTransportPool$CachedTTransport.readAll(ThriftTransportPool.java:254)
        at org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:601)
        at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:470)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
        at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:310)
        at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:290)
        at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:650)
        ... 7 more
Caused by: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.0.0.1:33362 remote=rd6ul-14706v.tycho.ncsc.mil/10.0.0.1:37957]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
        ... 18 more
{noformat}
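
For readers unfamiliar with this symptom: the trace shows a client that is connected to the tablet server but gets no reply to startMultiScan before its 120000 ms read timeout expires. The sketch below reproduces only that general pattern, a read timing out on a connected but silent peer, using plain java.net sockets. It does not use the Hadoop or Thrift classes from the trace and is not a diagnosis of this bug; the class name ReadTimeoutSketch and the 200 ms timeout are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical illustration, not Accumulo code.
public class ReadTimeoutSketch {

    // Returns true if a read on a connected socket times out because the
    // peer completed the connection but never sent any data.
    static boolean readTimesOut(int timeoutMillis) throws IOException {
        // Port 0 asks the OS for any free port. The listener never calls
        // accept() and never writes, standing in for a server that is
        // reachable but does not answer the request.
        try (ServerSocket silentServer =
                 new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(silentServer.getInetAddress(),
                                        silentServer.getLocalPort())) {
            // Bound the blocking read, analogous to the 120000 ms limit
            // seen in the log above.
            client.setSoTimeout(timeoutMillis);
            InputStream in = client.getInputStream();
            try {
                in.read(); // blocks until data arrives or the timeout fires
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

The connection itself succeeds, which matches the "connected" state in the logged SocketChannel; only the reply is missing.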

The tablet server does put the root tablet online.

There are 8 tests that restart tablet servers. The failure usually hits only one of them
per run, which makes it difficult to track down.
