accumulo-notifications mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-1740) intermittent integration test failure
Date Fri, 27 Sep 2013 17:43:03 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780138#comment-13780138
] 

ASF subversion and git services commented on ACCUMULO-1740:
-----------------------------------------------------------

Commit e5e5b2bc39be0c500e18880d7e1821f2903bfb2d in branch refs/heads/master from [~ecn]
[ https://git-wip-us.apache.org/repos/asf?p=accumulo.git;h=e5e5b2b ]

ACCUMULO-1740 no idea why this fixes the hangs we've seen

                
> intermittent integration test failure
> -------------------------------------
>
>                 Key: ACCUMULO-1740
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1740
>             Project: Accumulo
>          Issue Type: Bug
>          Components: test
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>
> Some of the recovery integration tests fail with a very long timeout (10 minutes).
> After a restart of the tablet servers, the WAL is sorted, and the root tablet is assigned.
 After that, the master does not assign the !METADATA tablets.
> I've managed to jstack the master, and it seems to be stuck scanning.  I turned on DEBUG
log messages and I see this:
> {noformat}
> 2013-09-25 17:27:46,340 [impl.TabletServerBatchReaderIterator] DEBUG: Server : rd6ul-14706v.tycho.ncsc.mil:37957
msg : java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to
be ready for
>  read. ch : java.nio.channels.SocketChannel[connected local=/10.0.0.1:33362 remote=rd6ul-14706v.tycho.ncsc.mil/10.0.0.1:37957]
> 2013-09-25 17:27:46,340 [impl.TabletServerBatchReaderIterator] DEBUG: org.apache.thrift.transport.TTransportException:
java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready
for read. ch : java.nio.channels.SocketChannel[connected local=/10.0.0.1:33362 remote=rd6ul-14706v.tycho.ncsc.mil/10.0.0.1:37957]
> java.io.IOException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException:
120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected
local=/10.0.0.1:33362 remote=rd6ul-14706v.tycho.ncsc.mil/10.0.0.1:37957]
>         at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:705)
>         at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator$QueryTask.run(TabletServerBatchReaderIterator.java:364)
>         at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
>         at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException:
120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected
local=/10.0.0.1:33362 remote=rd6ul-14706v.tycho.ncsc.mil/10.0.0.1:37957]
>         at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
>         at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>         at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
>         at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
>         at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>         at org.apache.accumulo.core.client.impl.ThriftTransportPool$CachedTTransport.readAll(ThriftTransportPool.java:254)
>         at org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:601)
>         at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:470)
>         at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
>         at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startMultiScan(TabletClientService.java:310)
>         at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startMultiScan(TabletClientService.java:290)
>         at org.apache.accumulo.core.client.impl.TabletServerBatchReaderIterator.doLookup(TabletServerBatchReaderIterator.java:650)
>         ... 7 more
> Caused by: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel
to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.0.0.1:33362
remote=rd6ul-14706v.tycho.ncsc.mil/10.0.0.1:37957]
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>         at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
>         ... 18 more
> {noformat}
> The tablet server does put the root tablet online.
> There are 8 tests that restart tablet servers, this usually only happens to one of the
tests per run, making it difficult to track down.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message