cassandra-commits mailing list archives

From "Tyler Hobbs (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9132) resumable_bootstrap_test can hang
Date Wed, 08 Apr 2015 19:49:12 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485887#comment-14485887 ]

Tyler Hobbs commented on CASSANDRA-9132:
----------------------------------------

While looking into failures in the similar {{replace_address_test.TestReplaceAddress.resumable_replace_test}},
I found what looks like the problem: after the stream breaks, a {{Retry}} message is sent, but the
send fails and the failure is swallowed by {{OutboundTcpConnection}}. Here are the relevant
debug-level logs from node4 (which is replacing node3; node1 is killed during streaming):

{noformat}
WARN  [STREAM-IN-/127.0.0.1] 2015-04-08 14:39:22,312 StreamSession.java: [Stream #f39354c0-de26-11e4-ae5c-6b09a6cc3d5a] Retrying for following error
java.io.IOError: java.io.IOException: EOF in 52430 byte (compressed) block: could only read 12647 bytes
    at org.apache.cassandra.db.AbstractCell$1.computeNext(AbstractCell.java:56) ~[main/:na]
    at org.apache.cassandra.db.AbstractCell$1.computeNext(AbstractCell.java:46) ~[main/:na]
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) ~[guava-16.0.jar:na]
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) ~[guava-16.0.jar:na]
    at org.apache.cassandra.io.sstable.format.big.BigTableWriter.appendFromStream(BigTableWriter.java:227) ~[main/:na]
    at org.apache.cassandra.streaming.StreamReader.writeRow(StreamReader.java:161) ~[main/:na]
    at org.apache.cassandra.streaming.StreamReader.read(StreamReader.java:104) ~[main/:na]
    at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:48) [main/:na]
    at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:38) [main/:na]
    at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:56) [main/:na]
    at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:251) [main/:na]
    at java.lang.Thread.run(Thread.java:724) [na:1.7.0_40]
Caused by: java.io.IOException: EOF in 52430 byte (compressed) block: could only read 12647 bytes
    at com.ning.compress.lzf.LZFDecoder.readFully(LZFDecoder.java:394) ~[compress-lzf-0.8.4.jar:na]
    at com.ning.compress.lzf.LZFDecoder.decompressChunk(LZFDecoder.java:190) ~[compress-lzf-0.8.4.jar:na]
    at com.ning.compress.lzf.LZFInputStream.readyBuffer(LZFInputStream.java:254) ~[compress-lzf-0.8.4.jar:na]
    at com.ning.compress.lzf.LZFInputStream.read(LZFInputStream.java:129) ~[compress-lzf-0.8.4.jar:na]
    at java.io.DataInputStream.readFully(DataInputStream.java:195) ~[na:1.7.0_40]
    at java.io.DataInputStream.readFully(DataInputStream.java:169) ~[na:1.7.0_40]
    at org.apache.cassandra.utils.BytesReadTracker.readFully(BytesReadTracker.java:94) ~[main/:na]
    at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:360) ~[main/:na]
    at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:320) ~[main/:na]
    at org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:132) ~[main/:na]
    at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:86) ~[main/:na]
    at org.apache.cassandra.db.AbstractCell$1.computeNext(AbstractCell.java:52) ~[main/:na]
    ... 11 common frames omitted
DEBUG [STREAM-OUT-/127.0.0.1] 2015-04-08 14:39:22,313 ConnectionHandler.java: [Stream #f39354c0-de26-11e4-ae5c-6b09a6cc3d5a] Sending Retry (a6c9e410-de26-11e4-a645-6b09a6cc3d5a, #0)
DEBUG [WRITE-/127.0.0.1] 2015-04-08 14:39:23,315 OutboundTcpConnection.java: error writing to /127.0.0.1
java.io.IOException: Broken pipe
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method) ~[na:1.7.0_40]
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) ~[na:1.7.0_40]
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) ~[na:1.7.0_40]
    at sun.nio.ch.IOUtil.write(IOUtil.java:65) ~[na:1.7.0_40]
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487) ~[na:1.7.0_40]
    at java.nio.channels.Channels.writeFullyImpl(Channels.java:78) ~[na:1.7.0_40]
    at java.nio.channels.Channels.writeFully(Channels.java:98) ~[na:1.7.0_40]
    at java.nio.channels.Channels.access$000(Channels.java:61) ~[na:1.7.0_40]
    at java.nio.channels.Channels$1.write(Channels.java:174) ~[na:1.7.0_40]
    at net.jpountz.lz4.LZ4BlockOutputStream.flushBufferedData(LZ4BlockOutputStream.java:205) ~[lz4-1.3.0.jar:na]
    at net.jpountz.lz4.LZ4BlockOutputStream.flush(LZ4BlockOutputStream.java:223) ~[lz4-1.3.0.jar:na]
    at org.apache.cassandra.io.util.WrappedDataOutputStreamPlus.flush(WrappedDataOutputStreamPlus.java:66) ~[main/:na]
    at org.apache.cassandra.net.OutboundTcpConnection.writeConnected(OutboundTcpConnection.java:289) [main/:na]
    at org.apache.cassandra.net.OutboundTcpConnection.run(OutboundTcpConnection.java:219) [main/:na]
DEBUG [WRITE-/127.0.0.1] 2015-04-08 14:39:23,316 OutboundTcpConnection.java: attempting to connect to /127.0.0.1
{noformat}

After that, no more retry attempts are made.
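The failure mode in the logs follows a general pattern: a fire-and-forget send whose I/O error is only logged gives the sender no signal that its {{Retry}} was lost, so it can end up waiting indefinitely. The minimal Java sketch below contrasts swallowing the error with reporting it, so that a session can bound its attempts and fail explicitly. The {{Transport}} and {{Session}} types, {{sendSwallowing}}, {{sendReporting}}, and {{maxAttempts}} are all hypothetical illustrations; they are not Cassandra's {{OutboundTcpConnection}} or {{StreamSession}} APIs, and this is not the fix adopted for this ticket.

```java
import java.io.IOException;

// Hypothetical sketch: these types are NOT Cassandra's OutboundTcpConnection or
// StreamSession; they only model the swallowed-write-error pattern from the logs.
final class RetrySendSketch {

    /** Hypothetical transport; a dead peer surfaces as an IOException ("Broken pipe"). */
    interface Transport {
        void write(String message) throws IOException;
    }

    /** Pattern from the report: the write error is logged and then dropped. */
    static void sendSwallowing(Transport transport, String message) {
        try {
            transport.write(message);
        } catch (IOException e) {
            // The caller never learns that the message was lost, so it keeps waiting.
            System.err.println("error writing " + message + ": " + e.getMessage());
        }
    }

    /** Alternative: tell the caller whether the message was actually written. */
    static boolean sendReporting(Transport transport, String message) {
        try {
            transport.write(message);
            return true;
        } catch (IOException e) {
            System.err.println("error writing " + message + ": " + e.getMessage());
            return false;
        }
    }

    /** Hypothetical session that bounds its Retry attempts instead of waiting forever. */
    static final class Session {
        private final int maxAttempts;
        private int attempts = 0;
        private boolean failed = false;

        Session(int maxAttempts) {
            this.maxAttempts = maxAttempts;
        }

        void retry(Transport transport) {
            while (attempts < maxAttempts) {
                attempts++;
                if (sendReporting(transport, "Retry")) {
                    return; // Retry was written; wait for the peer as usual.
                }
            }
            failed = true; // Give up explicitly so callers (and tests) do not hang.
        }

        int attempts() { return attempts; }

        boolean failed() { return failed; }
    }
}
```

With a transport that always throws "Broken pipe", a {{Session(3)}} makes three attempts and then marks itself failed; with a working transport it stops after the first successful write. The design choice being illustrated is only that the send result is visible to the code that decides whether to retry.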

> resumable_bootstrap_test can hang
> ---------------------------------
>
>                 Key: CASSANDRA-9132
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9132
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tests
>            Reporter: Tyler Hobbs
>            Assignee: Yuki Morishita
>
> The {{bootstrap_test.TestBootstrap.resumable_bootstrap_test}} can hang sometimes.  It looks like the following line never completes:
> {noformat}
> node3.watch_log_for("Listening for thrift clients...")
> {noformat}
> I'm not familiar enough with the recent bootstrap changes to know why that's not happening.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
