cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From George Sigletos <sigle...@textkernel.nl>
Subject Re: Error while rebuilding a node: Stream failed
Date Thu, 26 May 2016 11:20:05 GMT
I tried again with setting streaming_socket_timeout_in_ms to 1 day on all
nodes and after having upgraded to 2.1.14.

My tcp_keep_alive_time is set to 2 hours and tcp_keepalive_probes to 9.
That should be ok I would believe.

I get streaming error again, shortly after starting the rebuild process.
This is from the destination node:

ERROR [STREAM-IN-/192.168.1.141] 2016-05-26 09:08:05,027
StreamSession.java:505 - [Stream #74c57bc0-231a-11e6-a698-1b05ac77baf9]
Streaming error occurred
java.lang.RuntimeException: Outgoing stream handler has been closed
        at
org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:138)
~[apache-cassandra-2.1.14.jar:2.1.14]
        at
org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:568)
~[apache-cassandra-2.1.14.jar:2.1.14]
        at
org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:457)
~[apache-cassandra-2.1.14.jar:2.1.14]
        at
org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:263)
~[apache-cassandra-2.1.14.jar:2.1.14]
        at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]

And this is from the source node:

ERROR [STREAM-OUT-/172.31.22.104] 2016-05-26 11:08:05,097
StreamSession.java:505 - [Stream #74c57bc0-231a-11e6-a698-1b05ac77baf9]
Streaming error occurred
java.io.IOException: Broken pipe
        at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
~[na:1.7.0_79]
        at sun.nio.ch.FileChannelImpl.transferToDirectly(Unknown Source)
~[na:1.7.0_79]
        at sun.nio.ch.FileChannelImpl.transferTo(Unknown Source)
~[na:1.7.0_79]
        at
org.apache.cassandra.streaming.compress.CompressedStreamWriter.write(CompressedStreamWriter.java:84)
~[apache-cassandra-2.1.14.jar:2.1.14]
        at
org.apache.cassandra.streaming.messages.OutgoingFileMessage.serialize(OutgoingFileMessage.java:88)
~[apache-cassandra-2.1.14.jar:2.1.14]
        at
org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:49)
~[apache-cassandra-2.1.14.jar:2.1.14]
        at
org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:41)
~[apache-cassandra-2.1.14.jar:2.1.14]
        at
org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45)
~[apache-cassandra-2.1.14.jar:2.1.14]
        at
org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:358)
[apache-cassandra-2.1.14.jar:2.1.14]
        at
org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:330)
[apache-cassandra-2.1.14.jar:2.1.14]
        at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
INFO  [STREAM-OUT-/172.31.22.104] 2016-05-26 11:08:05,111
StreamResultFuture.java:180 - [Stream
#74c57bc0-231a-11e6-a698-1b05ac77baf9] Session with /172.31.22.104 is
complete
WARN  [STREAM-OUT-/172.31.22.104] 2016-05-26 11:08:05,114
StreamResultFuture.java:207 - [Stream
#74c57bc0-231a-11e6-a698-1b05ac77baf9] Stream failed


Streaming does not seem to be resumed again from this node. Shall I just
kill again the entire rebuild process?

On Thu, May 26, 2016 at 12:17 AM, Paulo Motta <pauloricardomg@gmail.com>
wrote:

> If increasing or disabling streaming_socket_timeout_in_ms on the source
> node does not fix it, you may want to have a look on your tcp keep alive
> settings on the source and destination nodes as intermediate
> routers/firewalls may be killing the connections due to inactivity. See
> this for more information:
> https://docs.datastax.com/en/cassandra/2.0/cassandra/troubleshooting/trblshootIdleFirewall.html
>
> This will ultimately fixed by CASSANDRA-11841 by adding keep-alive to the
> streaming protocol.
>
> 2016-05-25 18:09 GMT-03:00 George Sigletos <sigletos@textkernel.nl>:
>
>> Thanks a lot for your help. I will try that tomorrow. The first time that
>> I tried to rebuild, streaming_socket_timeout_in_ms was 0 and still failed.
>> Below is the directly previous error on the source node:
>>
>> ERROR [STREAM-IN-/172.31.22.104] 2016-05-24 22:32:20,437
>> StreamSession.java:505 - [Stream #2c290460-20d4-11e6-930f-1b05ac77baf9]
>> Streaming error occurred
>> java.io.IOException: Connection timed out
>>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>> ~[na:1.7.0_79]
>>         at sun.nio.ch.SocketDispatcher.read(Unknown Source) ~[na:1.7.0_79]
>>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
>> ~[na:1.7.0_79]
>>         at sun.nio.ch.IOUtil.read(Unknown Source) ~[na:1.7.0_79]
>>         at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
>> ~[na:1.7.0_79]
>>         at
>> org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:51)
>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>         at
>> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:250)
>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>         at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
>>
>> On Wed, May 25, 2016 at 10:28 PM, Paulo Motta <pauloricardomg@gmail.com>
>> wrote:
>>
>>> > Workaround is to set to a larger streaming_socket_timeout_in_ms **on
>>> the source node**., the new default will be 86400000ms (1 day).
>>>
>>> 2016-05-25 17:23 GMT-03:00 Paulo Motta <pauloricardomg@gmail.com>:
>>>
>>>> Was there any other ERROR preceding this on this node (in particular
>>>> the last few lines of [STREAM-IN-/172.31.22.104])? If it's a
>>>> SocketTimeoutException, then what is happening is that the default
>>>> streaming socket timeout of 1 hour is not sufficient to stream a single
>>>> file and the stream session is failed. Workaround is to set to a larger
>>>> streaming_socket_timeout_in_ms, the new default will be 86400000ms (1
>>>> day).
>>>>
>>>> We are addressing this on
>>>> https://issues.apache.org/jira/browse/CASSANDRA-11839.
>>>>
>>>> 2016-05-25 16:42 GMT-03:00 George Sigletos <sigletos@textkernel.nl>:
>>>>
>>>>> Hello again,
>>>>>
>>>>> Here is the error message from the source
>>>>>
>>>>> INFO  [STREAM-IN-/172.31.22.104] 2016-05-25 00:44:57,275
>>>>> StreamResultFuture.java:180 - [Stream
>>>>> #2c290460-20d4-11e6-930f-1b05ac77baf9] Session with /172.31.22.104 is
>>>>> complete
>>>>> WARN  [STREAM-IN-/172.31.22.104] 2016-05-25 00:44:57,276
>>>>> StreamResultFuture.java:207 - [Stream
>>>>> #2c290460-20d4-11e6-930f-1b05ac77baf9] Stream failed
>>>>> ERROR [STREAM-OUT-/172.31.22.104] 2016-05-25 00:44:57,353
>>>>> StreamSession.java:505 - [Stream #2c290460-20d4-11e6-930f-1b05ac77baf9]
>>>>> Streaming error occurred
>>>>> java.lang.AssertionError: Memory was freed
>>>>>         at
>>>>> org.apache.cassandra.io.util.SafeMemory.checkBounds(SafeMemory.java:97)
>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>         at
>>>>> org.apache.cassandra.io.util.Memory.getLong(Memory.java:249)
>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>         at
>>>>> org.apache.cassandra.io.compress.CompressionMetadata.getTotalSizeForSections(CompressionMetadata.java:247)
>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>         at
>>>>> org.apache.cassandra.streaming.messages.FileMessageHeader.size(FileMessageHeader.java:112)
>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>         at
>>>>> org.apache.cassandra.streaming.StreamSession.fileSent(StreamSession.java:546)
>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>         at
>>>>> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:50)
>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>         at
>>>>> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:41)
>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>         at
>>>>> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45)
>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>         at
>>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:351)
>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>         at
>>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:331)
>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>         at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
>>>>>
>>>>> On Wed, May 25, 2016 at 8:49 PM, Paulo Motta <pauloricardomg@gmail.com
>>>>> > wrote:
>>>>>
>>>>>> This is the log of the destination/rebuilding node, you need to check
>>>>>> what is the error message on the stream source node (192.168.1.140).
>>>>>>
>>>>>>
>>>>>> 2016-05-25 15:22 GMT-03:00 George Sigletos <sigletos@textkernel.nl>:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> Here is additional stack trace from system.log:
>>>>>>>
>>>>>>> ERROR [STREAM-IN-/192.168.1.140] 2016-05-24 22:44:57,704
>>>>>>> StreamSession.java:620 - [Stream #2c290460-20d4-11e6-930f-1b05ac77baf9]
>>>>>>> Remote peer 192.168.1.140 failed stream session.
>>>>>>> ERROR [STREAM-OUT-/192.168.1.140] 2016-05-24 22:44:57,705
>>>>>>> StreamSession.java:505 - [Stream #2c290460-20d4-11e6-930f-1b05ac77baf9]
>>>>>>> Streaming error occurred
>>>>>>> java.io.IOException: Connection timed out
>>>>>>>         at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>>>>>>> ~[na:1.7.0_79]
>>>>>>>         at sun.nio.ch.SocketDispatcher.write(Unknown Source)
>>>>>>> ~[na:1.7.0_79]
>>>>>>>         at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
>>>>>>> ~[na:1.7.0_79]
>>>>>>>         at sun.nio.ch.IOUtil.write(Unknown Source) ~[na:1.7.0_79]
>>>>>>>         at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
>>>>>>> ~[na:1.7.0_79]
>>>>>>>         at
>>>>>>> org.apache.cassandra.io.util.DataOutputStreamAndChannel.write(DataOutputStreamAndChannel.java:48)
>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>         at
>>>>>>> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>         at
>>>>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:351)
>>>>>>> [apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>         at
>>>>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:323)
>>>>>>> [apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>         at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
>>>>>>> INFO  [STREAM-IN-/192.168.1.140] 2016-05-24 22:44:58,625
>>>>>>> StreamResultFuture.java:180 - [Stream
>>>>>>> #2c290460-20d4-11e6-930f-1b05ac77baf9] Session with /192.168.1.140
>>>>>>> is complete
>>>>>>> WARN  [STREAM-IN-/192.168.1.140] 2016-05-24 22:44:58,627
>>>>>>> StreamResultFuture.java:207 - [Stream
>>>>>>> #2c290460-20d4-11e6-930f-1b05ac77baf9] Stream failed
>>>>>>> ERROR [RMI TCP Connection(24)-127.0.0.1] 2016-05-24 22:44:58,628
>>>>>>> StorageService.java:1075 - Error while rebuilding node
>>>>>>> org.apache.cassandra.streaming.StreamException: Stream failed
>>>>>>>         at
>>>>>>> org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>         at
>>>>>>> com.google.common.util.concurrent.Futures$4.run(Futures.java:1172)
>>>>>>> ~[guava-16.0.jar:na]
>>>>>>>         at
>>>>>>> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
>>>>>>> ~[guava-16.0.jar:na]
>>>>>>>         at
>>>>>>> com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
>>>>>>> ~[guava-16.0.jar:na]
>>>>>>>         at
>>>>>>> com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
>>>>>>> ~[guava-16.0.jar:na]
>>>>>>>         at
>>>>>>> com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
>>>>>>> ~[guava-16.0.jar:na]
>>>>>>>         at
>>>>>>> org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:208)
>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>         at
>>>>>>> org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:184)
>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>         at
>>>>>>> org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:415)
>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>         at
>>>>>>> org.apache.cassandra.streaming.StreamSession.sessionFailed(StreamSession.java:621)
>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>         at
>>>>>>> org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:475)
>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>         at
>>>>>>> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:256)
>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>         at java.lang.Thread.run(Unknown Source) ~[na:1.7.0_79]
>>>>>>> ERROR [STREAM-OUT-/192.168.1.140] 2016-05-24 22:44:58,629
>>>>>>> StreamSession.java:505 - [Stream #2c290460-20d4-11e6-930f-1b05ac77baf9]
>>>>>>> Streaming error occurred
>>>>>>> java.io.IOException: Broken pipe
>>>>>>>         at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>>>>>>> ~[na:1.7.0_79]
>>>>>>>         at sun.nio.ch.SocketDispatcher.write(Unknown Source)
>>>>>>> ~[na:1.7.0_79]
>>>>>>>         at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
>>>>>>> ~[na:1.7.0_79]
>>>>>>>         at sun.nio.ch.IOUtil.write(Unknown Source) ~[na:1.7.0_79]
>>>>>>>         at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
>>>>>>> ~[na:1.7.0_79]
>>>>>>>         at
>>>>>>> org.apache.cassandra.io.util.DataOutputStreamAndChannel.write(DataOutputStreamAndChannel.java:48)
>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>         at
>>>>>>> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>         at
>>>>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:351)
>>>>>>> [apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>         at
>>>>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:331)
>>>>>>> [apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>         at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
>>>>>>>
>>>>>>>
>>>>>>> On Wed, May 25, 2016 at 5:23 PM, Paulo Motta <
>>>>>>> pauloricardomg@gmail.com> wrote:
>>>>>>>
>>>>>>>> The stack trace from the rebuild command not show the root
cause of
>>>>>>>> the rebuild stream error. Can you check the system.log for
ERROR logs
>>>>>>>> during streaming and paste here?
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Mime
View raw message