cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paulo Motta <pauloricard...@gmail.com>
Subject Re: Error while rebuilding a node: Stream failed
Date Thu, 26 May 2016 15:20:30 GMT
How long does it take after you trigger the rebuild process before it fails?

Was there any error before [STREAM-IN-/192.168.1.141] on the destination
node or [STREAM-OUT-/172.31.22.104] on the source node? Those are showing
consequences of the root error. In particular what were the last messages
on [STREAM-OUT-/192.168.1.141] and [STREAM-IN-/172.31.22.104] ?

> Streaming does not seem to be resumed again from this node. Shall I just
kill again the entire rebuild process?

Yes, resumable rebuild will be supported on CASSANDRA-10810.

2016-05-26 8:20 GMT-03:00 George Sigletos <sigletos@textkernel.nl>:

> I tried again with setting streaming_socket_timeout_in_ms to 1 day on all
> nodes and after having upgraded to 2.1.14.
>
> My tcp_keep_alive_time is set to 2 hours and tcp_keepalive_probes to 9.
> That should be ok I would believe.
>
> I get streaming error again, shortly after starting the rebuild process.
> This is from the destination node:
>
> ERROR [STREAM-IN-/192.168.1.141] 2016-05-26 09:08:05,027
> StreamSession.java:505 - [Stream #74c57bc0-231a-11e6-a698-1b05ac77baf9]
> Streaming error occurred
> java.lang.RuntimeException: Outgoing stream handler has been closed
>         at
> org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:138)
> ~[apache-cassandra-2.1.14.jar:2.1.14]
>         at
> org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:568)
> ~[apache-cassandra-2.1.14.jar:2.1.14]
>         at
> org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:457)
> ~[apache-cassandra-2.1.14.jar:2.1.14]
>         at
> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:263)
> ~[apache-cassandra-2.1.14.jar:2.1.14]
>         at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
>
> And this is from the source node:
>
> ERROR [STREAM-OUT-/172.31.22.104] 2016-05-26 11:08:05,097
> StreamSession.java:505 - [Stream #74c57bc0-231a-11e6-a698-1b05ac77baf9]
> Streaming error occurred
> java.io.IOException: Broken pipe
>         at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
> ~[na:1.7.0_79]
>         at sun.nio.ch.FileChannelImpl.transferToDirectly(Unknown Source)
> ~[na:1.7.0_79]
>         at sun.nio.ch.FileChannelImpl.transferTo(Unknown Source)
> ~[na:1.7.0_79]
>         at
> org.apache.cassandra.streaming.compress.CompressedStreamWriter.write(CompressedStreamWriter.java:84)
> ~[apache-cassandra-2.1.14.jar:2.1.14]
>         at
> org.apache.cassandra.streaming.messages.OutgoingFileMessage.serialize(OutgoingFileMessage.java:88)
> ~[apache-cassandra-2.1.14.jar:2.1.14]
>         at
> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:49)
> ~[apache-cassandra-2.1.14.jar:2.1.14]
>         at
> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:41)
> ~[apache-cassandra-2.1.14.jar:2.1.14]
>         at
> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45)
> ~[apache-cassandra-2.1.14.jar:2.1.14]
>         at
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:358)
> [apache-cassandra-2.1.14.jar:2.1.14]
>         at
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:330)
> [apache-cassandra-2.1.14.jar:2.1.14]
>         at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
> INFO  [STREAM-OUT-/172.31.22.104] 2016-05-26 11:08:05,111
> StreamResultFuture.java:180 - [Stream
> #74c57bc0-231a-11e6-a698-1b05ac77baf9] Session with /172.31.22.104 is
> complete
> WARN  [STREAM-OUT-/172.31.22.104] 2016-05-26 11:08:05,114
> StreamResultFuture.java:207 - [Stream
> #74c57bc0-231a-11e6-a698-1b05ac77baf9] Stream failed
>
> > Streaming does not seem to be resumed again from this node. Shall I just
> kill again the entire rebuild process?
>

> On Thu, May 26, 2016 at 12:17 AM, Paulo Motta <pauloricardomg@gmail.com>
> wrote:
>
>> If increasing or disabling streaming_socket_timeout_in_ms on the source
>> node does not fix it, you may want to have a look on your tcp keep alive
>> settings on the source and destination nodes as intermediate
>> routers/firewalls may be killing the connections due to inactivity. See
>> this for more information:
>> https://docs.datastax.com/en/cassandra/2.0/cassandra/troubleshooting/trblshootIdleFirewall.html
>>
>> This will ultimately fixed by CASSANDRA-11841 by adding keep-alive to the
>> streaming protocol.
>>
>> 2016-05-25 18:09 GMT-03:00 George Sigletos <sigletos@textkernel.nl>:
>>
>>> Thanks a lot for your help. I will try that tomorrow. The first time
>>> that I tried to rebuild, streaming_socket_timeout_in_ms was 0 and still
>>> failed. Below is the directly previous error on the source node:
>>>
>>> ERROR [STREAM-IN-/172.31.22.104] 2016-05-24 22:32:20,437
>>> StreamSession.java:505 - [Stream #2c290460-20d4-11e6-930f-1b05ac77baf9]
>>> Streaming error occurred
>>> java.io.IOException: Connection timed out
>>>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>>> ~[na:1.7.0_79]
>>>         at sun.nio.ch.SocketDispatcher.read(Unknown Source)
>>> ~[na:1.7.0_79]
>>>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
>>> ~[na:1.7.0_79]
>>>         at sun.nio.ch.IOUtil.read(Unknown Source) ~[na:1.7.0_79]
>>>         at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
>>> ~[na:1.7.0_79]
>>>         at
>>> org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:51)
>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>         at
>>> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:250)
>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>         at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
>>>
>>> On Wed, May 25, 2016 at 10:28 PM, Paulo Motta <pauloricardomg@gmail.com>
>>> wrote:
>>>
>>>> > Workaround is to set to a larger streaming_socket_timeout_in_ms **on
>>>> the source node**., the new default will be 86400000ms (1 day).
>>>>
>>>> 2016-05-25 17:23 GMT-03:00 Paulo Motta <pauloricardomg@gmail.com>:
>>>>
>>>>> Was there any other ERROR preceding this on this node (in particular
>>>>> the last few lines of [STREAM-IN-/172.31.22.104])? If it's a
>>>>> SocketTimeoutException, then what is happening is that the default
>>>>> streaming socket timeout of 1 hour is not sufficient to stream a single
>>>>> file and the stream session is failed. Workaround is to set to a larger
>>>>> streaming_socket_timeout_in_ms, the new default will be 86400000ms (1
>>>>> day).
>>>>>
>>>>> We are addressing this on
>>>>> https://issues.apache.org/jira/browse/CASSANDRA-11839.
>>>>>
>>>>> 2016-05-25 16:42 GMT-03:00 George Sigletos <sigletos@textkernel.nl>:
>>>>>
>>>>>> Hello again,
>>>>>>
>>>>>> Here is the error message from the source
>>>>>>
>>>>>> INFO  [STREAM-IN-/172.31.22.104] 2016-05-25 00:44:57,275
>>>>>> StreamResultFuture.java:180 - [Stream
>>>>>> #2c290460-20d4-11e6-930f-1b05ac77baf9] Session with /172.31.22.104
>>>>>> is complete
>>>>>> WARN  [STREAM-IN-/172.31.22.104] 2016-05-25 00:44:57,276
>>>>>> StreamResultFuture.java:207 - [Stream
>>>>>> #2c290460-20d4-11e6-930f-1b05ac77baf9] Stream failed
>>>>>> ERROR [STREAM-OUT-/172.31.22.104] 2016-05-25 00:44:57,353
>>>>>> StreamSession.java:505 - [Stream #2c290460-20d4-11e6-930f-1b05ac77baf9]
>>>>>> Streaming error occurred
>>>>>> java.lang.AssertionError: Memory was freed
>>>>>>         at
>>>>>> org.apache.cassandra.io.util.SafeMemory.checkBounds(SafeMemory.java:97)
>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>         at
>>>>>> org.apache.cassandra.io.util.Memory.getLong(Memory.java:249)
>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>         at
>>>>>> org.apache.cassandra.io.compress.CompressionMetadata.getTotalSizeForSections(CompressionMetadata.java:247)
>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>         at
>>>>>> org.apache.cassandra.streaming.messages.FileMessageHeader.size(FileMessageHeader.java:112)
>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>         at
>>>>>> org.apache.cassandra.streaming.StreamSession.fileSent(StreamSession.java:546)
>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>         at
>>>>>> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:50)
>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>         at
>>>>>> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:41)
>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>         at
>>>>>> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45)
>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>         at
>>>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:351)
>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>         at
>>>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:331)
>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>         at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
>>>>>>
>>>>>> On Wed, May 25, 2016 at 8:49 PM, Paulo Motta <
>>>>>> pauloricardomg@gmail.com> wrote:
>>>>>>
>>>>>>> This is the log of the destination/rebuilding node, you need
to
>>>>>>> check what is the error message on the stream source node (192.168.1.140).
>>>>>>>
>>>>>>>
>>>>>>> 2016-05-25 15:22 GMT-03:00 George Sigletos <sigletos@textkernel.nl>:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> Here is additional stack trace from system.log:
>>>>>>>>
>>>>>>>> ERROR [STREAM-IN-/192.168.1.140] 2016-05-24 22:44:57,704
>>>>>>>> StreamSession.java:620 - [Stream #2c290460-20d4-11e6-930f-1b05ac77baf9]
>>>>>>>> Remote peer 192.168.1.140 failed stream session.
>>>>>>>> ERROR [STREAM-OUT-/192.168.1.140] 2016-05-24 22:44:57,705
>>>>>>>> StreamSession.java:505 - [Stream #2c290460-20d4-11e6-930f-1b05ac77baf9]
>>>>>>>> Streaming error occurred
>>>>>>>> java.io.IOException: Connection timed out
>>>>>>>>         at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>>>>>>>> ~[na:1.7.0_79]
>>>>>>>>         at sun.nio.ch.SocketDispatcher.write(Unknown Source)
>>>>>>>> ~[na:1.7.0_79]
>>>>>>>>         at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown
Source)
>>>>>>>> ~[na:1.7.0_79]
>>>>>>>>         at sun.nio.ch.IOUtil.write(Unknown Source) ~[na:1.7.0_79]
>>>>>>>>         at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
>>>>>>>> ~[na:1.7.0_79]
>>>>>>>>         at
>>>>>>>> org.apache.cassandra.io.util.DataOutputStreamAndChannel.write(DataOutputStreamAndChannel.java:48)
>>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>>         at
>>>>>>>> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
>>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>>         at
>>>>>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:351)
>>>>>>>> [apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>>         at
>>>>>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:323)
>>>>>>>> [apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>>         at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
>>>>>>>> INFO  [STREAM-IN-/192.168.1.140] 2016-05-24 22:44:58,625
>>>>>>>> StreamResultFuture.java:180 - [Stream
>>>>>>>> #2c290460-20d4-11e6-930f-1b05ac77baf9] Session with /192.168.1.140
>>>>>>>> is complete
>>>>>>>> WARN  [STREAM-IN-/192.168.1.140] 2016-05-24 22:44:58,627
>>>>>>>> StreamResultFuture.java:207 - [Stream
>>>>>>>> #2c290460-20d4-11e6-930f-1b05ac77baf9] Stream failed
>>>>>>>> ERROR [RMI TCP Connection(24)-127.0.0.1] 2016-05-24 22:44:58,628
>>>>>>>> StorageService.java:1075 - Error while rebuilding node
>>>>>>>> org.apache.cassandra.streaming.StreamException: Stream failed
>>>>>>>>         at
>>>>>>>> org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
>>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>>         at
>>>>>>>> com.google.common.util.concurrent.Futures$4.run(Futures.java:1172)
>>>>>>>> ~[guava-16.0.jar:na]
>>>>>>>>         at
>>>>>>>> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
>>>>>>>> ~[guava-16.0.jar:na]
>>>>>>>>         at
>>>>>>>> com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
>>>>>>>> ~[guava-16.0.jar:na]
>>>>>>>>         at
>>>>>>>> com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
>>>>>>>> ~[guava-16.0.jar:na]
>>>>>>>>         at
>>>>>>>> com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
>>>>>>>> ~[guava-16.0.jar:na]
>>>>>>>>         at
>>>>>>>> org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:208)
>>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>>         at
>>>>>>>> org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:184)
>>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>>         at
>>>>>>>> org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:415)
>>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>>         at
>>>>>>>> org.apache.cassandra.streaming.StreamSession.sessionFailed(StreamSession.java:621)
>>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>>         at
>>>>>>>> org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:475)
>>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>>         at
>>>>>>>> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:256)
>>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>>         at java.lang.Thread.run(Unknown Source) ~[na:1.7.0_79]
>>>>>>>> ERROR [STREAM-OUT-/192.168.1.140] 2016-05-24 22:44:58,629
>>>>>>>> StreamSession.java:505 - [Stream #2c290460-20d4-11e6-930f-1b05ac77baf9]
>>>>>>>> Streaming error occurred
>>>>>>>> java.io.IOException: Broken pipe
>>>>>>>>         at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>>>>>>>> ~[na:1.7.0_79]
>>>>>>>>         at sun.nio.ch.SocketDispatcher.write(Unknown Source)
>>>>>>>> ~[na:1.7.0_79]
>>>>>>>>         at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown
Source)
>>>>>>>> ~[na:1.7.0_79]
>>>>>>>>         at sun.nio.ch.IOUtil.write(Unknown Source) ~[na:1.7.0_79]
>>>>>>>>         at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
>>>>>>>> ~[na:1.7.0_79]
>>>>>>>>         at
>>>>>>>> org.apache.cassandra.io.util.DataOutputStreamAndChannel.write(DataOutputStreamAndChannel.java:48)
>>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>>         at
>>>>>>>> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
>>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>>         at
>>>>>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:351)
>>>>>>>> [apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>>         at
>>>>>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:331)
>>>>>>>> [apache-cassandra-2.1.13.jar:2.1.13]
>>>>>>>>         at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, May 25, 2016 at 5:23 PM, Paulo Motta <
>>>>>>>> pauloricardomg@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> The stack trace from the rebuild command not show the
root cause
>>>>>>>>> of the rebuild stream error. Can you check the system.log
for ERROR logs
>>>>>>>>> during streaming and paste here?
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Mime
View raw message