cassandra-user mailing list archives

From laxmikanth sadula <laxmikanth...@gmail.com>
Subject Re: New node block in autobootstrap
Date Tue, 27 Sep 2016 18:29:43 GMT
Hi Paulo,

Thanks for the reply...

I'm getting the following streaming exceptions during nodetool rebuild on
C* 2.0.17:

04:24:49,759 StreamSession.java (line 461) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred
java.io.IOException: Connection timed out
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
    at sun.nio.ch.IOUtil.write(IOUtil.java:65)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
    at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
    at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
    at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:311)
    at java.lang.Thread.run(Thread.java:745)
DEBUG [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764 ConnectionHandler.java (line 104) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Closing stream connection handler on /xxx.xxx.98.168
 INFO [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764 StreamResultFuture.java (line 186) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Session with /xxx.xxx.98.168 is complete
ERROR [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764 StreamSession.java (line 461) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred
java.io.IOException: Broken pipe
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
    at sun.nio.ch.IOUtil.write(IOUtil.java:65)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
    at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
    at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
    at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:319)
    at java.lang.Thread.run(Thread.java:745)
DEBUG [STREAM-IN-/xxx.xxx.98.168] 2016-09-27 04:24:49,909 ConnectionHandler.java (line 244) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Received File (Header (cfId: 68af9ee0-96f8-3b1d-a418-e5ae844f2cc2, #3, version: jb, estimated keys: 4736, transfer size: 2306880, compressed?: true), file: /home/cassandra/data_directories/data/keyspace_name1/archiving_metadata/keyspace_name1-archiving_metadata-tmp-jb-27-Data.db)
ERROR [STREAM-IN-/xxx.xxx.98.168] 2016-09-27 04:24:49,909 StreamSession.java (line 461) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred
java.lang.RuntimeException: Outgoing stream handler has been closed
    at org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:126)
    at org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:524)
    at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:413)
    at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:245)
    at java.lang.Thread.run(Thread.java:745)
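
For reference while looking at the kernel tcp_keepalive settings mentioned in
this thread, here is a rough Python sketch of the arithmetic involved. It is
only an illustration, not something taken from the Cassandra code base. On
Linux the kernel sends the first keepalive probe after tcp_keepalive_time
seconds of idle time, then repeats every tcp_keepalive_intvl seconds, and gives
up after tcp_keepalive_probes unanswered probes. The function names below are
made up for this example. The 7200/75/9 values are the usual Linux defaults;
the 60/10/3 values are only an assumed example of a more aggressive setting and
should be checked against the linked DataStax page before use.

```python
from pathlib import Path

# Linux-specific location of the keepalive sysctls.
PROC_DIR = Path("/proc/sys/net/ipv4")
NAMES = ("tcp_keepalive_time", "tcp_keepalive_intvl", "tcp_keepalive_probes")


def read_keepalive(proc_dir=PROC_DIR):
    """Read the three keepalive values (in seconds / count) from procfs."""
    return {name: int((proc_dir / name).read_text().strip()) for name in NAMES}


def seconds_until_first_probe(settings):
    """Idle time before the kernel sends the first keepalive probe."""
    return settings["tcp_keepalive_time"]


def seconds_until_dead_peer_detected(settings):
    """Idle time plus all unanswered probes before the socket is declared dead."""
    return (settings["tcp_keepalive_time"]
            + settings["tcp_keepalive_intvl"] * settings["tcp_keepalive_probes"])


# Usual Linux defaults: first probe only after 2 hours of idle time.
LINUX_DEFAULTS = {"tcp_keepalive_time": 7200,
                  "tcp_keepalive_intvl": 75,
                  "tcp_keepalive_probes": 9}

# Assumed example of a more aggressive setting (verify before applying).
EXAMPLE_TUNED = {"tcp_keepalive_time": 60,
                 "tcp_keepalive_intvl": 10,
                 "tcp_keepalive_probes": 3}

if __name__ == "__main__":
    for label, settings in (("defaults", LINUX_DEFAULTS), ("tuned", EXAMPLE_TUNED)):
        print(label,
              "first probe after", seconds_until_first_probe(settings), "s,",
              "dead peer detected after",
              seconds_until_dead_peer_detected(settings), "s")
    if all((PROC_DIR / name).exists() for name in NAMES):
        print("this host:", read_keepalive())
```

If a firewall between the nodes drops idle connections sooner than the
first-probe time, a quiet streaming connection can be cut silently, which would
be consistent with the "Connection timed out" and "Broken pipe" errors above.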

On Sep 27, 2016 11:48 PM, "Paulo Motta" <pauloricardomg@gmail.com> wrote:

> What type of streaming timeout are you getting? Do you have a stack trace?
> What version are you on?
>
> See more information about tuning tcp_keepalive* here:
> https://docs.datastax.com/en/cassandra/2.0/cassandra/troubleshooting/trblshootIdleFirewall.html
>
> 2016-09-27 14:07 GMT-03:00 laxmikanth sadula <laxmikanth524@gmail.com>:
>
>> @Paulo Motta
>>
>> We are also facing streaming timeout exceptions during 'nodetool rebuild'.
>> I set streaming_socket_timeout_in_ms to 86400000 (24 hours) as suggested
>> in the DataStax support article
>> https://support.datastax.com/hc/en-us/articles/206502913-FAQ-How-to-reduce-the-impact-of-streaming-errors-or-failures
>> but we are still getting streaming exceptions.
>>
>> Also, what are the recommended values for the kernel tcp_keepalive
>> settings that would help streaming succeed?
>>
>> Thank you
>>
>> On Tue, Aug 16, 2016 at 12:21 AM, Paulo Motta <pauloricardomg@gmail.com>
>> wrote:
>>
>>> What version are you on? This seems like a typical case where there was a
>>> problem with streaming (hanging, etc.). Do you have access to the logs?
>>> Maybe look for streaming errors. Typically streaming errors are related to
>>> timeouts, so you should review your Cassandra
>>> streaming_socket_timeout_in_ms and kernel tcp_keepalive settings.
>>>
>>> If you're on 2.2+ you can resume a failed bootstrap with nodetool
>>> bootstrap resume. There were also some streaming hanging problems fixed
>>> recently, so I'd advise you to upgrade to the latest version of your
>>> particular series for a more robust version.
>>>
>>> Is there any reason why you didn't use the replace procedure
>>> (-Dreplace_address) to replace the node with the same tokens? This would be
>>> a bit faster than the remove + bootstrap procedure.
>>>
>>> 2016-08-15 15:37 GMT-03:00 Jérôme Mainaud <jerome@mainaud.com>:
>>>
>>>> Hello,
>>>>
>>>> A client of mine has problems adding a node to the cluster.
>>>> After 4 days, the node is still in joining mode, it doesn't have the
>>>> same level of load as the others, and there seems to be no streaming
>>>> from or to the new node.
>>>>
>>>> This node has a history.
>>>>
>>>>    1. At the beginning, it was a seed in the cluster.
>>>>    2. Ops detected that client had problems with it.
>>>>    3. They tried to reset it but failed. In their process they
>>>>    launched several repair and rebuild processes on the node.
>>>>    4. Then they asked me to help them.
>>>>    5. We stopped the node,
>>>>    6. removed it from the list of seeds (more precisely it was
>>>>    replaced by another node),
>>>>    7. removed it from the cluster (I chose not to use decommission
>>>>    since node data was compromised)
>>>>    8. deleted all files from data, commitlog and savedcache
>>>>    directories.
>>>>    9. after the leaving process ended, it was started as a fresh new
>>>>    node and began autobootstrap.
>>>>
>>>>
>>>> As I don’t have direct access to the cluster, I don't have a lot of
>>>> information, but I will have more tomorrow (logs and results of some
>>>> commands). And I can ask people for any required information.
>>>>
>>>> Does anyone have any idea of what could have happened and what I
>>>> should investigate first?
>>>> What would you do to unblock the situation?
>>>>
>>>> Context: The cluster consists of two DCs, each with 15 nodes. Average
>>>> load is around 3 TB per node. The joining node froze a little after 2 TB.
>>>>
>>>> Thank you for your help.
>>>> Cheers,
>>>>
>>>>
>>>> --
>>>> Jérôme Mainaud
>>>> jerome@mainaud.com
>>>>
>>>
>>>
>>
>>
>> --
>> Regards,
>> Laxmikanth
>> 99621 38051
>>
>>
>
