cassandra-user mailing list archives

From Paulo Motta <pauloricard...@gmail.com>
Subject Re: New node block in autobootstrap
Date Tue, 27 Sep 2016 18:18:22 GMT
What type of streaming timeout are you getting? Do you have a stack trace?
What version are you on?

See more information about tuning tcp_keepalive* here:
https://docs.datastax.com/en/cassandra/2.0/cassandra/troubleshooting/trblshootIdleFirewall.html
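
As a rough sketch of how to check the current kernel keepalive settings on a
Linux node: the script below reads the three tcp_keepalive_* values from
/proc/sys/net/ipv4 and computes how long an idle connection can sit before
the first probe is sent and before a dead peer is detected. The helper names
and the firewall_idle_timeout_s parameter are hypothetical, introduced only
for illustration; the firewall's real idle timeout has to come from your
network team.

```python
from pathlib import Path

# Linux exposes the kernel keepalive settings as files under this directory.
PROC_DIR = Path("/proc/sys/net/ipv4")
KEYS = ("tcp_keepalive_time", "tcp_keepalive_intvl", "tcp_keepalive_probes")


def read_keepalive(proc_dir=PROC_DIR):
    """Return the current keepalive settings, skipping keys that cannot be read."""
    settings = {}
    for key in KEYS:
        try:
            settings[key] = int((Path(proc_dir) / key).read_text().strip())
        except (OSError, ValueError):
            pass  # not Linux, or the file is missing or unreadable
    return settings


def first_probe_before_timeout(settings, firewall_idle_timeout_s):
    """True if the first probe is sent before the (assumed) firewall idle timeout."""
    return settings["tcp_keepalive_time"] < firewall_idle_timeout_s


def dead_peer_detection_s(settings):
    """Seconds of idleness before the kernel gives up on an unresponsive peer."""
    return (settings["tcp_keepalive_time"]
            + settings["tcp_keepalive_intvl"] * settings["tcp_keepalive_probes"])


if __name__ == "__main__":
    current = read_keepalive()
    print("current settings:", current)
    if len(current) == len(KEYS):
        # 3600 s is an assumed firewall idle timeout, not a measured value.
        print("first probe before timeout:", first_probe_before_timeout(current, 3600))
        print("dead peer detected after (s):", dead_peer_detection_s(current))
```

With the usual Linux defaults (7200 / 75 / 9) the first probe is only sent
after two hours of idleness, which is longer than many firewalls keep an idle
connection open. That is why lowering tcp_keepalive_time is the usual fix.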

2016-09-27 14:07 GMT-03:00 laxmikanth sadula <laxmikanth524@gmail.com>:

> @Paulo Motta
>
> We are also facing streaming timeout exceptions during 'nodetool rebuild'.
> I set streaming_socket_timeout_in_ms to 86400000 (24 hours) as suggested
> in the DataStax article
> https://support.datastax.com/hc/en-us/articles/206502913-FAQ-How-to-reduce-the-impact-of-streaming-errors-or-failures
> but we are still getting streaming exceptions.
>
> And what are the recommended settings/values for kernel tcp_keepalive that
> would help streaming succeed?
>
> Thank you
>
> On Tue, Aug 16, 2016 at 12:21 AM, Paulo Motta <pauloricardomg@gmail.com>
> wrote:
>
>> What version are you on? This seems like a typical case where there was a
>> problem with streaming (hanging, etc.). Do you have access to the logs?
>> Maybe look for streaming errors? Typically streaming errors are related to
>> timeouts, so you should review your Cassandra
>> streaming_socket_timeout_in_ms and kernel tcp_keepalive settings.
>>
>> If you're on 2.2+ you can resume a failed bootstrap with nodetool
>> bootstrap resume. There were also some streaming hanging problems fixed
>> recently, so I'd advise you to upgrade to the latest release of your
>> particular series, which should be more robust.
>>
>> Is there any reason why you didn't use the replace procedure
>> (-Dreplace_address) to replace the node with the same tokens? This would be
>> a bit faster than the remove + bootstrap procedure.
>>
>> 2016-08-15 15:37 GMT-03:00 Jérôme Mainaud <jerome@mainaud.com>:
>>
>>> Hello,
>>>
>>> A client of mine has problems when adding a node to the cluster.
>>> After 4 days, the node is still in joining mode, it doesn't have the
>>> same level of load as the others, and there seems to be no streaming from
>>> or to the new node.
>>>
>>> This node has a history.
>>>
>>>    1. At the beginning, it was a seed in the cluster.
>>>    2. Ops detected that client had problems with it.
>>>    3. They tried to reset it but failed. In their process they launched
>>>    several repair and rebuild processes on the node.
>>>    4. Then they asked me to help them.
>>>    5. We stopped the node,
>>>    6. removed it from the list of seeds (more precisely it was replaced
>>>    by another node),
>>>    7. removed it from the cluster (I chose not to use decommission
>>>    since the node's data was compromised)
>>>    8. deleted all files from data, commitlog and savedcache
>>>    directories.
>>>    9. after the leaving process ended, it was started as a fresh new
>>>    node and began autobootstrap.
>>>
>>>
>>> As I don’t have direct access to the cluster I don't have a lot of
>>> information, but I will have more tomorrow (logs and results of some
>>> commands). And I can ask people for any required information.
>>>
>>> Does someone have any idea of what could have happened and what I should
>>> investigate first?
>>> What would you do to unblock the situation?
>>>
>>> Context: The cluster consists of two DCs, each with 15 nodes. Average
>>> load is around 3 TB per node. The joining node froze a little after 2 TB.
>>>
>>> Thank you for your help.
>>> Cheers,
>>>
>>>
>>> --
>>> Jérôme Mainaud
>>> jerome@mainaud.com
>>>
>>
>>
>
>
> --
> Regards,
> Laxmikanth
> 99621 38051
>
>
