cassandra-user mailing list archives

From Alain RODRIGUEZ <arodr...@gmail.com>
Subject Re: New node block in autobootstrap
Date Wed, 28 Sep 2016 23:02:25 GMT
>
> Forgot to set replication for new data center :(


I was feeling like it could be it :-). From the other thread:


> It should be run from the DC3 servers, after altering the keyspace to add
> replication to the new datacenter. Is this the way you're doing it?
>
>    - Are all the nodes using the same version ('nodetool version')?
>    - What does 'nodetool status keyspace_name1' output?
>    - Are you sure you are using NetworkTopologyStrategy on
>    'keyspace_name1'? Have you modified this schema to add replication
>    on DC3?
>
> My guess is something could be wrong with the configuration.
>
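
For anyone who finds this thread with the same symptom, the missing step looks roughly like the sketch below. It only builds the CQL text and does not connect to a cluster. The helper name is made up for illustration, the datacenter names must match what 'nodetool status' reports, and the replication factors of 3 are placeholders, not values taken from this cluster.

```python
def alter_keyspace_cql(keyspace, replication):
    """Build an ALTER KEYSPACE statement that uses NetworkTopologyStrategy.

    `replication` maps datacenter name -> replication factor. Every
    datacenter that should hold replicas must be listed, including the
    existing ones, because the statement replaces the whole replication map.
    """
    options = ["'class': 'NetworkTopologyStrategy'"]
    options += ["'{}': {}".format(dc, rf) for dc, rf in replication.items()]
    return "ALTER KEYSPACE {} WITH replication = {{{}}};".format(
        keyspace, ", ".join(options)
    )


# Hypothetical values: DC1 keeps its replicas, DC3 is the new datacenter.
statement = alter_keyspace_cql("keyspace_name1", {"DC1": 3, "DC3": 3})
print(statement)
```

Once a statement like this has been applied (through cqlsh, for instance), 'nodetool rebuild' run from the DC3 nodes should have something to stream.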


I was starting to wonder about this one though, so thanks for letting us
know about it :-).
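
A quick way to spot this kind of no-op rebuild is to compare the first and last timestamps in the log quoted below. Here is a minimal sketch, with the two timestamps copied from that log; the function and variable names are made up for illustration.

```python
from datetime import datetime

# Timestamps copied from the first and last log lines quoted in this thread.
REBUILD_STARTED = "2016-09-28 09:18:44,571"  # "rebuild from dc: IDC"
ALL_COMPLETED = "2016-09-28 09:18:47,778"    # "All sessions completed"

# Layout of the timestamps in these Cassandra log lines.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start, end):
    """Return the number of seconds between two log timestamps."""
    started = datetime.strptime(start, LOG_TIME_FORMAT)
    ended = datetime.strptime(end, LOG_TIME_FORMAT)
    return (ended - started).total_seconds()


rebuild_seconds = elapsed_seconds(REBUILD_STARTED, ALL_COMPLETED)
print("rebuild took {:.3f} s".format(rebuild_seconds))
```

A rebuild of a whole datacenter that finishes in a few seconds has very likely streamed nothing, which is what you would expect when the keyspace has no replicas assigned to the new datacenter.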

C*heers,
-----------------------
Alain Rodriguez - @arodream - alain@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-09-28 23:54 GMT+02:00 techpyaasa . <techpyaasa@gmail.com>:

> Forgot to set replication for new data center :(
>
> On Wed, Sep 28, 2016 at 11:33 PM, Jonathan Haddad <jon@jonhaddad.com>
> wrote:
>
>> What was the reason?
>>
>> On Wed, Sep 28, 2016 at 9:58 AM techpyaasa . <techpyaasa@gmail.com>
>> wrote:
>>
>>> Very sorry...I got the reason for this issue..
>>> Please ignore.
>>>
>>>
>>> On Wed, Sep 28, 2016 at 10:14 PM, techpyaasa . <techpyaasa@gmail.com>
>>> wrote:
>>>
>>>> @Paulo
>>>>
>>>> We have made the changes you suggested:
>>>> net.ipv4.tcp_keepalive_time=60
>>>> net.ipv4.tcp_keepalive_probes=3
>>>> net.ipv4.tcp_keepalive_intvl=10
>>>>
>>>> and increased streaming_socket_timeout_in_ms to 48 hours, with
>>>> "phi_convict_threshold: 9".
>>>>
>>>> We then recommissioned the new data center (DC3) once again and ran
>>>> "nodetool rebuild 'DC1'", but this time NO data got streamed and
>>>> 'nodetool rebuild' exited without any exception.
>>>>
>>>> Please check logs below
>>>>
>>>> *INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:44,571
>>>> StorageService.java (line 914) rebuild from dc: IDC*
>>>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,520
>>>> StreamResultFuture.java (line 87) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Executing streaming plan for Rebuild*
>>>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,521
>>>> StreamResultFuture.java (line 91) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>>>> /xxx.xxx.198.75*
>>>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,522
>>>> StreamResultFuture.java (line 91) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>>>> /xxx.xxx.198.132*
>>>> * INFO [StreamConnectionEstablisher:1] 2016-09-28 09:18:47,522
>>>> StreamSession.java (line 214) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>>>> /xxx.xxx.198.75*
>>>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,522
>>>> StreamResultFuture.java (line 91) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>>>> /xxx.xxx.198.133*
>>>> * INFO [StreamConnectionEstablisher:2] 2016-09-28 09:18:47,522
>>>> StreamSession.java (line 214) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>>>> /xxx.xxx.198.132*
>>>> * INFO [StreamConnectionEstablisher:3] 2016-09-28 09:18:47,523
>>>> StreamSession.java (line 214) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>>>> /xxx.xxx.198.133*
>>>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,523
>>>> StreamResultFuture.java (line 91) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>>>> /xxx.xxx.198.167*
>>>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,524
>>>> StreamResultFuture.java (line 91) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>>>> /xxx.xxx.198.78*
>>>> * INFO [StreamConnectionEstablisher:4] 2016-09-28 09:18:47,524
>>>> StreamSession.java (line 214) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>>>> /xxx.xxx.198.167*
>>>> * INFO [StreamConnectionEstablisher:5] 2016-09-28 09:18:47,525
>>>> StreamSession.java (line 214) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>>>> /xxx.xxx.198.78*
>>>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,524
>>>> StreamResultFuture.java (line 91) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>>>> /xxx.xxx.198.126*
>>>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,525
>>>> StreamResultFuture.java (line 91) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>>>> /xxx.xxx.198.191*
>>>> * INFO [StreamConnectionEstablisher:6] 2016-09-28 09:18:47,526
>>>> StreamSession.java (line 214) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>>>> /xxx.xxx.198.126*
>>>> * INFO [StreamConnectionEstablisher:7] 2016-09-28 09:18:47,526
>>>> StreamSession.java (line 214) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>>>> /xxx.xxx.198.191*
>>>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,526
>>>> StreamResultFuture.java (line 91) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>>>> /xxx.xxx.198.168*
>>>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,527
>>>> StreamResultFuture.java (line 91) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>>>> /xxx.xxx.198.169*
>>>> * INFO [StreamConnectionEstablisher:8] 2016-09-28 09:18:47,527
>>>> StreamSession.java (line 214) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>>>> /xxx.xxx.198.168*
>>>> * INFO [StreamConnectionEstablisher:9] 2016-09-28 09:18:47,528
>>>> StreamSession.java (line 214) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>>>> /xxx.xxx.198.169*
>>>> * INFO [STREAM-IN-/xxx.xxx.198.132] 2016-09-28 09:18:47,713
>>>> StreamResultFuture.java (line 186) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.132 is
>>>> complete*
>>>> * INFO [STREAM-IN-/xxx.xxx.198.191] 2016-09-28 09:18:47,715
>>>> StreamResultFuture.java (line 186) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.191 is
>>>> complete*
>>>> * INFO [STREAM-IN-/xxx.xxx.198.133] 2016-09-28 09:18:47,716
>>>> StreamResultFuture.java (line 186) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.133 is
>>>> complete*
>>>> * INFO [STREAM-IN-/xxx.xxx.198.169] 2016-09-28 09:18:47,716
>>>> StreamResultFuture.java (line 186) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.169 is
>>>> complete*
>>>> * INFO [STREAM-IN-/xxx.xxx.198.167] 2016-09-28 09:18:47,715
>>>> StreamResultFuture.java (line 186) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.167 is
>>>> complete*
>>>> * INFO [STREAM-IN-/xxx.xxx.198.126] 2016-09-28 09:18:47,715
>>>> StreamResultFuture.java (line 186) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.126 is
>>>> complete*
>>>> * INFO [STREAM-IN-/xxx.xxx.198.78] 2016-09-28 09:18:47,715
>>>> StreamResultFuture.java (line 186) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.78 is
>>>> complete*
>>>> * INFO [STREAM-IN-/xxx.xxx.198.168] 2016-09-28 09:18:47,715
>>>> StreamResultFuture.java (line 186) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.168 is
>>>> complete*
>>>> * INFO [STREAM-IN-/xxx.xxx.198.75] 2016-09-28 09:18:47,776
>>>> StreamResultFuture.java (line 186) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.75 is
>>>> complete*
>>>> * INFO [STREAM-IN-/xxx.xxx.198.75] 2016-09-28 09:18:47,778
>>>> StreamResultFuture.java (line 220) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] All sessions completed*
>>>>
>>>>
>>>> As you can see in the logs above, nodetool rebuild finished without any
>>>> data being streamed, and all streaming sessions completed IN NO TIME (see
>>>> the timestamps in the logs).
>>>>
>>>>
>>>> Also, "nodetool status" looks fine from these new nodes (the ones
>>>> I ran 'nodetool rebuild' from).
>>>>
>>>> Please let us know what could be the issue here.
>>>>
>>>> Thanks in advance.
>>>>
>>>> On Wed, Sep 28, 2016 at 1:04 AM, Paulo Motta <pauloricardomg@gmail.com>
>>>> wrote:
>>>>
>>>>> Yeah this is likely to be caused by idle connections being shut down,
>>>>> so you may need to update your tcp_keepalive* and/or network/firewall
>>>>> settings.
>>>>>
>>>>>
>>>>> 2016-09-27 15:29 GMT-03:00 laxmikanth sadula <laxmikanth524@gmail.com>
>>>>> :
>>>>>
>>>>>> Hi Paulo,
>>>>>>
>>>>>> Thanks for the reply...
>>>>>>
>>>>>> I'm getting the following streaming exceptions during nodetool rebuild in
>>>>>> c*-2.0.17
>>>>>>
>>>>>> *04:24:49,759 StreamSession.java (line 461) [Stream
>>>>>> #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred*
>>>>>> *java.io.IOException: Connection timed out*
>>>>>> *    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)*
>>>>>> *    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)*
>>>>>> *    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)*
>>>>>> *    at sun.nio.ch.IOUtil.write(IOUtil.java:65)*
>>>>>> *    at
>>>>>> sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)*
>>>>>> *    at
>>>>>> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)*
>>>>>> *    at
>>>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)*
>>>>>> *    at
>>>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:311)*
>>>>>> *    at java.lang.Thread.run(Thread.java:745)*
>>>>>> *DEBUG [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764
>>>>>> ConnectionHandler.java (line 104) [Stream
>>>>>> #5e1b7f40-8496-11e6-8847-1b88665e430d] Closing stream connection handler on
>>>>>> /xxx.xxx.98.168*
>>>>>> * INFO [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764
>>>>>> StreamResultFuture.java (line 186) [Stream
>>>>>> #5e1b7f40-8496-11e6-8847-1b88665e430d] Session with /xxx.xxx.98.168 is
>>>>>> complete*
>>>>>> *ERROR [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764
>>>>>> StreamSession.java (line 461) [Stream
>>>>>> #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred*
>>>>>> *java.io.IOException: Broken pipe*
>>>>>> *    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)*
>>>>>> *    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)*
>>>>>> *    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)*
>>>>>> *    at sun.nio.ch.IOUtil.write(IOUtil.java:65)*
>>>>>> *    at
>>>>>> sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)*
>>>>>> *    at
>>>>>> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)*
>>>>>> *    at
>>>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)*
>>>>>> *    at
>>>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:319)*
>>>>>> *    at java.lang.Thread.run(Thread.java:745)*
>>>>>> *DEBUG [STREAM-IN-/xxx.xxx.98.168] 2016-09-27 04:24:49,909
>>>>>> ConnectionHandler.java (line 244) [Stream
>>>>>> #5e1b7f40-8496-11e6-8847-1b88665e430d] Received File (Header (cfId:
>>>>>> 68af9ee0-96f8-3b1d-a418-e5ae844f2cc2, #3, version: jb, estimated keys:
>>>>>> 4736, transfer size: 2306880, compressed?: true), file:
>>>>>> /home/cassandra/data_directories/data/keyspace_name1/archiving_metadata/keyspace_name1-archiving_metadata-tmp-jb-27-Data.db)*
>>>>>> *ERROR [STREAM-IN-/xxx.xxx.98.168] 2016-09-27 04:24:49,909
>>>>>> StreamSession.java (line 461) [Stream
>>>>>> #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred*
>>>>>> *java.lang.RuntimeException: Outgoing stream handler has been closed*
>>>>>> *    at
>>>>>> org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:126)*
>>>>>> *    at
>>>>>> org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:524)*
>>>>>> *    at
>>>>>> org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:413)*
>>>>>> *    at
>>>>>> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:245)*
>>>>>> *    at java.lang.Thread.run(Thread.java:745)*
>>>>>>
>>>>>> On Sep 27, 2016 11:48 PM, "Paulo Motta" <pauloricardomg@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> What type of streaming timeout are you getting? Do you have a stack
>>>>>>> trace? What version are you on?
>>>>>>>
>>>>>>> See more information about tuning tcp_keepalive* here:
>>>>>>> https://docs.datastax.com/en/cassandra/2.0/cassandra/troubleshooting/trblshootIdleFirewall.html
>>>>>>>
>>>>>>> 2016-09-27 14:07 GMT-03:00 laxmikanth sadula <
>>>>>>> laxmikanth524@gmail.com>:
>>>>>>>
>>>>>>>> @Paulo Motta
>>>>>>>>
>>>>>>>> We are also facing streaming timeout exceptions during 'nodetool
>>>>>>>> rebuild'. I set streaming_socket_timeout_in_ms to 86400000 (24 hours) as
>>>>>>>> suggested in the DataStax article
>>>>>>>> https://support.datastax.com/hc/en-us/articles/206502913-FAQ-How-to-reduce-the-impact-of-streaming-errors-or-failures
>>>>>>>> but we are still getting streaming exceptions.
>>>>>>>>
>>>>>>>> Also, what are the recommended settings/values for the kernel
>>>>>>>> tcp_keepalive parameters that would help streaming succeed?
>>>>>>>>
>>>>>>>> Thank you
>>>>>>>>
>>>>>>>> On Tue, Aug 16, 2016 at 12:21 AM, Paulo Motta <
>>>>>>>> pauloricardomg@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> What version are you on? This seems like a typical case where there
>>>>>>>>> was a problem with streaming (hanging, etc.). Do you have access to the
>>>>>>>>> logs? Maybe look for streaming errors? Typically streaming errors are
>>>>>>>>> related to timeouts, so you should review your Cassandra
>>>>>>>>> streaming_socket_timeout_in_ms and kernel tcp_keepalive settings.
>>>>>>>>>
>>>>>>>>> If you're on 2.2+ you can resume a failed bootstrap with nodetool
>>>>>>>>> bootstrap resume. There were also some streaming hanging problems fixed
>>>>>>>>> recently, so I'd advise you to upgrade to the latest version of your
>>>>>>>>> particular series for a more robust version.
>>>>>>>>>
>>>>>>>>> Is there any reason why you didn't use the replace procedure
>>>>>>>>> (-Dreplace_address) to replace the node with the same tokens? This would be
>>>>>>>>> a bit faster than the remove + bootstrap procedure.
>>>>>>>>>
>>>>>>>>> 2016-08-15 15:37 GMT-03:00 Jérôme Mainaud <jerome@mainaud.com>:
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> A client of mine has problems when adding a node to the cluster.
>>>>>>>>>> After 4 days, the node is still in joining mode, it doesn't have
>>>>>>>>>> the same level of load as the others, and there seems to be no streaming
>>>>>>>>>> from or to the new node.
>>>>>>>>>>
>>>>>>>>>> This node has a history.
>>>>>>>>>>
>>>>>>>>>>    1. At the beginning, it was a seed in the cluster.
>>>>>>>>>>    2. Ops detected that client had problems with it.
>>>>>>>>>>    3. They tried to reset it but failed. In the process they
>>>>>>>>>>    launched several repair and rebuild processes on the node.
>>>>>>>>>>    4. Then they asked me to help them.
>>>>>>>>>>    5. We stopped the node,
>>>>>>>>>>    6. removed it from the list of seeds (more precisely, it was
>>>>>>>>>>    replaced by another node),
>>>>>>>>>>    7. removed it from the cluster (I chose not to use
>>>>>>>>>>    decommission since the node's data was compromised),
>>>>>>>>>>    8. deleted all files from the data, commitlog and savedcache
>>>>>>>>>>    directories.
>>>>>>>>>>    9. After the leaving process ended, it was started as a fresh
>>>>>>>>>>    new node and began autobootstrap.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> As I don’t have direct access to the cluster I don't have a lot
>>>>>>>>>> of information, but I will have more tomorrow (logs and results of some
>>>>>>>>>> commands). And I can ask people for any required information.
>>>>>>>>>>
>>>>>>>>>> Does someone have any idea of what could have happened and what I
>>>>>>>>>> should investigate first?
>>>>>>>>>> What would you do to unblock the situation?
>>>>>>>>>>
>>>>>>>>>> Context: The cluster consists of two DCs, each with 15 nodes.
>>>>>>>>>> Average load is around 3 TB per node. The joining node froze a little after
>>>>>>>>>> 2 TB.
>>>>>>>>>>
>>>>>>>>>> Thank you for your help.
>>>>>>>>>> Cheers,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Jérôme Mainaud
>>>>>>>>>> jerome@mainaud.com
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Regards,
>>>>>>>> Laxmikanth
>>>>>>>> 99621 38051
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>
