cassandra-user mailing list archives

From "techpyaasa ." <techpya...@gmail.com>
Subject Re: nodetool rebuild streaming exception
Date Wed, 28 Sep 2016 13:18:39 GMT
@Alain
That was one of my teammates; very sorry for the multiple threads.

*It looks like streams are failing right away when trying to rebuild.*
No. Streaming fails only after part of the data has been transferred
(around 150 GB of the roughly 600 GB we have on each node), with the
exception stack trace from my earlier mail (repeated below).

*It should be run from DC3 servers, after altering keyspace to add
keyspaces to the new datacenter. Is this the way you're doing it?*
Yes, I'm running it from DC3 with the command "nodetool rebuild 'DC1'",
after altering the keyspace to use replication DC1:3, DC2:3, DC3:3. We are
using Network Topology Strategy.
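
To make the sequence concrete, here is a minimal sketch of the two steps
described above: the keyspace alteration, then the rebuild from the source
data center. The helper names are hypothetical and only build the strings;
the keyspace name is the placeholder used in the logs, and the exact
statements we ran may have differed.

```python
def alter_keyspace_cql(keyspace, replication):
    """Build an ALTER KEYSPACE statement using NetworkTopologyStrategy.

    `replication` maps data center name -> replication factor.
    """
    options = ["'class': 'NetworkTopologyStrategy'"]
    options += ["'%s': %d" % (dc, rf) for dc, rf in replication.items()]
    body = ", ".join(options)
    return "ALTER KEYSPACE " + keyspace + " WITH replication = {" + body + "};"


def rebuild_command(source_dc):
    """Command to run on each NEW node (DC3), naming the source data center."""
    return ["nodetool", "rebuild", source_dc]


# Step 1: run once, from cqlsh, before any rebuild.
cql = alter_keyspace_cql("keyspace_name1", {"DC1": 3, "DC2": 3, "DC3": 3})
# Step 2: run on each DC3 node, one node at a time.
cmd = rebuild_command("DC1")
print(cql)
print(" ".join(cmd))
```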

Yes, all nodes are running the same version, c*-2.0.17.

As I said, we have set 'streaming_socket_timeout_in_ms: 86400000' (24 hours).

As suggested by @Paul and in some blogs, we are going to retry with the
following changes *on the new nodes in DC3*:

net.ipv4.tcp_keepalive_time=60
net.ipv4.tcp_keepalive_probes=3
net.ipv4.tcp_keepalive_intvl=10

I hope these settings are enough on the new nodes, where we initiate the
rebuild/streaming, and are NOT required on the existing nodes that the
data is streamed from. Am I right?
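
As a rough sketch of what these three values mean in practice: on Linux,
the first keepalive probe goes out after tcp_keepalive_time seconds of
idleness, then one probe every tcp_keepalive_intvl seconds, and the
connection is dropped after tcp_keepalive_probes unanswered probes. The
function name below is made up for illustration, and the result is an
approximation. Note that these are per-host kernel settings, so they only
affect sockets owned by the host where they are set.

```python
def keepalive_detection_seconds(time_s, probes, intvl_s):
    """Approximate time for TCP keepalive to declare an idle peer dead:
    idle time before the first probe, plus all probes going unanswered."""
    return time_s + probes * intvl_s


# Values proposed in this mail for the new DC3 nodes.
proposed = keepalive_detection_seconds(60, 3, 10)

# Linux kernel defaults: tcp_keepalive_time=7200, probes=9, intvl=75.
linux_default = keepalive_detection_seconds(7200, 9, 75)

# streaming_socket_timeout_in_ms from cassandra.yaml, converted to hours.
streaming_timeout_hours = 86400000 / 1000 / 3600

print(proposed, linux_default, streaming_timeout_hours)
```

With the proposed values an idle connection is probed after one minute,
instead of after two hours with the kernel defaults, which is the part that
matters if something between the data centers silently drops idle
connections.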

We will have to see whether it works :( By the way, please shed some light
on this if you have faced such an exception in the past.

As I mentioned in my last mail, this is the exception we get AFTER some
data has already been streamed.

java.io.IOException: Connection timed out
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
        at sun.nio.ch.IOUtil.write(IOUtil.java:65)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
        at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
        at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
        at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:311)
        at java.lang.Thread.run(Thread.java:745)
 INFO [STREAM-OUT-/xxx.xxx.198.191] 2016-09-27 00:28:10,347 StreamResultFuture.java (line 186) [Stream #30852870-8472-11e6-b043-3f260c696828] Session with /xxx.xxx.198.191 is complete
ERROR [STREAM-OUT-/xxx.xxx.198.191] 2016-09-27 00:28:10,347 StreamSession.java (line 461) [Stream #30852870-8472-11e6-b043-3f260c696828] Streaming error occurred
java.io.IOException: Broken pipe
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
        at sun.nio.ch.IOUtil.write(IOUtil.java:65)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
        at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
        at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
        at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:319)
        at java.lang.Thread.run(Thread.java:745)
ERROR [STREAM-IN-/xxx.xxx.198.191] 2016-09-27 00:28:10,461 StreamSession.java (line 461) [Stream #30852870-8472-11e6-b043-3f260c696828] Streaming error occurred
java.lang.RuntimeException: Outgoing stream handler has been closed
        at org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:126)
        at org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:524)
        at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:413)
        at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:245)
        at java.lang.Thread.run(Thread.java:745)
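
To compare failures across nodes, a small script along these lines could
pull the streaming errors out of system.log and group them by stream ID.
This is only a sketch: it assumes one log entry per line in the 2.0-style
format shown above (not the wrapped form pasted in this mail), and the
function and pattern names are made up for illustration.

```python
import re
from collections import defaultdict

# Log header, e.g.:
# ERROR [STREAM-OUT-/ip] 2016-09-27 00:28:10,347 StreamSession.java (line 461) [Stream #<id>] <message>
HEADER = re.compile(
    r"^\s*(?P<level>DEBUG|INFO|WARN|ERROR)\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r".*?\[Stream #(?P<stream>[0-9a-f-]+)\]\s+(?P<msg>.*)$"
)
# First line of a Java stack trace, e.g. "java.io.IOException: Broken pipe".
EXCEPTION = re.compile(r"^(?P<exc>[\w.$]+(?:Exception|Error)): (?P<detail>.*)$")


def streaming_errors(lines):
    """Map stream ID -> list of (timestamp, thread, exception, detail)."""
    errors = defaultdict(list)
    pending = None  # last "Streaming error occurred" header still awaiting its exception
    for line in lines:
        line = line.rstrip("\n")
        header = HEADER.match(line)
        if header:
            is_error = (header["level"] == "ERROR"
                        and "Streaming error occurred" in header["msg"])
            pending = header if is_error else None
            continue
        exc = EXCEPTION.match(line)
        if exc and pending:
            errors[pending["stream"]].append(
                (pending["ts"], pending["thread"], exc["exc"], exc["detail"]))
            pending = None
    return dict(errors)


sample = [
    "ERROR [STREAM-OUT-/xxx.xxx.198.191] 2016-09-27 00:28:10,347 StreamSession.java (line 461) [Stream #30852870-8472-11e6-b043-3f260c696828] Streaming error occurred",
    "java.io.IOException: Broken pipe",
    "        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)",
    "ERROR [STREAM-IN-/xxx.xxx.198.191] 2016-09-27 00:28:10,461 StreamSession.java (line 461) [Stream #30852870-8472-11e6-b043-3f260c696828] Streaming error occurred",
    "java.lang.RuntimeException: Outgoing stream handler has been closed",
]
result = streaming_errors(sample)
for stream_id, found in result.items():
    print(stream_id, found)
```

Running the same script on the sending nodes in DC1 for the same stream ID
would show whether they logged a timeout at the same moment.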

Thanks in advance
techpyaasa

On Wed, Sep 28, 2016 at 6:09 PM, Alain RODRIGUEZ <arodrime@gmail.com> wrote:

> Just saw a very similar question from Laxmikanth (laxmikanth524@gmail.com)
> on another thread, with the same logs.
>
> Would you mind not splitting this across multiple threads, so that the
> information stays in one place and we can better help you from this
> mailing list?
>
> C*heers,
>
>
> 2016-09-28 14:28 GMT+02:00 Alain RODRIGUEZ <arodrime@gmail.com>:
>
>> Hi,
>>
>> It looks like streams are failing right away when trying to rebuild.
>>
>>
>>    - Could you please share with us the command you used?
>>
>>
>> It should be run from DC3 servers, after altering keyspace to add
>> keyspaces to the new datacenter. Is this the way you're doing it?
>>
>>    - Are all the nodes using the same version ('nodetool version')?
>>    - What does 'nodetool status keyspace_name1' output?
>>    - Are you sure you are using Network Topology Strategy on
>>    'keyspace_name1'? Have you modified this schema to add replication
>>    on DC3?
>>
>> My guess is something could be wrong with the configuration.
>>
>>> I checked with our network operations team , they have confirmed network
>>> is stable and no network hiccups.
>>> I have set 'streaming_socket_timeout_in_ms: 86400000' (24 hours) as
>>> suggested in datastax blog  -
>>> https://support.datastax.com/hc/en-us/articles/206502913-FAQ-How-to-reduce-the-impact-of-streaming-errors-or-failures
>>> and ran 'nodetool rebuild' one node at a
>>> time but was of NO USE . Still we are getting above exception.
>>>
>>
>> This looks correct to me. Good that you added this information, thanks.
>>
>> Another thought: I believe all the nodes need to be up in the origin DC
>> you use for your 'nodetool rebuild <origin_dc>' command for those streams
>> to work.
>>
>> This looks a bit weird, good luck.
>>
>> C*heers,
>> -----------------------
>> Alain Rodriguez - @arodream - alain@thelastpickle.com
>> France
>>
>> The Last Pickle - Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>>
>> 2016-09-27 18:54 GMT+02:00 techpyaasa . <techpyaasa@gmail.com>:
>>
>>> Hi,
>>>
>>> I'm trying to add a new data center, DC3, to an existing c*-2.0.17
>>> cluster with 2 data centers, DC1 and DC2, with replication DC1:3, DC2:3,
>>> DC3:3.
>>>
>>> I'm getting the following exception repeatedly on the new nodes after I
>>> run 'nodetool rebuild'.
>>>
>>> DEBUG [ScheduledTasks:1] 2016-09-27 04:24:00,416 GCInspector.java (line 118) GC for ParNew: 20 ms for 1 collections, 9837479688 used; max is 16760438784
>>> DEBUG [ScheduledTasks:1] 2016-09-27 04:24:03,417 GCInspector.java (line 118) GC for ParNew: 20 ms for 1 collections, 9871193904 used; max is 16760438784
>>> DEBUG [ScheduledTasks:1] 2016-09-27 04:24:06,418 GCInspector.java (line 118) GC for ParNew: 20 ms for 1 collections, 9950298136 used; max is 16760438784
>>> DEBUG [ScheduledTasks:1] 2016-09-27 04:24:09,419 GCInspector.java (line 118) GC for ParNew: 19 ms for 1 collections, 9941119568 used; max is 16760438784
>>> DEBUG [ScheduledTasks:1] 2016-09-27 04:24:12,421 GCInspector.java (line 118) GC for ParNew: 20 ms for 1 collections, 9864185024 used; max is 16760438784
>>> DEBUG [ScheduledTasks:1] 2016-09-27 04:24:15,422 GCInspector.java (line 118) GC for ParNew: 60 ms for 2 collections, 9730374352 used; max is 16760438784
>>> DEBUG [ScheduledTasks:1] 2016-09-27 04:24:18,423 GCInspector.java (line 118) GC for ParNew: 18 ms for 1 collections, 9775448168 used; max is 16760438784
>>> DEBUG [ScheduledTasks:1] 2016-09-27 04:24:21,424 GCInspector.java (line 118) GC for ParNew: 22 ms for 1 collections, 9850794272 used; max is 16760438784
>>> DEBUG [ScheduledTasks:1] 2016-09-27 04:24:24,425 GCInspector.java (line 118) GC for ParNew: 20 ms for 1 collections, 9729992448 used; max is 16760438784
>>> DEBUG [ScheduledTasks:1] 2016-09-27 04:24:27,426 GCInspector.java (line 118) GC for ParNew: 22 ms for 1 collections, 9699783920 used; max is 16760438784
>>> DEBUG [ScheduledTasks:1] 2016-09-27 04:24:30,427 GCInspector.java (line 118) GC for ParNew: 21 ms for 1 collections, 9696523920 used; max is 16760438784
>>> DEBUG [ScheduledTasks:1] 2016-09-27 04:24:33,429 GCInspector.java (line 118) GC for ParNew: 20 ms for 1 collections, 9560497904 used; max is 16760438784
>>> DEBUG [ScheduledTasks:1] 2016-09-27 04:24:36,430 GCInspector.java (line 118) GC for ParNew: 19 ms for 1 collections, 9568718352 used; max is 16760438784
>>> DEBUG [ScheduledTasks:1] 2016-09-27 04:24:39,431 GCInspector.java (line 118) GC for ParNew: 22 ms for 1 collections, 9496991384 used; max is 16760438784
>>> DEBUG [ScheduledTasks:1] 2016-09-27 04:24:42,432 GCInspector.java (line 118) GC for ParNew: 19 ms for 1 collections, 9486433840 used; max is 16760438784
>>> DEBUG [ScheduledTasks:1] 2016-09-27 04:24:45,434 GCInspector.java (line 118) GC for ParNew: 19 ms for 1 collections, 9442642688 used; max is 16760438784
>>> DEBUG [ScheduledTasks:1] 2016-09-27 04:24:48,435 GCInspector.java (line 118) GC for ParNew: 20 ms for 1 collections, 9548532008 used; max is 16760438784
>>> DEBUG [STREAM-IN-/xxx.xxx.98.168] 2016-09-27 04:24:49,756 ConnectionHandler.java (line 244) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Received File (Header (cfId: bf446a90-71c5-3552-a2e5-b1b94dbf86e3, #0, version: jb, estimated keys: 252928, transfer size: 5496759656, compressed?: true), file: /home/cassandra/data_directories/data/keyspace_name1/columnfamily_1/keyspace_name1-columnfamily_1-tmp-jb-54-Data.db)
>>> DEBUG [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,757 ConnectionHandler.java (line 310) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Sending Received (bf446a90-71c5-3552-a2e5-b1b94dbf86e3, #0)
>>> ERROR [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,759 StreamSession.java (line 461) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred
>>> java.io.IOException: Connection timed out
>>>     at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>>>     at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>>>     at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>>>     at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>>>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
>>>     at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
>>>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
>>>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:311)
>>>     at java.lang.Thread.run(Thread.java:745)
>>> DEBUG [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764 ConnectionHandler.java (line 104) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Closing stream connection handler on /xxx.xxx.98.168
>>>  INFO [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764 StreamResultFuture.java (line 186) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Session with /xxx.xxx.98.168 is complete
>>> ERROR [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764 StreamSession.java (line 461) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred
>>> java.io.IOException: Broken pipe
>>>     at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>>>     at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>>>     at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>>>     at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>>>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
>>>     at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
>>>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
>>>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:319)
>>>     at java.lang.Thread.run(Thread.java:745)
>>> DEBUG [STREAM-IN-/xxx.xxx.98.168] 2016-09-27 04:24:49,909 ConnectionHandler.java (line 244) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Received File (Header (cfId: 68af9ee0-96f8-3b1d-a418-e5ae844f2cc2, #3, version: jb, estimated keys: 4736, transfer size: 2306880, compressed?: true), file: /home/cassandra/data_directories/data/keyspace_name1/archiving_metadata/keyspace_name1-archiving_metadata-tmp-jb-27-Data.db)
>>> ERROR [STREAM-IN-/xxx.xxx.98.168] 2016-09-27 04:24:49,909 StreamSession.java (line 461) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred
>>> java.lang.RuntimeException: Outgoing stream handler has been closed
>>>     at org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:126)
>>>     at org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:524)
>>>     at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:413)
>>>     at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:245)
>>>     at java.lang.Thread.run(Thread.java:745)
>>>
>>>
>>> I checked with our network operations team; they confirmed that the
>>> network is stable, with no hiccups.
>>> I have set 'streaming_socket_timeout_in_ms: 86400000' (24 hours) as
>>> suggested in the DataStax article
>>> https://support.datastax.com/hc/en-us/articles/206502913-FAQ-How-to-reduce-the-impact-of-streaming-errors-or-failures
>>> and ran 'nodetool rebuild' one node at a time, but it was of NO USE. We
>>> are still getting the above exception.
>>>
>>> Can someone please help me debug and fix this?
>>>
>>>
>>> Thanks,
>>> techpyaasa
>>>
>>>
>>>
>>>
>>
>
