cassandra-user mailing list archives

From Alain RODRIGUEZ <arodr...@gmail.com>
Subject Re: nodetool rebuild streaming exception
Date Wed, 28 Sep 2016 16:01:50 GMT
Hi techpyaasa,

> That was one of my teammates, very sorry for it/multiple threads.


No big deal :-).

> *It looks like streams are failing right away when trying to rebuild?*
> No, after partial streaming of data (around 150 GB; we have around 600
> GB of data on each node) streaming fails with the above
> exception stack trace.


Yes, I got confused. What I meant is that the specific session that fails,
fails fast. It doesn't look like a timeout issue, yet there is a
'*Connection timed out*'.

I am not sure I understand what is happening here.

Could you please share what 'nodetool status keyspace_name1' outputs (if
it's big, just use a gist or whatever)? If not, at least make sure all the
nodes are up with:

$ nodetool status | grep -v UN
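
Note that 'grep -v UN' also lets the header lines through. If that gets
noisy, a rough sketch of a stricter filter follows. The function name is
made up for illustration, and it assumes the usual two-letter status/state
code (U/D followed by N/L/J/M) in the first column of each node line:

```shell
# Keep only node lines whose status/state code is not UN (Up/Normal).
# Header and separator lines are dropped because their first field is
# not a two-letter status/state code.
non_un_nodes() {
  awk '$1 ~ /^[UD][NLJM]$/ && $1 != "UN"'
}
```

It would be used as 'nodetool status | non_un_nodes'. No output means
every node is reported as Up/Normal.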


> *It should be run from DC3 servers, after altering keyspace to add
> keyspaces to the new datacenter. Is this the way you're doing it?*
> Yes, I'm running it from DC3 using the " nodetool rebuild 'DC1' " command,
> after altering the keyspace with RF: DC1:3, DC2:3, DC3:3, and we are using
> Network Topology Strategy.


The command looks fine and now I know it actually worked for a while. If
you have many keyspaces, some might work and at some point one of them could
fail. Keyspace 'keyspace_name1' looks like a test one. Are you sure how
it is configured? If not, feel free to paste the keyspace configuration here
as well (no need for the whole schema with table details).

$ echo 'DESCRIBE KEYSPACE keyspace_name1;' | cqlsh <cassandra_server>
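
As a rough sketch, the output of that command can also be checked
mechanically. The helper below is hypothetical (the name is made up), and
it only does a plain text match for NetworkTopologyStrategy and the quoted
datacenter name, so treat it as a sanity check rather than a real parser:

```shell
# Succeeds if the DESCRIBE KEYSPACE output read from stdin mentions
# NetworkTopologyStrategy and the datacenter name given as $1 (quoted,
# as it appears in the replication map).
keyspace_replicates_to() {
  dc="$1"
  desc="$(cat)"
  printf '%s\n' "$desc" | grep -q "NetworkTopologyStrategy" &&
    printf '%s\n' "$desc" | grep -qF "'$dc'"
}
```

Piping the cqlsh output above into 'keyspace_replicates_to DC3' should
return success only if DC3 appears in the replication settings.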

> As I said, 'streaming_socket_timeout_in_ms: 86400000' is set (24 hours).

Also, have you done this on all the nodes and restarted them?
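
One way to double check the value on each node is sketched below. The
function name is made up, and the path to cassandra.yaml depends on your
install, so it is passed in as an argument:

```shell
# Print streaming_socket_timeout_in_ms from the given cassandra.yaml,
# in milliseconds and in hours. Prints nothing if the setting is absent
# or commented out.
streaming_timeout_hours() {
  awk -F': *' '/^streaming_socket_timeout_in_ms:/ {
    printf "%d ms = %.1f h\n", $2, $2 / 3600000
  }' "$1"
}
```

For the value quoted above, 86400000 ms, this should report 24.0 h. Keep in
mind the file only shows what is configured; a node needs a restart to pick
the setting up.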

How long does the rebuild operation run before failing?
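
To answer that from the logs, here is a minimal sketch that lists when each
streaming error was logged and for which stream thread. It assumes the log
line layout shown in the traces quoted below (level, thread, date, time,
...); the function name is made up and the log path is passed in:

```shell
# For every ERROR line reporting a streaming error, print the date, the
# time and the stream thread name (which includes the peer address).
stream_errors() {
  awk '$1 == "ERROR" && /Streaming error occurred/ { print $3, $4, $2 }' "$1"
}
```

Comparing the first timestamp it prints with the time 'nodetool rebuild'
was started gives the duration before the first failure.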

I have no real idea of what's happening there, I am just trying to give you
some clues.

C*heers,
-----------------------
Alain Rodriguez - @arodream - alain@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com



2016-09-28 15:18 GMT+02:00 techpyaasa . <techpyaasa@gmail.com>:

> @Alain
> That was one of my teammates, very sorry for it/multiple threads.
>
> *It looks like streams are failing right away when trying to rebuild?*
> No, after partial streaming of data (around 150 GB; we have around 600
> GB of data on each node) streaming fails with the above
> exception stack trace.
>
> *It should be run from DC3 servers, after altering keyspace to add
> keyspaces to the new datacenter. Is this the way you're doing it?*
> Yes, I'm running it from DC3 using the " nodetool rebuild 'DC1' " command,
> after altering the keyspace with RF: DC1:3, DC2:3, DC3:3, and we are using
> Network Topology Strategy.
>
> Yes, all nodes are running the same version, c*-2.0.17.
>
> As I said, 'streaming_socket_timeout_in_ms: 86400000' is set (24 hours).
>
> As suggested by @Paul and in some blogs, we are going to retry with the
> following changes *on new nodes in DC3.*
>
> net.ipv4.tcp_keepalive_time=60
> net.ipv4.tcp_keepalive_probes=3
> net.ipv4.tcp_keepalive_intvl=10
>
> Hope these settings are enough on the new nodes from where we are going to
> initiate rebuild/streaming, and are NOT required on all the existing nodes
> from where the data is streamed. Am I right?
>
> We have to see whether it works :( and by the way, please shed some light
> on this if you have faced such an exception in the past.
>
> As I mentioned in my last mail, this is the exception we are getting in
> streaming AFTER STREAMING some data.
>
> java.io.IOException: Connection timed out
>     at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>     at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>     at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>     at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
>     at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:311)
>     at java.lang.Thread.run(Thread.java:745)
>  INFO [STREAM-OUT-/xxx.xxx.198.191] 2016-09-27 00:28:10,347 StreamResultFuture.java (line 186) [Stream #30852870-8472-11e6-b043-3f260c696828] Session with /xxx.xxx.198.191 is complete
> ERROR [STREAM-OUT-/xxx.xxx.198.191] 2016-09-27 00:28:10,347 StreamSession.java (line 461) [Stream #30852870-8472-11e6-b043-3f260c696828] Streaming error occurred
> java.io.IOException: Broken pipe
>     at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>     at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>     at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>     at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
>     at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:319)
>     at java.lang.Thread.run(Thread.java:745)
> ERROR [STREAM-IN-/xxx.xxx.198.191] 2016-09-27 00:28:10,461 StreamSession.java (line 461) [Stream #30852870-8472-11e6-b043-3f260c696828] Streaming error occurred
> java.lang.RuntimeException: Outgoing stream handler has been closed
>     at org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:126)
>     at org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:524)
>     at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:413)
>     at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:245)
>     at java.lang.Thread.run(Thread.java:745)
>
> Thanks in advance
> techpyaasa
>
> On Wed, Sep 28, 2016 at 6:09 PM, Alain RODRIGUEZ <arodrime@gmail.com>
> wrote:
>
>> Just saw a very similar question from Laxmikanth (laxmikanth524@gmail.com)
>> on another thread, with the same logs.
>>
>> Would you mind not splitting this across multiple threads, so the
>> information stays in one place and we can better help you on this list?
>>
>> C*heers,
>>
>>
>> 2016-09-28 14:28 GMT+02:00 Alain RODRIGUEZ <arodrime@gmail.com>:
>>
>>> Hi,
>>>
>>> It looks like streams are failing right away when trying to rebuild.
>>>
>>>
>>>    - Could you please share with us the command you used?
>>>
>>>
>>> It should be run from DC3 servers, after altering keyspace to add
>>> keyspaces to the new datacenter. Is this the way you're doing it?
>>>
>>>    - Are all the nodes using the same version ('nodetool version')?
>>>    - What does 'nodetool status keyspace_name1' output?
>>>    - Are you sure you are using Network Topology Strategy on 'keyspace_name1'?
>>>    Have you modified this schema to add replication on DC3?
>>>
>>> My guess is something could be wrong with the configuration.
>>>
>>>> I checked with our network operations team , they have confirmed network
>>>> is stable and no network hiccups.
>>>> I have set 'streaming_socket_timeout_in_ms: 86400000' (24 hours) as
>>>> suggested in datastax blog  -
>>>> https://support.datastax.com/hc/en-us/articles/206502913-FAQ-How-to-reduce-the-impact-of-streaming-errors-or-failures
>>>> and ran 'nodetool rebuild' one node at a
>>>> time but was of NO USE . Still we are getting above exception.
>>>>
>>>
>>> This looks correct to me. Good that you added this information, thanks.
>>>
>>> Another thought: I believe all the nodes in the origin DC you pass to
>>> 'nodetool rebuild <origin_dc>' need to be up for those streams to
>>> work.
>>>
>>> This looks a bit weird, good luck.
>>>
>>> C*heers,
>>> -----------------------
>>> Alain Rodriguez - @arodream - alain@thelastpickle.com
>>> France
>>>
>>> The Last Pickle - Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>>
>>> 2016-09-27 18:54 GMT+02:00 techpyaasa . <techpyaasa@gmail.com>:
>>>
>>>> Hi,
>>>>
>>>> I'm trying to add new data center - DC3 to existing c*-2.0.17 cluster
>>>> with 2 data centers DC1, DC2 with replication DC1:3 , DC2:3 , DC3:3.
>>>>
>>>>  I'm getting following exception repeatedly on new nodes after I run
>>>> 'nodetool rebuild'.
>>>>
>>>> DEBUG [ScheduledTasks:1] 2016-09-27 04:24:00,416 GCInspector.java (line 118) GC for ParNew: 20 ms for 1 collections, 9837479688 used; max is 16760438784
>>>> DEBUG [ScheduledTasks:1] 2016-09-27 04:24:03,417 GCInspector.java (line 118) GC for ParNew: 20 ms for 1 collections, 9871193904 used; max is 16760438784
>>>> DEBUG [ScheduledTasks:1] 2016-09-27 04:24:06,418 GCInspector.java (line 118) GC for ParNew: 20 ms for 1 collections, 9950298136 used; max is 16760438784
>>>> DEBUG [ScheduledTasks:1] 2016-09-27 04:24:09,419 GCInspector.java (line 118) GC for ParNew: 19 ms for 1 collections, 9941119568 used; max is 16760438784
>>>> DEBUG [ScheduledTasks:1] 2016-09-27 04:24:12,421 GCInspector.java (line 118) GC for ParNew: 20 ms for 1 collections, 9864185024 used; max is 16760438784
>>>> DEBUG [ScheduledTasks:1] 2016-09-27 04:24:15,422 GCInspector.java (line 118) GC for ParNew: 60 ms for 2 collections, 9730374352 used; max is 16760438784
>>>> DEBUG [ScheduledTasks:1] 2016-09-27 04:24:18,423 GCInspector.java (line 118) GC for ParNew: 18 ms for 1 collections, 9775448168 used; max is 16760438784
>>>> DEBUG [ScheduledTasks:1] 2016-09-27 04:24:21,424 GCInspector.java (line 118) GC for ParNew: 22 ms for 1 collections, 9850794272 used; max is 16760438784
>>>> DEBUG [ScheduledTasks:1] 2016-09-27 04:24:24,425 GCInspector.java (line 118) GC for ParNew: 20 ms for 1 collections, 9729992448 used; max is 16760438784
>>>> DEBUG [ScheduledTasks:1] 2016-09-27 04:24:27,426 GCInspector.java (line 118) GC for ParNew: 22 ms for 1 collections, 9699783920 used; max is 16760438784
>>>> DEBUG [ScheduledTasks:1] 2016-09-27 04:24:30,427 GCInspector.java (line 118) GC for ParNew: 21 ms for 1 collections, 9696523920 used; max is 16760438784
>>>> DEBUG [ScheduledTasks:1] 2016-09-27 04:24:33,429 GCInspector.java (line 118) GC for ParNew: 20 ms for 1 collections, 9560497904 used; max is 16760438784
>>>> DEBUG [ScheduledTasks:1] 2016-09-27 04:24:36,430 GCInspector.java (line 118) GC for ParNew: 19 ms for 1 collections, 9568718352 used; max is 16760438784
>>>> DEBUG [ScheduledTasks:1] 2016-09-27 04:24:39,431 GCInspector.java (line 118) GC for ParNew: 22 ms for 1 collections, 9496991384 used; max is 16760438784
>>>> DEBUG [ScheduledTasks:1] 2016-09-27 04:24:42,432 GCInspector.java (line 118) GC for ParNew: 19 ms for 1 collections, 9486433840 used; max is 16760438784
>>>> DEBUG [ScheduledTasks:1] 2016-09-27 04:24:45,434 GCInspector.java (line 118) GC for ParNew: 19 ms for 1 collections, 9442642688 used; max is 16760438784
>>>> DEBUG [ScheduledTasks:1] 2016-09-27 04:24:48,435 GCInspector.java (line 118) GC for ParNew: 20 ms for 1 collections, 9548532008 used; max is 16760438784
>>>> DEBUG [STREAM-IN-/xxx.xxx.98.168] 2016-09-27 04:24:49,756 ConnectionHandler.java (line 244) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Received File (Header (cfId: bf446a90-71c5-3552-a2e5-b1b94dbf86e3, #0, version: jb, estimated keys: 252928, transfer size: 5496759656, compressed?: true), file: /home/cassandra/data_directories/data/keyspace_name1/columnfamily_1/keyspace_name1-columnfamily_1-tmp-jb-54-Data.db)
>>>> DEBUG [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,757 ConnectionHandler.java (line 310) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Sending Received (bf446a90-71c5-3552-a2e5-b1b94dbf86e3, #0)
>>>> ERROR [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,759 StreamSession.java (line 461) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred
>>>> java.io.IOException: Connection timed out
>>>>     at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>>>>     at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>>>>     at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>>>>     at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>>>>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
>>>>     at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
>>>>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
>>>>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:311)
>>>>     at java.lang.Thread.run(Thread.java:745)
>>>> DEBUG [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764 ConnectionHandler.java (line 104) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Closing stream connection handler on /xxx.xxx.98.168
>>>>  INFO [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764 StreamResultFuture.java (line 186) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Session with /xxx.xxx.98.168 is complete
>>>> ERROR [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764 StreamSession.java (line 461) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred
>>>> java.io.IOException: Broken pipe
>>>>     at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>>>>     at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>>>>     at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>>>>     at sun.nio.ch.IOUtil.write(IOUtil.java:65)
>>>>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
>>>>     at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
>>>>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
>>>>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:319)
>>>>     at java.lang.Thread.run(Thread.java:745)
>>>> DEBUG [STREAM-IN-/xxx.xxx.98.168] 2016-09-27 04:24:49,909 ConnectionHandler.java (line 244) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Received File (Header (cfId: 68af9ee0-96f8-3b1d-a418-e5ae844f2cc2, #3, version: jb, estimated keys: 4736, transfer size: 2306880, compressed?: true), file: /home/cassandra/data_directories/data/keyspace_name1/archiving_metadata/keyspace_name1-archiving_metadata-tmp-jb-27-Data.db)
>>>> ERROR [STREAM-IN-/xxx.xxx.98.168] 2016-09-27 04:24:49,909 StreamSession.java (line 461) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred
>>>> java.lang.RuntimeException: Outgoing stream handler has been closed
>>>>     at org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:126)
>>>>     at org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:524)
>>>>     at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:413)
>>>>     at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:245)
>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>
>>>>
>>>> I checked with our network operations team , they have confirmed
>>>> network is stable and no network hiccups.
>>>> I have set 'streaming_socket_timeout_in_ms: 86400000' (24 hours) as
>>>> suggested in datastax blog  -
>>>> https://support.datastax.com/hc/en-us/articles/206502913-FAQ-How-to-reduce-the-impact-of-streaming-errors-or-failures
>>>> and ran 'nodetool rebuild' one node at a
>>>> time but was of NO USE . Still we are getting above exception.
>>>>
>>>> Can someone please help me in debugging and fixing this.
>>>>
>>>>
>>>> Thanks,
>>>> techpyaasa
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
