cassandra-user mailing list archives

From Jacob Shadix <jacobsha...@gmail.com>
Subject Re: cassandra node stops streaming data during nodetool rebuild
Date Fri, 07 Apr 2017 15:11:46 GMT
I don't see an issue with the size of the data / node. You can attempt the
rebuild again and play around with throughput if your network can handle it.

It can be changed on-the-fly with nodetool:

 nodetool setstreamthroughput
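
As a rough illustration of what raising the throttle buys you, here is a minimal Python sketch. It builds the `nodetool setstreamthroughput` command line and estimates a best-case transfer time for a given throttle. The helper names, the value 400, and the assumption that the throttle (not disk or network) is the only bottleneck are illustrative, not taken from this thread. The estimate treats a megabit as 10^6 bits and uses 200 megabits/s as the baseline, which I believe is the shipped default in cassandra.yaml; check your own file.

```python
from typing import List


def stream_throughput_cmd(megabits_per_sec: int) -> List[str]:
    """Build the argv for `nodetool setstreamthroughput <value>`.

    The value is in megabits per second; 0 disables throttling.
    """
    if megabits_per_sec < 0:
        raise ValueError("throughput must be >= 0 (0 disables throttling)")
    return ["nodetool", "setstreamthroughput", str(megabits_per_sec)]


def estimated_hours(total_bytes: int, megabits_per_sec: int) -> float:
    """Best-case transfer time, assuming the throttle is the only limit."""
    if megabits_per_sec <= 0:
        raise ValueError("need a positive throttle to estimate a duration")
    seconds = total_bytes * 8 / (megabits_per_sec * 1_000_000)
    return seconds / 3600


# Largest pending stream from the netstats output in the original question.
pending_bytes = 145444246572

for mbps in (200, 400):  # 200 = assumed default, 400 = hypothetical new value
    cmd = " ".join(stream_throughput_cmd(mbps))
    print(f"{cmd}: ~{estimated_hours(pending_bytes, mbps):.1f} h best case")
```

The list returned by `stream_throughput_cmd` can be passed to `subprocess.run` on the node if you want to apply it from a script. The setting is not persisted across restarts unless cassandra.yaml is changed as well.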

This article is also worth a read -
https://support.datastax.com/hc/en-us/articles/205409646-How-to-performance-tune-data-streaming-activities-like-repair-and-bootstrap

-- Jacob Shadix

On Fri, Apr 7, 2017 at 9:23 AM, Roland Otta <Roland.Otta@willhaben.at>
wrote:

> good point!
>
> on the source side i can see the following error
>
> ERROR [STREAM-OUT-/192.168.0.114:34094] 2017-04-06 17:18:56,532 StreamSession.java:529 - [Stream #41606030-1ad9-11e7-9f16-51230e2be4e9] Streaming error occurred on session with peer 10.192.116.1 through 192.168.0.114
> org.apache.cassandra.io.FSReadError: java.io.IOException: Broken pipe
>         at org.apache.cassandra.io.util.ChannelProxy.transferTo(ChannelProxy.java:145) ~[apache-cassandra-3.7.jar:3.7]
>         at org.apache.cassandra.streaming.compress.CompressedStreamWriter.lambda$write$0(CompressedStreamWriter.java:90) ~[apache-cassandra-3.7.jar:3.7]
>         at org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.applyToChannel(BufferedDataOutputStreamPlus.java:350) ~[apache-cassandra-3.7.jar:3.7]
>         at org.apache.cassandra.streaming.compress.CompressedStreamWriter.write(CompressedStreamWriter.java:90) ~[apache-cassandra-3.7.jar:3.7]
>         at org.apache.cassandra.streaming.messages.OutgoingFileMessage.serialize(OutgoingFileMessage.java:91) ~[apache-cassandra-3.7.jar:3.7]
>         at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:48) ~[apache-cassandra-3.7.jar:3.7]
>         at org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:40) ~[apache-cassandra-3.7.jar:3.7]
>         at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:48) ~[apache-cassandra-3.7.jar:3.7]
>         at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:370) ~[apache-cassandra-3.7.jar:3.7]
>         at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:342) ~[apache-cassandra-3.7.jar:3.7]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
> Caused by: java.io.IOException: Broken pipe
>         at sun.nio.ch.FileChannelImpl.transferTo0(Native Method) ~[na:1.8.0_77]
>         at sun.nio.ch.FileChannelImpl.transferToDirectlyInternal(FileChannelImpl.java:428) ~[na:1.8.0_77]
>         at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:493) ~[na:1.8.0_77]
>         at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:608) ~[na:1.8.0_77]
>         at org.apache.cassandra.io.util.ChannelProxy.transferTo(ChannelProxy.java:141) ~[apache-cassandra-3.7.jar:3.7]
>         ... 10 common frames omitted
> DEBUG [STREAM-OUT-/192.168.0.114:34094] 2017-04-06 17:18:56,532 ConnectionHandler.java:110 - [Stream #41606030-1ad9-11e7-9f16-51230e2be4e9] Closing stream connection handler on /10.192.116.1
> INFO  [STREAM-OUT-/192.168.0.114:34094] 2017-04-06 17:18:56,532 StreamResultFuture.java:187 - [Stream #41606030-1ad9-11e7-9f16-51230e2be4e9] Session with /10.192.116.1 is complete
> WARN  [STREAM-OUT-/192.168.0.114:34094] 2017-04-06 17:18:56,532 StreamResultFuture.java:214 - [Stream #41606030-1ad9-11e7-9f16-51230e2be4e9] Stream failed
>
>
> the dataset is approx 300GB / Node.
>
> does that mean that cassandra does not try to reconnect (for streaming) in
> case of short network dropouts?
>
> On Fri, 2017-04-07 at 08:53 -0400, Jacob Shadix wrote:
>
> Did you look at the logs on the source DC as well? How big is the dataset?
>
> -- Jacob Shadix
>
> On Fri, Apr 7, 2017 at 7:16 AM, Roland Otta <Roland.Otta@willhaben.at>
> wrote:
>
> Hi!
>
> we are on 3.7.
>
> we have some debug messages ... but i guess they are not related to that
> issue
> DEBUG [GossipStage:1] 2017-04-07 13:11:00,440 FailureDetector.java:456 - Ignoring interval time of 2002469610 for /192.168.0.27
> DEBUG [GossipStage:1] 2017-04-07 13:11:00,441 FailureDetector.java:456 - Ignoring interval time of 2598593732 for /10.192.116.4
> DEBUG [GossipStage:1] 2017-04-07 13:11:00,441 FailureDetector.java:456 - Ignoring interval time of 2002612298 for /10.192.116.5
> DEBUG [GossipStage:1] 2017-04-07 13:11:00,441 FailureDetector.java:456 - Ignoring interval time of 2002660534 for /10.192.116.9
> DEBUG [GossipStage:1] 2017-04-07 13:11:00,465 FailureDetector.java:456 - Ignoring interval time of 2027212880 for /10.192.116.3
> DEBUG [GossipStage:1] 2017-04-07 13:11:00,465 FailureDetector.java:456 - Ignoring interval time of 2027279042 for /192.168.0.188
> DEBUG [GossipStage:1] 2017-04-07 13:11:00,465 FailureDetector.java:456 - Ignoring interval time of 2027313992 for /10.192.116.10
>
> beside that the debug.log is clean
>
> all the mentioned cassandra.yaml parameters are the shipped defaults
> (streaming_socket_timeout_in_ms does not exist at all in my cassandra.yaml)
> i also checked the pending compactions. there are no pending compactions
> at the moment.
>
> bg - roland otta
>
> On Fri, 2017-04-07 at 06:47 -0400, Jacob Shadix wrote:
>
> What version are you running? Do you see any errors in the system.log
> (SocketTimeout, for instance)?
>
> And what values do you have for the following in cassandra.yaml:
> - stream_throughput_outbound_megabits_per_sec
> - compaction_throughput_mb_per_sec
> - streaming_socket_timeout_in_ms
>
> -- Jacob Shadix
>
> On Fri, Apr 7, 2017 at 6:00 AM, Roland Otta <Roland.Otta@willhaben.at>
> wrote:
>
> hi,
>
> we are trying to set up a new datacenter and are initializing the data
> with nodetool rebuild.
>
> after some hours it seems that the node stopped streaming (at least
> there is no more streaming traffic on the network interface).
>
> nodetool netstats shows that the streaming is still in progress
>
> Mode: NORMAL
> Bootstrap 6918dc90-1ad6-11e7-9f16-51230e2be4e9
> Rebuild 41606030-1ad9-11e7-9f16-51230e2be4e9
>     /192.168.0.26
>         Receiving 257 files, 145444246572 bytes total. Already received 1 files, 1744027 bytes total
>             bds/adcounter_total 76456/47310255 bytes(0%) received from idx:0/192.168.0.26
>             bds/upselling_event 1667571/1667571 bytes(100%) received from idx:0/192.168.0.26
>     /192.168.0.188
>     /192.168.0.27
>         Receiving 169 files, 79355302464 bytes total. Already received 1 files, 81585975 bytes total
>             bds/ad_event_history 81585975/81585975 bytes(100%) received from idx:0/192.168.0.27
>     /192.168.0.189
>         Receiving 140 files, 19673034809 bytes total. Already received 1 files, 5996604 bytes total
>             bds/adcounter_per_day 5956840/42259846 bytes(14%) received from idx:0/192.168.0.189
>             bds/user_event 39764/39764 bytes(100%) received from idx:0/192.168.0.189
> Read Repair Statistics:
> Attempted: 0
> Mismatch (Blocking): 0
> Mismatch (Background): 0
> Pool Name                    Active   Pending      Completed   Dropped
> Large messages                  n/a         2              3         0
> Small messages                  n/a         0       68632465         0
> Gossip messages                 n/a         0         217661         0
>
>
>
> it has been in that state for approx 15 hours now
>
> does it make sense to wait for the streaming to finish or do i have to
> restart the node, discard data and restart the rebuild?
>
>
>
>
>
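
To tell a slow rebuild from a stalled one without watching the network interface, the "Receiving ... Already received ..." summary lines of the `nodetool netstats` output quoted above can be parsed and compared across two snapshots. The following is a minimal Python sketch. The function names and the "no new bytes between snapshots means stalled" rule are my own assumptions, not something nodetool provides, and the sample text is copied from the output in the original question.

```python
import re

# Matches the per-peer summary line; \s+ tolerates mail-style line wrapping.
SUMMARY = re.compile(
    r"Receiving\s+(\d+)\s+files,\s+(\d+)\s+bytes total\.\s+"
    r"Already received\s+(\d+)\s+files,\s+(\d+)\s+bytes total"
)


def receive_progress(netstats_text):
    """Return one dict per 'Receiving ...' summary line in the text."""
    rows = []
    for match in SUMMARY.finditer(netstats_text):
        files_total, bytes_total, files_done, bytes_done = map(int, match.groups())
        rows.append({
            "files_total": files_total,
            "bytes_total": bytes_total,
            "files_done": files_done,
            "bytes_done": bytes_done,
        })
    return rows


def is_stalled(previous, current):
    """Hypothetical rule: stalled if no additional bytes arrived between snapshots."""
    received = lambda rows: sum(r["bytes_done"] for r in rows)
    return received(current) <= received(previous)


SAMPLE = """
Receiving 257 files, 145444246572 bytes total. Already received 1 files, 1744027 bytes total
Receiving 169 files, 79355302464 bytes total. Already received 1 files, 81585975 bytes total
Receiving 140 files, 19673034809 bytes total. Already received 1 files, 5996604 bytes total
"""

rows = receive_progress(SAMPLE)
done = sum(r["bytes_done"] for r in rows)
total = sum(r["bytes_total"] for r in rows)
print(f"{len(rows)} sessions, {done}/{total} bytes ({100 * done / total:.3f}%)")
```

In practice you would capture the command output twice, some minutes apart, and pass both parsed snapshots to `is_stalled`. A stalled result together with a "Stream failed" entry in the source node's log suggests the session will not recover on its own.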
