cassandra-user mailing list archives

From Alain RODRIGUEZ <arodr...@gmail.com>
Subject Re: Rebuild to a new DC fails every time
Date Wed, 10 Jan 2018 11:56:18 GMT
Hello Martin.

Did you solve your issue?

I would say that this exception could indeed be due to
'streaming_socket_timeout_in_ms'. Make sure the value is large enough.
Upgrading to a newer version that implements the streaming keep-alive
(CASSANDRA-11841, which you mentioned) is also an interesting thing to
try. The thing is, if you are in the middle of adding a DC, it might not
be the best moment for an upgrade. It is clear to me that using a
keep-alive is better here, so if the timing works for you, upgrading
could definitely help.
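
As a rough sanity check on the timeout theory, the timestamps from
Example 1 in the original message can be compared with the configured
timeout. This is only a minimal sketch in Python: the helper names are
made up for illustration, and the 48-hour value is the figure quoted
earlier in this thread for 3.9, so treat it as an assumption and check
'streaming_socket_timeout_in_ms' in your own cassandra.yaml.

```python
from datetime import datetime, timedelta

# Timestamp layout used in the log lines quoted in this thread.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_between(start: str, end: str) -> timedelta:
    """Return the time between two log timestamps."""
    return datetime.strptime(end, LOG_TIME_FORMAT) - datetime.strptime(
        start, LOG_TIME_FORMAT
    )


def timeout_could_explain(elapsed: timedelta, timeout_ms: int) -> bool:
    """True only if the session lasted long enough for an idle timeout
    of timeout_ms to have fired at all."""
    return elapsed >= timedelta(milliseconds=timeout_ms)


# Example 1: "Prepare completed" and the first streaming error.
elapsed = elapsed_between("2017-12-13 23:55:38,840", "2017-12-14 04:28:09,064")

# Assumption: 48 hours, as quoted in this thread; verify against your config.
ASSUMED_TIMEOUT_MS = 48 * 60 * 60 * 1000

print(elapsed)
print(timeout_could_explain(elapsed, ASSUMED_TIMEOUT_MS))
```

If the second line printed is False, the session was shorter than the
timeout, so the timeout alone cannot explain the failure and the other
causes discussed in this message deserve a closer look.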

Another reason I can think of would be a network issue of some kind,
such as a flaky cross-DC connection, or a node going down, either
completely or just bouncing because of GC or any other reason. I believe
these kinds of events are not handled well by the streaming process yet.

Is the cluster healthy overall? Do you have pending / dropped messages of
some kind, GC pressure, log warnings and errors or any other troubles?
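
To make the log check more concrete, here is a minimal sketch in Python
that groups ERROR entries by stream session ID, based on the
"[Stream #<id>]" marker visible in the log lines quoted in this thread.
It assumes one log entry per line, as in the log file on disk (the lines
in this thread were wrapped by the mail client), and the function name is
made up for illustration.

```python
import re
from collections import defaultdict

# Log level followed by the "[thread-name]" bracket, with an optional prefix.
ERROR_LEVEL = re.compile(r"(?:^|\s)ERROR\s+\[")
# The "[Stream #<uuid>" marker seen in the quoted StreamSession lines.
STREAM_ID = re.compile(r"\[Stream #([0-9a-f-]{36})")


def errors_by_stream(lines):
    """Map each stream session ID to the ERROR log lines that mention it."""
    grouped = defaultdict(list)
    for line in lines:
        if not ERROR_LEVEL.search(line):
            continue  # skip INFO, WARN and non-log lines
        match = STREAM_ID.search(line)
        if match:
            grouped[match.group(1)].append(line.strip())
    return dict(grouped)


# Sample entries taken from Example 1 in this thread, unwrapped.
sample = [
    "INFO  [STREAM-IN-/172.25.16.125:55735] 2017-12-13 23:55:38,840 "
    "StreamResultFuture.java:174 - [Stream "
    "#b8faf130-e092-11e7-bab5-0d4fb7c90e72 ID#0] Prepare completed.",
    "ERROR [STREAM-IN-/172.25.16.125:55735] 2017-12-14 04:28:09,064 "
    "StreamSession.java:533 - [Stream "
    "#b8faf130-e092-11e7-bab5-0d4fb7c90e72] Streaming error occurred on "
    "session with peer 172.25.16.125",
]

for stream_id, errors in errors_by_stream(sample).items():
    print(stream_id, len(errors))
```

Running the same function over the full log of each node involved in the
rebuild should show whether the errors cluster on one session or one peer.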

Let us know how it goes :).

C*heers,
-----------------------
Alain Rodriguez - @arodream - alain@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2018-01-08 14:31 GMT+00:00 Martin Mačura <m.macura@gmail.com>:

> None of the files is listed more than once in the logs:
>
> java.lang.RuntimeException: Transfer of file
> /fs3/cassandra/data/<redacted>/event_group-3b5782d08e4411e6842917253f111990/mc-116042-big-Data.db
> already completed or aborted (perhaps session failed?).
> java.lang.RuntimeException: Transfer of file
> /fs0/cassandra/data/<redacted>/event_group-3b5782d08e4411e6842917253f111990/mc-111370-big-Data.db
> already completed or aborted (perhaps session failed?).
> java.lang.RuntimeException: Transfer of file
> /fs3/cassandra/data/<redacted>/event_alert-13d700008e3f11e6a6cbe1698349da4d/mc-8659-big-Data.db
> already completed or aborted (perhaps session failed?).
> java.lang.RuntimeException: Transfer of file
> /fs4/cassandra/data/<redacted>/event_alert-13d700008e3f11e6a6cbe1698349da4d/mc-9133-big-Data.db
> already completed or aborted (perhaps session failed?).
> java.lang.RuntimeException: Transfer of file
> /fs2/cassandra/data/<redacted>/event_alert-13d700008e3f11e6a6cbe1698349da4d/mc-3997-big-Data.db
> already completed or aborted (perhaps session failed?).
> java.lang.RuntimeException: Transfer of file
> /fs1/cassandra/data/<redacted>//event_group-3b5782d08e4411e6842917253f111990/mc-152979-big-Data.db
> already completed or aborted (perhaps session failed?).
>
>
>
>
> On Mon, Jan 8, 2018 at 2:21 AM, kurt greaves <kurt@instaclustr.com> wrote:
> > If you're on 3.9 it's likely unrelated as streaming_socket_timeout_in_ms
> > is 48 hours. Appears rebuild is trying to stream the same file twice. Are
> > there other exceptions in the logs related to the file, or can you find
> > out if it's previously been sent by the same session? Search the logs for
> > the file that failed and post back any exceptions.
> >
> > On 29 December 2017 at 10:18, Martin Mačura <m.macura@gmail.com> wrote:
> >>
> >> Is this something that can be resolved by CASSANDRA-11841 ?
> >>
> >> Thanks,
> >>
> >> Martin
> >>
> >> On Thu, Dec 21, 2017 at 3:02 PM, Martin Mačura <m.macura@gmail.com> wrote:
> >> > Hi all,
> >> > we are trying to add a new datacenter to the existing cluster, but the
> >> > 'nodetool rebuild' command always fails after a couple of hours.
> >> >
> >> > We're on Cassandra 3.9.
> >> >
> >> > Example 1:
> >> >
> >> > 172.24.16.169 INFO  [STREAM-IN-/172.25.16.125:55735] 2017-12-13
> >> > 23:55:38,840 StreamResultFuture.java:174 - [Stream
> >> > #b8faf130-e092-11e7-bab5-0d4fb7c90e72 ID#0] Prepare completed.
> >> > Receiving 0 files(0.000KiB), sending 9844 files(885.587GiB)
> >> > 172.25.16.125 INFO  [STREAM-IN-/172.24.16.169:7000] 2017-12-13
> >> > 23:55:38,858 StreamResultFuture.java:174 - [Stream
> >> > #b8faf130-e092-11e7-bab5-0d4fb7c90e72 ID#0] Prepare completed.
> >> > Receiving 9844 files(885.587GiB), sending 0 files(0.000KiB)
> >> >
> >> > 172.24.16.169 ERROR [STREAM-IN-/172.25.16.125:55735] 2017-12-14
> >> > 04:28:09,064 StreamSession.java:533 - [Stream
> >> > #b8faf130-e092-11e7-bab5-0d4fb7c90e72] Streaming error occurred on
> >> > session with peer 172.25.16.125
> >> > 172.24.16.169 java.io.IOException: Connection reset by peer
> >> >
> >> > 172.24.16.169 ERROR [STREAM-OUT-/172.25.16.125:49412] 2017-12-14
> >> > 07:26:26,832 StreamSession.java:533 - [Stream
> >> > #b8faf130-e092-11e7-bab5-0d4fb7c90e72] Streaming error occurred on
> >> > session with peer 172.25.16.125
> >> > 172.24.16.169 java.lang.RuntimeException: Transfer of file
> >> > <redacted>-13d700008e3f11e6a6cbe1698349da4d/mc-8659-big-Data.db
> >> > already completed or aborted (perhaps session failed?).
> >> > 172.25.16.125 ERROR [STREAM-OUT-/172.24.16.169:7000] 2017-12-14
> >> > 07:26:50,004 StreamSession.java:533 - [Stream
> >> > #b8faf130-e092-11e7-bab5-0d4fb7c90e72] Streaming error occurred on
> >> > session with peer 172.24.16.169
> >> > 172.25.16.125 java.io.IOException: Connection reset by peer
> >> >
> >> > Example 2:
> >> >
> >> > 172.24.16.169 INFO  [STREAM-IN-/172.25.16.125:35202] 2017-12-18
> >> > 03:24:31,423 StreamResultFuture.java:174 - [Stream
> >> > #95d36300-e3d4-11e7-a90b-2b89506ad2af ID#0] Prepare completed.
> >> > Receiving 0 files(0.000KiB), sending 12312 files(895.973GiB)
> >> > 172.25.16.125 INFO  [STREAM-IN-/172.24.16.169:7000] 2017-12-18
> >> > 03:24:31,441 StreamResultFuture.java:174 - [Stream
> >> > #95d36300-e3d4-11e7-a90b-2b89506ad2af ID#0] Prepare completed.
> >> > Receiving 12312 files(895.973GiB), sending 0 files(0.000KiB)
> >> >
> >> > 172.24.16.169 ERROR [STREAM-IN-/172.25.16.125:35202] 2017-12-18
> >> > 06:39:42,049 StreamSession.java:533 - [Stream
> >> > #95d36300-e3d4-11e7-a90b-2b89506ad2af] Streaming error occurred on
> >> > session with peer 172.25.16.125
> >> > 172.24.16.169 java.io.IOException: Connection reset by peer
> >> >
> >> > 172.24.16.169 ERROR [STREAM-OUT-/172.25.16.125:42744] 2017-12-18
> >> > 09:25:36,188 StreamSession.java:533 - [Stream
> >> > #95d36300-e3d4-11e7-a90b-2b89506ad2af] Streaming error occurred on
> >> > session with peer 172.25.16.125
> >> > 172.24.16.169 java.lang.RuntimeException: Transfer of file
> >> > <redacted>-3b5782d08e4411e6842917253f111990/mc-152979-big-Data.db
> >> > already completed or aborted (perhaps session failed?).
> >> > 172.25.16.125 ERROR [STREAM-OUT-/172.24.16.169:7000] 2017-12-18
> >> > 09:25:59,447 StreamSession.java:533 - [Stream
> >> > #95d36300-e3d4-11e7-a90b-2b89506ad2af] Streaming error occurred on
> >> > session with peer 172.24.16.169
> >> > 172.25.16.125 java.io.IOException: Connection timed out
> >> >
> >> > Datacenter: PRIMARY
> >> > ===================
> >> > Status=Up/Down
> >> > |/ State=Normal/Leaving/Joining/Moving
> >> > --  Address        Load        Tokens  Owns (effective)  Host ID                               Rack
> >> > UN  172.24.16.169  918.31 GiB  256     100.0%            bc4a980b-cca6-4ca2-b32f-f8206d48e14c  RAC1
> >> > UN  172.24.16.170  908.76 GiB  256     100.0%            37b2742e-c83a-4341-896f-09d244810e69  RAC1
> >> > UN  172.24.16.171  908.44 GiB  256     100.0%            6dc2b9d8-75dd-48f8-858c-53b1af42e8fb  RAC1
> >> > Datacenter: SECONDARY
> >> > =====================
> >> > Status=Up/Down
> >> > |/ State=Normal/Leaving/Joining/Moving
> >> > --  Address        Load        Tokens  Owns (effective)  Host ID                               Rack
> >> > UN  172.25.16.125  27.48 GiB   256     100.0%            1e1669eb-cfd2-4718-a073-558946a8c947  RAC2
> >> > UN  172.25.16.124  28.24 GiB   256     100.0%            896d9894-10c8-4269-9476-5ddab3c8abe9  RAC2
> >> >
> >> > Any ideas?
> >> >
> >> > Thanks,
> >> >
> >> > Martin
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
> >> For additional commands, e-mail: user-help@cassandra.apache.org
> >>
> >
>
