From: "techpyaasa ."
Date: Wed, 28 Sep 2016 22:14:04 +0530
Subject: Re: New node block in autobootstrap
To: user@cassandra.apache.org

@Paulo

We have made the changes you suggested:
net.ipv4.tcp_keepalive_time=60
net.ipv4.tcp_keepalive_probes=3
net.ipv4.tcp_keepalive_intvl=10

and increased streaming_socket_timeout_in_ms to 48 hours, with "phi_convict_threshold: 9".
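As a rough sanity check on these values (a sketch added for illustration, not something from the thread): on Linux, a silent peer is usually declared dead after about tcp_keepalive_time + tcp_keepalive_probes * tcp_keepalive_intvl seconds, and 48 hours has to be expressed in milliseconds for streaming_socket_timeout_in_ms. The helper names below are hypothetical.

```python
def keepalive_detection_seconds(idle_time, probes, interval):
    """Approximate seconds before Linux gives up on a silent peer.

    idle_time -> net.ipv4.tcp_keepalive_time
    probes    -> net.ipv4.tcp_keepalive_probes
    interval  -> net.ipv4.tcp_keepalive_intvl
    """
    return idle_time + probes * interval


def hours_to_ms(hours):
    """Convert hours to milliseconds, the unit of streaming_socket_timeout_in_ms."""
    return hours * 60 * 60 * 1000


if __name__ == "__main__":
    print(keepalive_detection_seconds(60, 3, 10))  # 90
    print(hours_to_ms(48))  # 172800000
```

With the settings above, the first keepalive probe goes out after 60 seconds of idleness, which is what should keep a firewall from silently dropping an idle streaming connection.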

We then recommissioned the new data center (DC3) once again and ran "nodetool rebuild 'DC1'", but this time NO data was streamed and 'nodetool rebuild' exited without any exception.

Please check the logs below:

INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:44,571 StorageService.java (line 914) rebuild from dc: IDC
 INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,520 StreamResultFuture.java (line 87) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Executing streaming plan for Rebuild
 INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,521 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.75
 INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,522 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.132
 INFO [StreamConnectionEstablisher:1] 2016-09-28 09:18:47,522 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.75
 INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,522 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.133
 INFO [StreamConnectionEstablisher:2] 2016-09-28 09:18:47,522 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.132
 INFO [StreamConnectionEstablisher:3] 2016-09-28 09:18:47,523 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.133
 INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,523 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.167
 INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,524 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.78
 INFO [StreamConnectionEstablisher:4] 2016-09-28 09:18:47,524 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.167
 INFO [StreamConnectionEstablisher:5] 2016-09-28 09:18:47,525 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.78
 INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,524 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.126
 INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,525 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.191
 INFO [StreamConnectionEstablisher:6] 2016-09-28 09:18:47,526 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.126
 INFO [StreamConnectionEstablisher:7] 2016-09-28 09:18:47,526 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.191
 INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,526 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.168
 INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,527 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.169
 INFO [StreamConnectionEstablisher:8] 2016-09-28 09:18:47,527 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.168
 INFO [StreamConnectionEstablisher:9] 2016-09-28 09:18:47,528 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.169
 INFO [STREAM-IN-/xxx.xxx.198.132] 2016-09-28 09:18:47,713 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.132 is complete
 INFO [STREAM-IN-/xxx.xxx.198.191] 2016-09-28 09:18:47,715 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.191 is complete
 INFO [STREAM-IN-/xxx.xxx.198.133] 2016-09-28 09:18:47,716 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.133 is complete
 INFO [STREAM-IN-/xxx.xxx.198.169] 2016-09-28 09:18:47,716 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.169 is complete
 INFO [STREAM-IN-/xxx.xxx.198.167] 2016-09-28 09:18:47,715 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.167 is complete
 INFO [STREAM-IN-/xxx.xxx.198.126] 2016-09-28 09:18:47,715 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.126 is complete
 INFO [STREAM-IN-/xxx.xxx.198.78] 2016-09-28 09:18:47,715 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.78 is complete
 INFO [STREAM-IN-/xxx.xxx.198.168] 2016-09-28 09:18:47,715 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.168 is complete
 INFO [STREAM-IN-/xxx.xxx.198.75] 2016-09-28 09:18:47,776 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.75 is complete
 INFO [STREAM-IN-/xxx.xxx.198.75] 2016-09-28 09:18:47,778 StreamResultFuture.java (line 220) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] All sessions completed


As you can see from the logs above, nodetool rebuild finished without any data being streamed, and all streaming sessions completed in NO TIME (see the timestamps in the logs).
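To put a number on "no time", here is a small Python sketch (added for illustration, assuming the log timestamp format shown in this message) that measures the gap between the earliest and latest timestamps in a log excerpt. The function name and the SAMPLE constant are hypothetical; the two sample lines are copied from the logs in this message.

```python
import re
from datetime import datetime

# Matches timestamps such as "2016-09-28 09:18:47,520".
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def elapsed_seconds(log_text):
    """Return seconds between the earliest and latest timestamp in log_text."""
    stamps = [
        datetime.strptime(match, "%Y-%m-%d %H:%M:%S,%f")
        for match in TIMESTAMP_RE.findall(log_text)
    ]
    if not stamps:
        raise ValueError("no timestamps found")
    return (max(stamps) - min(stamps)).total_seconds()


SAMPLE = """\
 INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,520 StreamResultFuture.java (line 87) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Executing streaming plan for Rebuild
 INFO [STREAM-IN-/xxx.xxx.198.75] 2016-09-28 09:18:47,778 StreamResultFuture.java (line 220) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] All sessions completed
"""

if __name__ == "__main__":
    print(elapsed_seconds(SAMPLE))
```

For these two lines the gap is about a quarter of a second between the streaming plan starting and all sessions completing.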


Also, "nodetool status" looks fine from the new nodes (from which I ran 'nodetool rebuild').

Please let us know what could be the issue here.

Thanks in advance.

On Wed, Sep 28, 2016 at 1:04 AM, Paulo Motta <pauloricardomg@gmail.com> wrote:
Yeah this is likely to be caused by idle connections being shut down, so you may need to update your tcp_keepalive* and/or network/firewall settings.

2016-09-27 15:29 GMT-03:00 laxmikanth sadula <laxmikanth524@gmail.com>:

Hi paul,

Thanks for the reply...

I'm getting the following streaming exceptions during nodetool rebuild in c*-2.0.17:

04:24:49,759 StreamSession.java (line 461) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred
java.io.IOException: Connection timed out
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
    at sun.nio.ch.IOUtil.write(IOUtil.java:65)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
    at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
    at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
    at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:311)
    at java.lang.Thread.run(Thread.java:745)
DEBUG [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764 ConnectionHandler.java (line 104) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Closing stream connection handler on /xxx.xxx.98.168
 INFO [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764 StreamResultFuture.java (line 186) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Session with /xxx.xxx.98.168 is complete
ERROR [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764 StreamSession.java (line 461) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred
java.io.IOException: Broken pipe
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
    at sun.nio.ch.IOUtil.write(IOUtil.java:65)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
    at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
    at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
    at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:319)
    at java.lang.Thread.run(Thread.java:745)
DEBUG [STREAM-IN-/xxx.xxx.98.168] 2016-09-27 04:24:49,909 ConnectionHandler.java (line 244) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Received File (Header (cfId: 68af9ee0-96f8-3b1d-a418-e5ae844f2cc2, #3, version: jb, estimated keys: 4736, transfer size: 2306880, compressed?: true), file: /home/cassandra/data_directories/data/keyspace_name1/archiving_metadata/keyspace_name1-archiving_metadata-tmp-jb-27-Data.db)
ERROR [STREAM-IN-/xxx.xxx.98.168] 2016-09-27 04:24:49,909 StreamSession.java (line 461) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred
java.lang.RuntimeException: Outgoing stream handler has been closed
    at org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:126)
    at org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:524)
    at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:413)
    at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:245)
    at java.lang.Thread.run(Thread.java:745)
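When a log contains many traces like the ones above, a quick tally of exception types can show which failure dominates. The following is only a sketch added for illustration: the regular expression and the function name are assumptions, and it simply looks for lines that begin with a fully qualified Java exception class followed by a message.

```python
import re
from collections import Counter

# A line starting with a dotted Java class name ending in Exception or Error,
# followed by ": message", e.g. "java.io.IOException: Broken pipe".
EXCEPTION_RE = re.compile(r"^\s*((?:[a-z_]\w*\.)+\w*(?:Exception|Error)): (.*)$")


def tally_exceptions(log_text):
    """Count (exception class, message) pairs found in log_text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = EXCEPTION_RE.match(line)
        if match:
            counts[(match.group(1), match.group(2).strip())] += 1
    return counts


SAMPLE = """\
java.io.IOException: Connection timed out
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
java.io.IOException: Broken pipe
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
java.lang.RuntimeException: Outgoing stream handler has been closed
    at java.lang.Thread.run(Thread.java:745)
"""

if __name__ == "__main__":
    for (cls, message), count in tally_exceptions(SAMPLE).items():
        print(count, cls, message)
```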


On Sep 27, 2016 11:48 PM, "Paulo Motta" <pauloricardomg@gmail.com> wrote:
What type of streaming timeout are you getting? Do you have a stack trace? What version are you in?

See more information about tuning tcp_keepalive* here: https://docs.datastax.com/en/cassandra/2.0/cassandra/troubleshooting/trblshootIdleFirewall.html

2016-09-27 14:07 GMT-03:00 laxmikanth sadula <laxmikanth524@gmail.com>:
@Paulo Motta

We are also facing streaming timeout exceptions during 'nodetool rebuild'. I set streaming_socket_timeout_in_ms to 86400000 (24 hours) as suggested in the DataStax blog - https://support.datastax.com/hc/en-us/articles/206502913-FAQ-How-to-reduce-the-impact-of-streaming-errors-or-failures , but we are still getting streaming exceptions.

And what are the recommended settings/values for the kernel tcp_keepalive parameters that would help streaming succeed?

Thank you

On Tue, Aug 16, 2016 at 12:21 AM, Paulo Motta <pauloricardomg@gmail.com> wrote:
What version are you in? This seems like a typical case where there was a problem with streaming (hanging, etc). Do you have access to the logs? Maybe look for streaming errors? Typically streaming errors are related to timeouts, so you should review your Cassandra streaming_socket_timeout_in_ms and kernel tcp_keepalive settings.
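One way to review the kernel side of this on a Linux host is to read the current values straight from /proc. The sketch below was added for illustration and is not part of the original reply; the function name is hypothetical, and it assumes the standard /proc/sys/net/ipv4 location unless another directory is passed in.

```python
from pathlib import Path

KEEPALIVE_KEYS = (
    "tcp_keepalive_time",
    "tcp_keepalive_probes",
    "tcp_keepalive_intvl",
)


def read_keepalive_settings(base_dir="/proc/sys/net/ipv4"):
    """Return the kernel TCP keepalive settings under base_dir as integers."""
    base = Path(base_dir)
    return {key: int((base / key).read_text().strip()) for key in KEEPALIVE_KEYS}


if __name__ == "__main__":
    # Linux only: prints the values currently in effect on this host.
    for name, value in read_keepalive_settings().items():
        print(f"net.ipv4.{name} = {value}")
```

The Cassandra side, streaming_socket_timeout_in_ms, is set in cassandra.yaml and is not covered by this sketch.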

If you're on 2.2+ you can resume a failed bootstrap with nodetool bootstrap resume. There were also some streaming hanging problems fixed recently, so I'd advise you to upgrade to the latest version of your particular series for a more robust version.

Is there any reason why you didn't use the replace procedure (-Dreplace_address) to replace the node with the same tokens? This would be a bit faster than the remove + bootstrap procedure.

2016-08-15 15:37 GMT-03:00 Jérôme Mainaud <jerome@mainaud.com>:
Hello,

A client of mine has problems when adding a node to the cluster.
After 4 days, the node is still in joining mode, it doesn't have the same level of load as the others, and there seems to be no streaming from or to the new node.

This node has a history.
  1. In the beginning, it was a seed in the cluster.
  2. Ops detected that the client had problems with it.
  3. They tried to reset it but failed. In the process they launched several repair and rebuild processes on the node.
  4. Then they asked me to help them.
  5. We stopped the node,
  6. removed it from the list of seeds (more precisely, it was replaced by another node),
  7. removed it from the cluster (I chose not to use decommission since the node's data was compromised),
  8. deleted all files from the data, commitlog and savedcache directories.
  9. After the leaving process ended, it was started as a fresh new node and began autobootstrap.

As I don’t have direct access to the cluster I don't have a lot of information, but I will have more tomorrow (logs and results of some commands). And I can ask people for any required information.

Does someone have any idea of what could have happened and what I should investigate first?
What would you do to unlock the situation?

Context: The cluster consists of two DCs, each with 15 nodes. Average load is around 3 TB per node. The joining node froze a little after 2 TB.

Thank you for your help.
Cheers,


--
Jérôme Mainaud
jerome@mainaud.com




--
Regards,
Laxmikanth
99621 38051


