From: Anuj Wadehra <anujw_2003@yahoo.co.in>
Date: Sun, 16 Oct 2016 00:58:24 +0800
To: user@cassandra.apache.org
Subject: Re: Repair in Multi Datacenter - Should you use -dc Datacenter repair or repair with -pr

Hi Leena,

Do you have a firewall between the two DCs? If yes, "connection reset" can be caused by Cassandra trying to use a TCP connection which has already been closed by the firewall. Please make sure that you set a high connection timeout at the firewall. Also, make sure your servers are not overloaded. Please see https://developer.ibm.com/answers/questions/231996/why-do-we-get-the-error-connection-reset-by-peer-d.html for general causes of connection reset. As I mentioned earlier, the Cassandra troubleshooting documentation explains it well: https://docs.datastax.com/en/cassandra/2.0/cassandra/troubleshooting/trblshootIdleFirewall.html . Make sure the firewall and node TCP settings are in sync, such that nodes close a TCP connection before the firewall does.

With a firewall timeout, we generally see the merkle tree request/response failing between nodes in the two DCs, and then the repair hangs forever. I am not sure how merkle tree creation, which is node specific, would be impacted by a multi-DC setup. Are repairs with the -local option completing without problems?

Thanks
Anuj
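The advice above about keeping node TCP settings in sync with the firewall boils down to making the kernel probe idle connections before the firewall's idle timeout fires. A minimal sketch on Linux, assuming a hypothetical firewall idle timeout of 60 minutes (substitute your firewall's actual value):

```shell
# Assumed firewall idle timeout: 3600 s (hypothetical - use your firewall's value).
# Send TCP keepalive probes well before that, so idle inter-node connections
# either stay open or are torn down by the node before the firewall drops them.
sysctl -w net.ipv4.tcp_keepalive_time=600   # first probe after 10 min idle
sysctl -w net.ipv4.tcp_keepalive_intvl=10   # 10 s between probes
sysctl -w net.ipv4.tcp_keepalive_probes=3   # close after 3 unanswered probes
# Persist across reboots:
echo 'net.ipv4.tcp_keepalive_time = 600' >> /etc/sysctl.conf
```

The exact numbers are illustrative; the point is keepalive_time plus the probe window must stay below the firewall's idle timeout.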
From: Leena Ghatpande <lghatpande@hotmail.com>
To: user@cassandra.apache.org <user@cassandra.apache.org>
Subject: Re: Repair in Multi Datacenter - Should you use -dc Datacenter repair or repair with -pr
Sent: Fri, Oct 14, 2016 2:44:27 PM

Thank you for the update.

The repair fails with the Error 'Failed Creating merkle tree' but does not give any additional details.

With -pr running on all DC nodes, we see a peer connection reset error, which then results in a hung repair process even though the TCP connection settings look good on all nodes.
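For context on why a failed merkle tree exchange stalls the whole repair: repair builds a hash tree over each replica's data and compares the trees to find out-of-sync ranges, so losing the request/response leaves nothing to compare. A toy sketch of the comparison idea (not Cassandra's actual implementation; partition values are made up, and it assumes a power-of-two leaf count for brevity):

```python
import hashlib

def merkle_tree(leaves):
    """Build a merkle tree bottom-up; returns the list of levels, root last."""
    level = [hashlib.sha256(x).digest() for x in leaves]
    levels = [level]
    while len(level) > 1:
        # Pair up adjacent hashes and hash each pair into the parent level.
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def mismatched_leaves(a, b):
    """Compare two equal-shape trees; return indices of differing leaves."""
    if a[-1] == b[-1]:            # equal roots => replicas already in sync
        return []
    return [i for i, (x, y) in enumerate(zip(a[0], b[0])) if x != y]

# Two "replicas" of four partitions; one partition has diverged.
replica1 = [b"p0:v1", b"p1:v1", b"p2:v1", b"p3:v1"]
replica2 = [b"p0:v1", b"p1:v2", b"p2:v1", b"p3:v1"]
t1, t2 = merkle_tree(replica1), merkle_tree(replica2)
print(mismatched_leaves(t1, t2))  # -> [1]: only partition 1 needs streaming
```

Only the mismatched leaf ranges get streamed between replicas, which is what makes repair cheaper than copying everything.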
From: Anuj Wadehra <anujw_2003@yahoo.co.in>
Sent: Wednesday, October 12, 2016 2:41 PM
To: user@cassandra.apache.org
Subject: Re: Repair in Multi Datacenter - Should you use -dc Datacenter repair or repair with -pr
Hi Leena,

The first thing you should be concerned about is: why does the repair -pr operation not complete? Second comes the question: which repair option is best?

One probable cause of stuck repairs: if the firewall between DCs is closing TCP connections and Cassandra is trying to use such connections, repairs will hang. Please refer to https://docs.datastax.com/en/cassandra/2.0/cassandra/troubleshooting/trblshootIdleFirewall.html . We faced that.

Also make sure you comply with the basic bandwidth requirement between DCs. Recommended is 1000 Mb/s (1 gigabit) or greater.

Answers to your specific questions:

1. As per my understanding, not all replicas will participate in DC-local repairs, so the repair would be ineffective. You need to make sure that all replicas of the data in all DCs are in sync.

2. Every DC is not a ring; all DCs together form one token ring. So, I think yes, you should run repair -pr on all nodes.

3. Yes. I don't have experience with incremental repairs, but you can run repair -pr on all nodes of all DCs.

Regarding the best approach to repair, you should see some of the repair presentations from Cassandra Summit 2016. All are online now.

I attended the summit, and people using large clusters generally use sub-range repairs to repair their clusters. But such large deployments are on older Cassandra versions, and these deployments generally don't use vnodes, so people easily know which nodes hold which token range.

Thanks
Anuj
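The sub-range approach mentioned above amounts to splitting a node's token range into chunks and repairing each chunk separately with nodetool's -st/-et flags. A sketch with hypothetical range values and keyspace name:

```python
# Sketch of sub-range repair (names and values are hypothetical): split a
# token range into chunks and repair each chunk with nodetool -st/-et,
# instead of one huge repair session per node.

MIN_TOKEN = -2**63       # Murmur3Partitioner token space lower bound
MAX_TOKEN = 2**63 - 1    # ... and upper bound

def subranges(start, end, chunks):
    """Split (start, end] into `chunks` contiguous (st, et] pieces."""
    step = (end - start) // chunks
    bounds = [start + i * step for i in range(chunks)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))

def repair_commands(start, end, chunks, keyspace="mykeyspace"):
    """Emit one nodetool invocation per sub-range (keyspace is a placeholder)."""
    return [f"nodetool repair -st {st} -et {et} {keyspace}"
            for st, et in subranges(start, end, chunks)]

# Example: split a node's (hypothetical) primary range 0..4000 into 4 chunks.
for cmd in repair_commands(0, 4000, 4):
    print(cmd)
```

Smaller sessions mean a single hung merkle tree exchange only loses one chunk, not the whole node's repair.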
From: Leena Ghatpande <lghatpande@hotmail.com>
To: user@cassandra.apache.org <user@cassandra.apache.org>
Subject: Repair in Multi Datacenter - Should you use -dc Datacenter repair or repair with -pr
Sent: Wed, Oct 12, 2016 2:15:51 PM

Please advise. I cannot find any clear documentation on the best strategy for repairing nodes on a regular basis with multiple datacenters involved.

We are running Cassandra 3.7 in a multi-datacenter setup with 4 nodes in each data center. We are trying to run repairs every other night to keep the nodes in a good state. We currently run repair with the -pr option, but the repair process gets hung and does not complete gracefully. We don't see any errors in the logs either.

What is the best way to perform repairs on multiple data centers on large tables?

1. Can we run a datacenter repair using the -dc option for each data center? Do we need to run repair on each node in that case, or will it repair all nodes within the datacenter?

2. Is running repair with -pr across all nodes required if we perform step 1 every night?

3. Is cross-data-center repair required, and if so, what's the best option?

Thanks

Leena
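For reference, the repair variants discussed in this thread take roughly these forms in Cassandra 3.x nodetool (keyspace and DC names are placeholders, and start/end tokens must be filled in):

```shell
# Run on every node in every DC: each node repairs only its primary ranges.
nodetool repair -pr mykeyspace

# Restrict repair to replicas in the local datacenter.
nodetool repair -local mykeyspace

# Restrict repair to replicas in one named datacenter.
nodetool repair -dc DC1 mykeyspace

# Sub-range repair: repair one token range at a time.
nodetool repair -st <start_token> -et <end_token> mykeyspace
```

These commands assume a running cluster; consult `nodetool help repair` on your version for the exact flag set.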