From: Baskar Duraikannu <baskar.duraikannu@outlook.com>
To: user@cassandra.apache.org
Subject: RE: Read repair
Date: Tue, 29 Oct 2013 07:32:11 -0400

Aaron,

Rack1 goes down and some writes happen at quorum against racks 2 and 3. Hinted handoff is set to 30 minutes. After a couple of hours rack1 comes back and rack2 goes down. Hinted handoff will replay, but will not cover all of the writes because of the 30-minute setting. Now, for rows inserted during roughly 1 hour and 30 minutes, there is no quorum until the failed rack comes back up.

Hope this explains the scenario.
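
(For reference, the gap described above is governed by max_hint_window_in_ms in cassandra.yaml; a sketch of the 30-minute setting, with the value in milliseconds:

    # cassandra.yaml: stop storing hints for a replica once it has
    # been down longer than this (30 minutes; the stock default is
    # 10800000, i.e. 3 hours)
    max_hint_window_in_ms: 1800000

Writes that land after a replica has been down longer than this window generate no hints for it, which is what leaves the 1.5-hour hole.)
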

From: Aaron Morton
Sent: 10/28/2013 2:42 AM
To: Cassandra User
Subject: Re: Read repair

> As soon as it came back up, due to some human error, rack1 goes down. Now for some rows it is possible that quorum cannot be established.

Not sure I follow here.

If the first rack has come back up I assume all nodes are available; if you then lose a different rack I assume you have 2/3 of the nodes available and would be able to achieve a QUORUM.
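
(To make the arithmetic concrete, assuming a replication factor of 3 with NetworkTopologyStrategy placing one replica per rack: QUORUM needs floor(3/2) + 1 = 2 replicas, so with any single rack down, 2 of the 3 replicas for every row are still reachable.)
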

Just to m= inimize the issues=2C we are thinking of running read repair manually every= night. =3B
If you are reading and writing at QUORUM and the cluster does not have a QU= ORUM of nodes available writes will not be processed. During reads any mism= atch between the data returned from the nodes will be detected and resolved= before returning to the client. =3B
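
(A minimal sketch of reading and writing at QUORUM from cqlsh; the keyspace and table names here are made up for illustration:

    cqlsh> CONSISTENCY QUORUM;
    cqlsh> INSERT INTO my_ks.my_table (id, val) VALUES (1, 'x');
    cqlsh> SELECT val FROM my_ks.my_table WHERE id = 1;

Each statement succeeds only if a majority of replicas acknowledge it, which is what makes the read set and the write set overlap.)
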

Read Repair is an automatic process that reads from more nodes than ne= cessary and resolves the differences in the back ground. =3B

I would run nodetool repair / Anti Entropy as normal=2C once on every = machine every gc_grace_seconds. If you have a while rack fail for run repai= r on the nodes in the rack if you want to get it back to consistency quickl= y. The need to do that depends on the config for Hinted Handoff=2C read_repair_chance=2C Consistency level= =2C the write load=2C and (to some degree) the number of nodes. If you want= to be extra safe just run it. =3B
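
(A minimal sketch of that schedule; the keyspace name and cron timing are illustrative, and -pr limits each node to its primary ranges so the same data is not repaired once per replica:

    # crontab on each node, staggered per node: weekly repair,
    # comfortably inside the default gc_grace_seconds of 10 days
    0 2 * * 0  nodetool repair -pr my_ks

Running it weekly leaves plenty of slack before tombstones could be purged.)
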

Cheers
 =3B
-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder &=3B Principal Consultant
Apache Cassandra Consulting


> We are thinking through the deployment architecture for our Cassandra cluster. Let us say that we choose to deploy data across three racks.
>
> Let us say that one rack's power went down for 10 mins and then it came back. As soon as it came back up, due to some human error, rack1 goes down. Now for some rows it is possible that quorum cannot be established. Just to minimize the issues, we are thinking of running read repair manually every night.
>
> Is this a good idea? How often do you perform read repair on your cluster?

--_8e61837c-305a-4fd5-8de7-2683fedee961_--