From: Sylvain Lebresne
Date: Tue, 14 Sep 2010 10:54:50 +0200
Subject: Re: UnavailableException with 3 nodes and RF=2
To: user@cassandra.apache.org

On Tue, Sep 14, 2010 at 10:43 AM, Chris Jansen wrote:
> Hi All,
>
> I'm a newbie to Cassandra so I could have a configuration issue here. I am
> using the latest stable release, 0.6.0.
>
> I have created a cluster of 3 nodes, a keyspace with RF=2 and a rack-unaware
> replication strategy. When I write with CL=QUORUM, all 3 nodes commit
> the data fine, but when I write with the same CL with one of the nodes down
> I see an UnavailableException thrown. Surely if one of the nodes in the
> cluster is down another should acknowledge the writes and maintain the
> quorum, or is there something that I have misunderstood? From what I
> understand, in this case with RF=2, for the quorum writes to succeed I need
> two nodes to acknowledge the write (RF/2+1), which I have.

RF=2 means that each row is replicated on 2 of your nodes. As you said,
quorum is then 2. This means that for a quorum operation to succeed, both
of the 2 nodes that hold the row (*not* any 2 out of all the nodes) must be
alive. Put another way, if *any* of your nodes is dead, some operations will
fail with an UnavailableException. That is, quorum tolerates a node being
down only starting at RF=3.

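The arithmetic behind that answer can be sketched in a few lines. This is only an illustration of the RF/2+1 formula quoted above (with integer division); the function names are hypothetical and nothing here is Cassandra code.

```python
def quorum(rf: int) -> int:
    """Number of replica acknowledgements a QUORUM operation needs (RF/2 + 1)."""
    return rf // 2 + 1


def tolerated_failures(rf: int) -> int:
    """How many of a row's replicas can be down while QUORUM still succeeds."""
    return rf - quorum(rf)


for rf in (1, 2, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, "
          f"replicas that may be down={tolerated_failures(rf)}")
```

With RF=2 the quorum is 2 and no replica of a row may be down; with RF=3 the quorum is still 2, so one replica may be down.
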
> Here is how the cluster looks when quorum writes succeed:
>
> 192.168.245.2 Up         477.33 KB  78502309573904554351249603414557542595     |<--|
> 192.168.245.4 Up         426.74 KB  139625953069891725539207365034742863768    |   |
> 192.168.245.1 Up         496.67 KB  163572901304139170217093255272499595459    |-->|
>
> This is how it looks with one node down and quorum writes fail (I am writing
> to 192.168.245.1):
>
> 192.168.245.2 Down       423.58 KB  78502309573904554351249603414557542595     |<--|
> 192.168.245.4 Up         426.74 KB  139625953069891725539207365034742863768    |   |
> 192.168.245.1 Up         496.67 KB  163572901304139170217093255272499595459    |-->|
>
> Here is the exception that is thrown:
>
> Cannot write: 9e48b039-7687-4b14-9b40-0096b15fd7b0 RETRYING
> UnavailableException()
>         at org.apache.cassandra.thrift.Cassandra$insert_result.read(Cassandra.java:12303)
>         at org.apache.cassandra.thrift.Cassandra$Client.recv_insert(Cassandra.java:675)
>         at org.apache.cassandra.thrift.Cassandra$Client.insert(Cassandra.java:648)
>         at cassandraclient.Main.writeReadDelete(Main.java:101)
>         at cassandraclient.Main.run(Main.java:188)
>         at java.lang.Thread.run(Thread.java:619)
>
> If I switch to CL=ONE the writes succeed, but I don't know if the data is being
> replicated.

Whatever consistency level you use for a write, the data is always replicated
unless some error occurs. The difference is whether the write waits to see
if an error occurs or not.

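To make this concrete for the ring shown in the quoted output, here is a rough simulation. It assumes the rack-unaware strategy places a row on the node owning its token and on the next RF-1 nodes clockwise around the ring; `RING`, `replicas_for` and `write` are hypothetical names for illustration, not Cassandra APIs, and the sketch does not model how a down replica catches up later.

```python
from bisect import bisect_left

# Tokens and addresses copied from the ring output quoted above.
RING = sorted([
    (78502309573904554351249603414557542595, "192.168.245.2"),
    (139625953069891725539207365034742863768, "192.168.245.4"),
    (163572901304139170217093255272499595459, "192.168.245.1"),
])


def replicas_for(token, rf):
    """Owner of the token plus the next rf-1 nodes clockwise (assumed placement)."""
    tokens = [t for t, _ in RING]
    start = bisect_left(tokens, token) % len(RING)  # wrap past the last token
    return [RING[(start + i) % len(RING)][1] for i in range(rf)]


def write(token, rf, required_acks, down):
    """Send to every live replica; fail up front if too few are alive."""
    live = [n for n in replicas_for(token, rf) if n not in down]
    if len(live) < required_acks:
        return "UnavailableException"
    return f"ok: sent to {live}, waits for {required_acks} ack(s)"


down = {"192.168.245.2"}
for token in (1, 10**38, 15 * 10**37):  # one sample token per ring range
    print(replicas_for(token, 2),
          "| QUORUM:", write(token, 2, 2, down),
          "| ONE:", write(token, 2, 1, down))
```

Under these assumptions, with 192.168.245.2 down, two of the three token ranges have a replica on the dead node and fail at QUORUM, while every range still succeeds at ONE. In both cases the write goes to all live replicas of the row; only the number of acknowledgements waited for changes.
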
--
Sylvain

> Any help would be greatly appreciated, thanks.
>
> Chris Jansen