Subject: Re: Quorum: killing 1 out of 3 server kills the cluster (?)
From: David Boxenhorn
To: user@cassandra.apache.org
Date: Thu, 9 Dec 2010 18:46:39 +0200

If that is what you want, use CL=ONE.
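
A rough sketch of the counting rule the thread is discussing (illustrative Python only, not Cassandra code; the helper names are made up):

    # How many replicas must answer at each consistency level, and
    # whether a request on a key can still succeed when `down` of its
    # replicas are unreachable.
    def required(cl, rf):
        return {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[cl]

    def available(cl, rf, down):
        return rf - down >= required(cl, rf)

    print(available("ONE", 2, 1))     # True  -> CL=ONE keeps working
    print(available("QUORUM", 2, 1))  # False -> UnavailableException

With RF=2, one dead replica of a key already leaves only CL=ONE satisfiable for that key.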

On Thu, Dec 9, 2010 at 6:43 PM, Timo Nentwig <timo.nentwig@toptarif.de> wrote:

On Dec 9, 2010, at 17:39, David Boxenhorn wrote:

> In other words, if you want to use QUORUM, you need to set RF >= 3.
>
> (I know because I had exactly the same problem.)

I naively assumed that if I kill either node that holds N1 (i.e. node 1 or 3), N1 will still remain on another node. Only if both fail do I actually lose data. But apparently this is not how it works...

> On Thu, Dec 9, 2010 at 6:05 PM, Sylvain Lebresne <sylvain@yakaz.com> wrote:
> It's 2 out of the number of replicas, not the number of nodes. At RF=2,
> you have 2 replicas. And since quorum is also 2 with that replication
> factor, you cannot lose a node, otherwise some queries will end up with
> an UnavailableException.
>
> Again, this is not related to the total number of nodes. Even with 200
> nodes, if you use RF=2, you will have some queries that fail (although
> much less often than what you are probably seeing).
>
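
A toy model of the placement Sylvain describes (assuming SimpleStrategy-style placement, where each key is stored on its primary node and the next node clockwise on the ring; illustrative Python, not Cassandra's actual code):

    nodes = ["node1", "node2", "node3"]

    def replicas(primary, rf=2):
        # A key's replicas: its primary node plus the next rf-1 nodes
        # clockwise on the ring.
        return [nodes[(primary + i) % len(nodes)] for i in range(rf)]

    for i in range(len(nodes)):
        print(nodes[i], "->", replicas(i))
    # node1 -> ['node1', 'node2']
    # node2 -> ['node2', 'node3']
    # node3 -> ['node3', 'node1']

Whichever single node you kill is one of the only two replicas for roughly a third of the keys, so QUORUM (2 of 2) fails for those keys, no matter how many nodes the cluster has in total.
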
> On Thu, Dec 9, 2010 at 5:00 PM, Timo Nentwig <timo.nentwig@toptarif.de> wrote:
> >
> > On Dec 9, 2010, at 16:50, Daniel Lundin wrote:
> >
> >> Quorum is really only useful when RF > 2, since for a quorum to
> >> succeed RF/2+1 replicas must be available.
> >
> > 2/2+1 == 2 and I killed 1 of 3, so... I don't get it.
> >
> >> This means for RF = 2, consistency levels QUORUM and ALL yield the same result.
> >>
> >> /d
> >>
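
Tabulating that RF/2 + 1 rule (a quick sketch, same arithmetic as above):

    for rf in (1, 2, 3, 5):
        quorum = rf // 2 + 1
        print(f"RF={rf}: quorum={quorum}, replicas that may be down={rf - quorum}")
    # RF=1: quorum=1, replicas that may be down=0
    # RF=2: quorum=2, replicas that may be down=0   (QUORUM == ALL)
    # RF=3: quorum=2, replicas that may be down=1
    # RF=5: quorum=3, replicas that may be down=2

So RF=3 is the smallest replication factor at which QUORUM survives one dead replica.
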
> >> On Thu, Dec 9, 2010 at 4:40 PM, Timo Nentwig <timo.nentwig@toptarif.de> wrote:
> >>> Hi!
> >>>
> >>> I've 3 servers running (0.7rc1) with a replication_factor of 2 and
> >>> use quorum for writes. But when I shut down one of them,
> >>> UnavailableExceptions are thrown. Why is that? Isn't the point of
> >>> quorum and a fault-tolerant DB that it continues with the remaining
> >>> 2 nodes and redistributes the data to the broken one as soon as it's
> >>> up again?
> >>>
> >>> What may I be doing wrong?
> >>>
> >>> thx
> >>> tcn
> >
> >
>

