From: Nick Telford <nick.telford@tweetmeme.com>
To: user@cassandra.apache.org
Date: Mon, 22 Nov 2010 13:03:31 +0000
Subject: Re: Facebook messaging and choice of HBase over Cassandra - what can we learn?

Provided at least one node receives the write, it will eventually be written to all replicas. A failure to meet the requested ConsistencyLevel is just that; it is not a failure to write the data itself. Once the write is received by a node, it will eventually reach all replicas; there is no rollback.

This is the source of a fair bit of confusion, as most people are used to the binary behaviour of "success or failure". It's important that clients are able to distinguish between a write that fails to reach the cluster and a write that fails to meet the requested ConsistencyLevel, in order to provide durability guarantees for application data.
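
One way a client might keep these two failure modes apart is sketched below. The exception names `ConsistencyLevelNotMet` and `ClusterUnreachable`, the `WriteOutcome` enum, and `attempt_write` are all hypothetical and are not taken from any Cassandra client library; a real client would have to map its own library's errors onto the same three cases.

```python
from enum import Enum


class WriteOutcome(Enum):
    ACKNOWLEDGED = "requested consistency level met"
    ACCEPTED_UNCONFIRMED = "received by a node; may still reach all replicas"
    NOT_RECEIVED = "never reached the cluster"


# Hypothetical exception types, not the names used by any real client library.
class ConsistencyLevelNotMet(Exception):
    """A node took the write, but too few replicas acknowledged it in time."""


class ClusterUnreachable(Exception):
    """No node received the write at all."""


def attempt_write(send):
    """Run send() (any callable that performs the write) and classify the result."""
    try:
        send()
    except ConsistencyLevelNotMet:
        # There is no rollback, so the value must be treated as possibly written.
        return WriteOutcome.ACCEPTED_UNCONFIRMED
    except ClusterUnreachable:
        # Nothing was stored anywhere; this is a failure in the usual sense.
        return WriteOutcome.NOT_RECEIVED
    return WriteOutcome.ACKNOWLEDGED
```

With this split, an application could retry or re-read after `ACCEPTED_UNCONFIRMED` instead of assuming the old value is still in place.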

On 22 November 2010 12:31, David Boxenhorn <david@lookin2.com> wrote:

> Yes, but the value is supposed to be 11, since the write failed.
>
> On Mon, Nov 22, 2010 at 2:27 PM, André Fiedler <fiedler.andre@googlemail.com> wrote:
>
>> Doesn't Cassandra sync all nodes once the network is up again? I think this was one of the reasons for storing a timestamp with every key/value pair.
>> So I think the response will only temporarily be 11. Once all nodes have synced it should be 12. Or isn't that so?
>>
>> Greetings, André
>>
>> 2010/11/22 Samuel Carrière <samuel.carriere@gmail.com>
>>
>>> >Cassandra can work in a consistent way, see some of this discussion and the Consistency section here http://wiki.apache.org/cassandra/ArchitectureOverview
>>> >
>>> >If you always read and write with CL.Quorum (or the other way discussed) you will have consistency, even if some of the replicas are temporarily inconsistent, offline, or whatever. Your reads will be consistent, i.e. every client will get the same value or the read will not work. If you want to work at a lower or higher consistency you can.
>>> >
>>> >Eventually all replicas of a value will become consistent.
>>> >
>>> >There are a number of reasons why Cassandra may not be a good fit, and I would guess something else would be a problem before the consistency model.
>>> >
>>> >Hope that helps.
>>> >Aaron
>>>
>>> Hello,
>>>
>>> I like Cassandra a lot and I'm sure it can be used in many use cases, but I'm not sure we can say that we have strong consistency, even if we read and write with CL.Quorum.
>>>
>>> Firstly, we can only expect consistency at the column level. Reading and writing with CL.Quorum gives you, most of the time, a consistent value for each individual column, but that does not mean it gives you a consistent view of your data. (Because Cassandra gives you no isolation and no transactions, your application has to deal with data inconsistencies.)
>>>
>>> Secondly, I may be wrong, but I'm not sure consistency at the column level is guaranteed. Here is an example, with a replication factor of 3.
>>> Imagine that the current value of col1 is 11. Your application tries to write "col1 = 12" with CL.Quorum.
>>> Imagine the write arrives at node 1, but the new value is not transmitted to nodes 2 and 3 because of network failures. So the write fails (this is the expected behaviour), but node 1 still has the new value (there is no rollback).
>>>
>>> Then, imagine that the network is back to normal, and that another client asks for the value of col1 with CL.Quorum. Here, the value of the response is not guaranteed. If the client reads from node 2 and node 3, the response will be 11, but if it reads from node 1 and either node 2 or node 3, the response will be 12.
>>>
>>> Am I missing something?
>>>
>>> Samuel
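
Samuel's three-replica example can be traced with a toy model. This is only a sketch: the timestamps (100 and 200), the `replicas` dictionary, and both functions are made up for illustration, and the repair step is modelled simply as copying the newest version to the nodes a read contacted, which is an assumption of the sketch rather than a description of Cassandra's internals.

```python
from itertools import combinations

# Each replica stores (value, timestamp) for col1. Timestamps are illustrative.
replicas = {1: (11, 100), 2: (11, 100), 3: (11, 100)}


def quorum_read(nodes):
    """Return the value with the newest timestamp among the contacted nodes."""
    return max((replicas[n] for n in nodes), key=lambda vt: vt[1])[0]


def read_with_repair(nodes):
    """Quorum read that also copies the newest version to the contacted nodes."""
    newest = max((replicas[n] for n in nodes), key=lambda vt: vt[1])
    for n in nodes:
        replicas[n] = newest
    return newest[0]


# The write "col1 = 12" reaches node 1 only, so the quorum write is reported as failed.
replicas[1] = (12, 200)

# Every possible pair of nodes a CL.Quorum read could contact (2 of 3).
before = {pair: quorum_read(pair) for pair in combinations(replicas, 2)}
# before == {(1, 2): 12, (1, 3): 12, (2, 3): 11}

# A single read that happens to include node 1 spreads the newer value.
read_with_repair((1, 2))
after = {pair: quorum_read(pair) for pair in combinations(replicas, 2)}
# after == {(1, 2): 12, (1, 3): 12, (2, 3): 12}
```

In this model the answer depends on which nodes are contacted only until a read (or any other synchronisation) involves node 1. After that every quorum returns 12, and nothing in the model ever moves the value back to 11.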


