Date: Wed, 28 Apr 2010 17:24:35 +0200
Subject: Re: Detailed behavior of insert() operation?
From: Roland Hänel
To: user@cassandra.apache.org

Thanks Jonathan, that hits exactly the heart of my question.
Unfortunately, it kills my original idea of implementing a "unique transaction identifier creation algorithm". For that, even eventual consistency would be sufficient, but I would need to know whether I am consistent at the time of a read request.

One last question (sorry to bother you): isn't the behavior of read repair strictly deterministic in this case? You say both read requests could try to read repair the result (each time in the opposite direction). Inside the read repair algorithm, when we have exactly the same timestamps, which value is chosen for the repair? The first one the node received in the read request? If we made that deterministic, we could avoid this scenario, right?
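
To make the tie-breaking question concrete, here is a minimal Python sketch contrasting two hypothetical rules for columns with equal timestamps. `Column`, `reconcile_first_seen` and `reconcile_deterministic` are illustrative names, not Cassandra's API, and "the larger value wins" is an assumption chosen for illustration, not a statement of what Cassandra actually does.

```python
from typing import NamedTuple


class Column(NamedTuple):
    """Hypothetical stand-in for one replica's copy of a column."""
    value: bytes
    timestamp: int


def reconcile_first_seen(first: Column, second: Column) -> Column:
    """Newer timestamp wins; on a tie, keep whichever copy arrived first."""
    if first.timestamp != second.timestamp:
        return first if first.timestamp > second.timestamp else second
    return first  # tie: the outcome depends on arrival order


def reconcile_deterministic(a: Column, b: Column) -> Column:
    """Newer timestamp wins; on a tie, the larger value (byte order) wins."""
    return max(a, b, key=lambda c: (c.timestamp, c.value))


a = Column(b"value_A", 1)
b = Column(b"value_B", 1)
print(reconcile_first_seen(a, b).value, reconcile_first_seen(b, a).value)
# b'value_A' b'value_B'  (order matters)
print(reconcile_deterministic(a, b).value, reconcile_deterministic(b, a).value)
# b'value_B' b'value_B'  (order does not matter)
```

With an order-independent rule, two repairs that compare the same pair of values pick the same winner, so they cannot push the replicas in opposite directions.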

-Roland



2010/4/28 Jonathan Ellis <jbellis@gmail.com>
2010/4/28 Roland Hänel <roland@haenel.me>:
> Two clients insert the same key/column with different values at the same
> time:
>
>    client A does insert(keyspace, key_1, column_name_1, value_A, timestamp_1, consistency_level.QUORUM)
>    client B does insert(keyspace, key_1, column_name_1, value_B, timestamp_1, consistency_level.QUORUM)
>
> After that, both clients read their value:
>
>    client A does get(keyspace, key_1, column_name_1, consistency_level.QUORUM)
>    client B does get(keyspace, key_1, column_name_1, consistency_level.QUORUM)
>
> It is obvious that since the inserts happen 'at the same time', i.e. with
> the same timestamp, we cannot say which value (value_A or value_B) gets
> written to the row. However, do we have a guarantee that either value_A
> or value_B is written, and that both read operations will return the same
> result?
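
QUORUM reads and writes rely on the overlap rule R + W > N: with a replication factor of N = 3, QUORUM is floor(3/2) + 1 = 2, so every 2-replica read set shares at least one replica with every 2-replica write set. The sketch below checks this by brute force; `quorum_size` and `reads_always_overlap_writes` are illustrative helpers, not part of any Cassandra client. Overlap only ensures a read reaches some replica that acknowledged each write; when both writes carry the same timestamp, it says nothing about which value that replica now holds.

```python
from itertools import combinations


def quorum_size(n: int) -> int:
    """Majority of n replicas, i.e. floor(n / 2) + 1."""
    return n // 2 + 1


def reads_always_overlap_writes(n: int, w: int, r: int) -> bool:
    """Brute-force check that every r-replica read set intersects every w-replica write set."""
    replicas = range(n)
    write_sets = [set(s) for s in combinations(replicas, w)]
    read_sets = [set(s) for s in combinations(replicas, r)]
    return all(ws & rs for ws in write_sets for rs in read_sets)


q = quorum_size(3)
print(q, reads_always_overlap_writes(3, q, q))  # 2 True
print(reads_always_overlap_writes(3, 1, 1))     # False
```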

The guarantee is that "eventually" you will get a consistent result.

Say both writes overlap such that value A is present on replicas R1
and R2, and value B is present on replica R3 (after both writes
complete).
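
Jonathan's replica layout can be modelled in a few lines of Python. The sketch assumes a replication factor of 3 (so QUORUM = 2) and a hypothetical coordinator that keeps the first reply on a timestamp tie; that tie rule is an assumption for illustration, not Cassandra's documented behavior. It enumerates every ordered pair of replicas that could answer a QUORUM read and records the value returned.

```python
from itertools import permutations

# Jonathan's layout after both writes: identical timestamps, different values.
state = {
    "R1": ("value_A", 1),
    "R2": ("value_A", 1),
    "R3": ("value_B", 1),
}


def quorum_read(replies):
    """Hypothetical coordinator: newest timestamp wins; on a tie, the first reply wins."""
    winner = replies[0]
    for column in replies[1:]:
        if column[1] > winner[1]:
            winner = column
    return winner[0]


# Every ordered pair of replicas that could answer a QUORUM (2 of 3) read.
results = {pair: quorum_read([state[r] for r in pair]) for pair in permutations(state, 2)}

for pair, value in sorted(results.items()):
    print(pair, value)
```

Under these assumptions both values are possible answers, and which one a client sees depends only on which replicas respond and in what order.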

Simultaneous read operations could then both attempt to "repair" the
other nodes, and again there could be overlap, resulting in two values
still being present, possibly on different nodes this time.
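
The crossing repairs can be simulated by splitting each read into two phases: collect replies and resolve a winner, then write the winner back to the replicas that disagreed. This is a simplified model: `resolve`, `start_read` and `finish_repair` are hypothetical helpers, the first-reply-wins tie rule is an assumption, and real read repair in Cassandra is more involved. Starting from value_A on R1 and R2 and value_B on R3, both reads collect their replies before either repair lands.

```python
def resolve(replies):
    """Hypothetical order-dependent rule: newest timestamp wins, first reply wins a tie."""
    winner = replies[0][1]
    for _, column in replies[1:]:
        if column[1] > winner[1]:
            winner = column
    return winner


def start_read(state, replica_order):
    """Phase 1: snapshot replies in arrival order and pick a winner (no writes yet)."""
    replies = [(r, state[r]) for r in replica_order]
    return replies, resolve(replies)


def finish_repair(state, replies, winner):
    """Phase 2: push the winner to every contacted replica whose reply differed."""
    for replica, column in replies:
        if column != winner:
            state[replica] = winner


state = {"R1": ("value_A", 1), "R2": ("value_A", 1), "R3": ("value_B", 1)}

x_replies, x_winner = start_read(state, ["R3", "R1"])  # sees B first, picks B
y_replies, y_winner = start_read(state, ["R1", "R3"])  # sees A first, picks A
finish_repair(state, x_replies, x_winner)  # R1 <- B
finish_repair(state, y_replies, y_winner)  # R3 <- A

print({r: v for r, (v, _) in state.items()})
# {'R1': 'value_B', 'R2': 'value_A', 'R3': 'value_A'}
```

Two values survive, now on a different set of nodes.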

So: you can see different values on reads when there are two
"simultaneous" writes, and in the worst case this can continue until
one read's repair finishes before another begins.

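Conversely, once one read's repair runs to completion before the next read starts, the replicas agree. The sketch below starts from value_A on R1 and R2 and value_B on R3, assumes a hypothetical first-reply-wins tie rule, and assumes for simplicity that the repairing read contacts all three replicas; `read_and_repair` is an illustrative name, not a Cassandra function.

```python
def read_and_repair(state, replica_order):
    """Hypothetical read: contact every replica in order, resolve a winner
    (newest timestamp; first reply wins a tie), and repair all replicas before returning."""
    replies = [state[r] for r in replica_order]
    winner = replies[0]
    for column in replies[1:]:
        if column[1] > winner[1]:
            winner = column
    for r in replica_order:
        state[r] = winner
    return winner[0]


state = {"R1": ("value_A", 1), "R2": ("value_A", 1), "R3": ("value_B", 1)}

first = read_and_repair(state, ["R3", "R1", "R2"])   # repair runs to completion
second = read_and_repair(state, ["R1", "R2", "R3"])  # starts only afterwards
print(first, second, set(state.values()))
# value_B value_B {('value_B', 1)}
```

Serializing the repairs removes the crossing, so every later read sees a single value regardless of reply order.
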
--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
