From: Jason Tang
To: "user@cassandra.apache.org"
Date: Tue, 5 Mar 2013 08:30:05 +0800
Subject: Re: Consistent problem when solve Digest mismatch

Hi,

The timestamp provided by my client is a Unix timestamp (with NTP), and as I
said, because of NTP drift the local Unix timestamps are not synchronized
accurately enough for my case.

In short, the client cannot provide a global sequence number to indicate the
event order.

But I wonder: I configured Cassandra with a write consistency level of QUORUM,
so for a single record I would expect Cassandra to be able to decide the final
result of the updates.

Otherwise, does it mean that version conflict resolution depends strongly on a
global sequence ID (the timestamp) that has to be provided by the client?

//Tang

2013/3/4 Sylvain Lebresne

> The problem is: what exactly is the sequence number you are talking about?
>
> Or let me put it another way: if you do have a sequence number that
> provides a total ordering of your operations, then that is exactly what you
> should use as your timestamp. What Cassandra calls the timestamp is
> exactly what you call seqID: it's the number Cassandra uses to decide the
> order of operations.
>
> Except that in real life, provided you have more than one client talking
> to Cassandra, providing a total ordering of operations is hard, and in
> fact not doable efficiently. So in practice, people use Unix timestamps
> (with NTP), which provide a very good yet cheap approximation of the
> real-life order of operations.
>
> But again, if you do know how to assign a more precise "timestamp",
> Cassandra lets you use that: you can provide your own timestamp (using the
> Unix timestamp is just the default). The point being, the Unix timestamp is
> the best approximation we have in practice.
>
> --
> Sylvain
>
>
> On Mon, Mar 4, 2013 at 9:26 AM, Jason Tang wrote:
>
>> Hi
>>
>> Previously I ran into a consistency problem; you can refer to the link
>> below for the whole story.
>>
>> http://mail-archives.apache.org/mod_mbox/cassandra-user/201206.mbox/%3CCAFb+LUxna0jiY0V=AvXKzUdxSjApYm4zWk=Ka9LJM-txc04Gjw@mail.gmail.com%3E
>>
>> After checking the code, I think I have found a clue to the problem.
>> Maybe someone can check this.
>>
>> In short, I have a Cassandra cluster (1.0.3). The consistency level is
>> read/write QUORUM and replication_factor is 3.
>>
>> Here is the event sequence:
>>
>> seqID   NodeA    NodeB    NodeC
>> 1.      New      New      New
>> 2.      Update   Update   Update
>> 3.      Delete   Delete
>>
>> When reading from NodeB and NodeC, a "Digest mismatch" exception is
>> triggered, so Cassandra tries to resolve the version conflict.
>> But the result is the value "Update".
>>
>> Here is the suspected root cause: the version conflict is resolved based
>> on timestamps.
>>
>> Node C's local time is a bit earlier than node A's.
>>
>> "Update" requests were sent from node C with timestamp 00:00:00.050, and
>> "Delete" was sent from node A with timestamp 00:00:00.020, which is not
>> the same as the event sequence.
>>
>> So the version conflict is resolved incorrectly.
>>
>> Is that true?
>>
>> If yes, then it means the consistency level can ensure that the conflict
>> is found, but resolving it correctly depends on the accuracy of time
>> synchronization, e.g. NTP?
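To make the timestamp-based resolution discussed in this thread concrete, here is a minimal sketch of last-write-wins reconciliation. It is an illustration only, not Cassandra's code: the `Version` type, the `reconcile` function, the millisecond units, and the tie-breaking rule are all assumptions introduced for this example. It replays the two conflicting writes from the thread, first with the skewed wall-clock timestamps and then with the seqID used as the timestamp.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    """One replica's view of a column (hypothetical type, for illustration)."""
    value: Optional[str]  # None stands for a delete (tombstone)
    timestamp: int        # client-supplied number used to order writes


def reconcile(a: Version, b: Version) -> Version:
    """Last-write-wins: keep the version with the higher timestamp.

    On equal timestamps the delete is kept; that tie-break is an
    assumption of this sketch.
    """
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    return a if a.value is None else b


# Wall-clock timestamps from the thread, in milliseconds: "Update" was
# stamped 00:00:00.050 and "Delete" 00:00:00.020, although the delete
# happened last.
update = Version("Update", 50)
delete = Version(None, 20)
skewed = reconcile(update, delete)

# The same two events stamped with the seqID from the table
# (2 = Update, 3 = Delete), i.e. a client-provided total order.
ordered = reconcile(Version("Update", 2), Version(None, 3))

print(skewed.value)   # Update
print(ordered.value)  # None
```

With the skewed clocks the stale "Update" survives, which matches the behaviour reported above. When the writes carry a number that reflects the real event order, the delete wins. QUORUM only determines how many replicas take part in a read or write; in this sketch the winner is decided purely by the timestamp.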