From: Jason Tang <ares.tang@gmail.com>
To: user@cassandra.apache.org
Reply-To: user@cassandra.apache.org
Date: Tue, 15 Oct 2013 13:15:38 +0800
Subject: Re: Side effects of hinted handoff lead to consistency problem

After checking the logs and configuration, I found that the problem has
two causes.

1. GC grace seconds
    I use the Hector client to connect to Cassandra, and the default
    value of GC grace seconds for each column family is **zero**! So by
    the time hinted handoff replays the temporary values, the tombstones
    on the other two nodes have already been removed by compaction, and
    the client then sees the temporary values.

2. Secondary index
    Even after fixing the first problem, I could still get temporary
    results from the Cassandra client. When I query the data with a
    command like "get my_cf where column_one='value'", the temporary
    value shows up again. But when I query the same record by its row
    key, it is gone.
    Our client always uses the row key to get data, and that way I never
    see the temporary value.

    So it seems the secondary index is not restricted by the consistency
    configuration.

    After I changed GC grace seconds to 10 days, our problem was solved,
    but the behavior of index queries is still strange.


2013/10/8 Jason Tang <ares.tang@gmail.com>

> I have a 3-node cluster, and the replication factor is also 3. The
> consistency level is quorum for both writes and reads.
> Traffic has three major steps:
> Create:
>     Rowkey: xxxx
>     Column: status=new, requests="xxxxx"
> Update:
>     Rowkey: xxxx
>     Column: status=executing, requests="xxxxx"
> Delete:
>     Rowkey: xxxx
>
> When one node is down, the cluster keeps working according to the
> consistency configuration, and in the final state all requests are
> finished and deleted.
>
> So if I run the Cassandra client to list the result (also with
> consistency quorum), it shows empty (only the row key is left), which
> is correct.
>
> But if we start the dead node, the hinted handoff mechanism writes the
> data back to this node. So there are lots of creates, updates and
> deletes.
>
> I don't know whether it is due to GC or compaction, but the deletes on
> the other two nodes seem not to work: if I use the Cassandra client to
> list the data (also with consistency quorum), the deleted rows show up
> again with column values.
>
> And if I check the data with the client several times, the data keeps
> changing. It seems that as hinted handoff replays the operations, the
> deleted data shows up and then disappears.
>
> So the hinted handoff mechanism speeds up the repair, but the temporary
> data is visible externally (if the data has been deleted).
>
> Is there a way to make this procedure invisible externally until
> hinted handoff has finished?
>
> What I want is synchronization of the final state; the temporary state
> is out of date and also incorrect, and should never be seen externally.
>
> Is it due to row delete instead of column delete? Or compaction?
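For reference, the quorum arithmetic behind the setup in the quoted message (3 nodes, replication factor 3, quorum writes and reads) can be written out as a small sketch. The function names are illustrative only and are not part of Cassandra or Hector.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must answer at consistency level quorum."""
    return replication_factor // 2 + 1


def reads_overlap_writes(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """True if every read set must share at least one replica with every write set."""
    return write_acks + read_acks > replication_factor


rf = 3
q = quorum(rf)                      # replicas needed per request
tolerated_down = rf - q             # replicas that may be down
overlap = reads_overlap_writes(rf, q, q)
print(q, tolerated_down, overlap)
```

With replication factor 3 this gives a quorum of 2, so the cluster keeps serving requests with one node down, which matches the behavior described in the quoted message.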
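The effect described in point 1 of the reply can be illustrated with a toy model. This is only a sketch under stated assumptions, not Cassandra's actual implementation: each replica keeps one cell per row key, conflicts are resolved by highest timestamp, compaction drops tombstones older than the grace period, and the hint for the delete is assumed to arrive last. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Cell:
    value: Optional[str]  # None marks a tombstone (a delete)
    timestamp: int        # write time in seconds; highest timestamp wins


class Replica:
    """Toy replica: one cell per row key, last write wins by timestamp."""

    def __init__(self, gc_grace_seconds: int) -> None:
        self.gc_grace_seconds = gc_grace_seconds
        self.cells: Dict[str, Cell] = {}

    def apply(self, key: str, cell: Cell) -> None:
        current = self.cells.get(key)
        if current is None or cell.timestamp > current.timestamp:
            self.cells[key] = cell

    def compact(self, now: int) -> None:
        # Drop tombstones whose age has reached the grace period.
        for key, cell in list(self.cells.items()):
            if cell.value is None and now - cell.timestamp >= self.gc_grace_seconds:
                del self.cells[key]


def quorum_read_after_replay(gc_grace_seconds: int) -> Optional[str]:
    healthy = Replica(gc_grace_seconds)    # was up the whole time
    recovered = Replica(gc_grace_seconds)  # was down, now receiving hints

    healthy.apply("row1", Cell("status=new", 100))
    healthy.apply("row1", Cell("status=executing", 200))
    healthy.apply("row1", Cell(None, 300))  # the delete leaves a tombstone
    healthy.compact(now=400)

    # Assumption: create and update hints are replayed before the delete hint.
    recovered.apply("row1", Cell("status=new", 100))
    recovered.apply("row1", Cell("status=executing", 200))

    # Read reconciliation across the two replicas: highest timestamp wins.
    candidates = [r.cells["row1"] for r in (healthy, recovered) if "row1" in r.cells]
    return max(candidates, key=lambda c: c.timestamp).value


print(quorum_read_after_replay(0))               # tombstone already purged
print(quorum_read_after_replay(10 * 24 * 3600))  # tombstone still present
```

In this model, a grace period of zero lets compaction purge the tombstone before the replayed data is read, so the stale value wins the reconciliation. With a grace period of 10 days the tombstone is still there and shadows the replayed value.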