Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of ares.tang@gmail.com
 designates 209.85.223.170 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAFb+LUwN1R_PQgUTkLYScjtBU8kNTjHvZNxO1QYRtWLp-G4jPA@mail.gmail.com>
References: 
 <CAFb+LUwN1R_PQgUTkLYScjtBU8kNTjHvZNxO1QYRtWLp-G4jPA@mail.gmail.com>
Date: Tue, 15 Oct 2013 13:15:38 +0800
Message-ID: 
 <CAFb+LUwi4VDCWLz2SYoXPmnBRdb-5ob1ckHWsgzz8aOiukRk6w@mail.gmail.com>
Subject: Re: Side effects of hinted handoff lead to consistency problem
From: Jason Tang <ares.tang@gmail.com>
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Content-Type: multipart/alternative; boundary=e89a8f5038e2da9dac04e8c0ac34

--e89a8f5038e2da9dac04e8c0ac34
Content-Type: text/plain; charset=GB2312
Content-Transfer-Encoding: quoted-printable

After checking the logs and configuration, I found it was caused by two things.

 1. GC grace seconds
    I am using the Hector client to connect to Cassandra, and the default
value of GC grace seconds for each column family is **zero**! So by the time
hinted handoff replays the temporary value, the tombstones on the other two
nodes have already been deleted by compaction, and the client then gets the
temporary value.

 2. Secondary index
    Even after fixing the first problem, I could still get temporary results
from the Cassandra client. When I used a command like "get my_cf where
column_one='value'" to query the data, the temporary value showed up
again. But when I queried the record again by its row key, it was gone.
    From our client we always use the row key to get the data, and that
way I did not get the temporary value.

    So it seems that secondary index queries are not governed by the
consistency configuration.

    After I changed GC grace seconds to 10 days, our problem was solved, but
the behavior of index queries is still strange.
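
    To illustrate item 1, here is a toy model in Python. It is only a sketch
of last-write-wins reconciliation with tombstones, not Cassandra code: the
names Replica, quorum_read and simulate are made up for this example,
timestamps are plain integers, and read repair is ignored. It assumes
compaction runs on the two live nodes before the hints are replayed.

```python
TOMBSTONE = object()  # marker for a deleted row


class Replica:
    """Hypothetical replica: row key -> (timestamp, value or TOMBSTONE)."""

    def __init__(self):
        self.cells = {}

    def apply(self, key, ts, value):
        # Last write wins: keep the mutation with the newest timestamp.
        current = self.cells.get(key)
        if current is None or ts > current[0]:
            self.cells[key] = (ts, value)

    def compact(self, now, gc_grace):
        # Purge tombstones that are at least gc_grace old.
        for key, (ts, value) in list(self.cells.items()):
            if value is TOMBSTONE and now - ts >= gc_grace:
                del self.cells[key]


def quorum_read(replicas, key):
    """Return the newest value among the contacted replicas, or None."""
    found = [r.cells[key] for r in replicas if key in r.cells]
    if not found:
        return None
    ts, value = max(found, key=lambda cell: cell[0])
    return None if value is TOMBSTONE else value


def simulate(gc_grace):
    up_a, up_b, down = Replica(), Replica(), Replica()
    mutations = [
        ("row1", 1, "new"),        # create
        ("row1", 2, "executing"),  # update
        ("row1", 3, TOMBSTONE),    # delete
    ]
    for m in mutations:            # the third node is down, so hints are kept
        up_a.apply(*m)
        up_b.apply(*m)
    hints = list(mutations)

    up_a.compact(now=10, gc_grace=gc_grace)
    up_b.compact(now=10, gc_grace=gc_grace)

    seen = []
    for hint in hints:             # the node comes back, hints replay in order
        down.apply(*hint)
        seen.append(quorum_read([down, up_a], "row1"))
    return seen


print(simulate(gc_grace=0))       # ['new', 'executing', None]
print(simulate(gc_grace=864000))  # [None, None, None]
```

    With GC grace at zero the live nodes have nothing left to outvote the
replayed writes, so a quorum read that includes the recovering node sees
the old values until the delete hint arrives. With 10 days (864000 seconds)
the tombstone is still there and wins every read.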


2013/10/8 Jason Tang <ares.tang@gmail.com>

> I have a 3-node cluster, and the replication factor is also 3. The
> consistency level is QUORUM for writes and QUORUM for reads.
> Traffic has three major steps
> Create:
>     Rowkey: xxxx
>     Column: status=new, requests="xxxxx"
> Update:
>     Rowkey: xxxx
>     Column: status=executing, requests="xxxxx"
> Delete:
>     Rowkey: xxxx
>
> When one node is down, the cluster keeps working according to the
> consistency configuration, and in the final state all requests are
> finished and deleted.
>
> So if I run the Cassandra client to list the results (also at consistency
> QUORUM), it shows empty rows (only the row key is left), which is correct.
>
> But when we start the dead node again, the hinted handoff mechanism writes
> the data back to this node, so there are lots of creates, updates, and
> deletes.
>
> I don't know whether it is due to GC or compaction, but the deletes on the
> other two nodes do not seem to take effect: if I use the Cassandra client
> to list the data (also at consistency QUORUM), the deleted rows show up
> again with column values.
>
> And if I use the client to check the data several times, I can see the
> data changing. It seems that as hinted handoff replays the operations, the
> deleted data shows up and then disappears.
>
> So the hinted handoff mechanism speeds up the repair, but the temporary
> data can be seen from outside (if the data has been deleted).
>
> Is there a way to make this procedure invisible from outside until
> hinted handoff has finished?
>
> What I want is synchronization of the final state only. The temporary
> state is out of date and also incorrect, and should never be seen from
> outside.
>
> Is it due to using row deletes instead of column deletes? Or to compaction?
>


--e89a8f5038e2da9dac04e8c0ac34--