Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: error (athena.apache.org: local policy)
MIME-Version: 1.0
Reply-To: mail@frensjan.nl
Sender: frensjan@frensjan.nl
In-Reply-To: 
 <CABNXB2D=ty1k7f41pVjc=85LAukPJNWsGb5qgir+zt_8X6=RQA@mail.gmail.com>
References: <1425492417697.a19cfa28@Nodemailer>
	<1425495806878.c9686778@Nodemailer>
	<CAOUOv0FLecMuuxpjYrUcTynfF=+4HiHXh7xDjqoKh478RSMGoA@mail.gmail.com>
	<CAH3f1B99_v8DgoWPpoSBeZwk6tMory77UciURwOsXT-6iZcj=w@mail.gmail.com>
	<CABNXB2D=ty1k7f41pVjc=85LAukPJNWsGb5qgir+zt_8X6=RQA@mail.gmail.com>
Date: Wed, 11 Mar 2015 01:57:43 +0100
Message-ID: 
 <CAH3f1B9SmET0jJEAC_u-6tFZ_Bsr4cb=OnKidR1taUFNJPbpCQ@mail.gmail.com>
Subject: Re: Inconsistent count(*) and distinct results from Cassandra
From: "Rumph, Frens Jan" <mail@frensjan.nl>
To: DuyHai Doan <doanduyhai@gmail.com>
Cc: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=089e016338064585ad0510f8c1e7

--089e016338064585ad0510f8c1e7
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Thanks for the suggestion, DuyHai. I assume you mean CL=QUORUM (as in
consistency level, not replication factor). As expected, setting the
consistency level to QUORUM or ALL yields equally inconsistent results for
the select count and select distinct queries.

Which is good in a way, because with RF=1 and CL=ONE I would expect an error
if one of the nodes were unable to answer a query.

Note that there conceptually is no such thing as a quorum or majority when
RF=1. As quorum in C* is defined as floor( (RF / 2) + 1 ), in the case of
RF=1 this is in practice the same as CL=ONE.
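
As a rough sketch of that arithmetic (the helper name `quorum` is made up for this mail; it is not a Cassandra or driver API):

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must answer at CL=QUORUM: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    # Integer division gives floor(RF / 2) for positive RF.
    return replication_factor // 2 + 1


for rf in (1, 2, 3, 5):
    print(f"RF={rf}: QUORUM needs {quorum(rf)} replica(s)")
```

So with RF=1 a QUORUM read waits for exactly one replica, just like ONE.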

On 10 March 2015 at 18:10, DuyHai Doan <doanduyhai@gmail.com> wrote:

> First idea to eliminate any issue with regard to stale data: issue the
> same count query with RF=QUORUM and check whether there are still
> inconsistencies
>
> On Tue, Mar 10, 2015 at 9:13 AM, Rumph, Frens Jan <mail@frensjan.nl>
> wrote:
>
>> Hi Jens, Mikhail, Daemeon,
>>
>> Thanks for your replies. Sorry for my reply being late ... mails from the
>> user-list were moved to the wrong inbox on my side.
>>
>> I'm in a development environment and thus using replication factor = 1
>> and consistency = ONE with three nodes. So the 'results from different
>> nodes between queries' hypothesis seems unlikely to me. I would expect a
>> timeout if some node wouldn't be able to answer.
>>
>> I tried tracing, but I couldn't really make sense of it.
>>
>> For example, I performed two select distinct ... from ... queries. The
>> traces for both contained more than one line like 'Submitting range
>> requests on ... ranges ...' and 'Submitted ... concurrent range requests
>> covering ... ranges'. These lines occur with varying numbers, e.g.:
>>
>> Submitting range requests on 593 ranges with a concurrency of 75 (1.35
>> rows per range expected)
>> Submitting range requests on 769 ranges with a concurrency of 75 (1.35
>> rows per range expected)
>>
>>
>> Also, when looking at the lines like 'Executing seq scan across ...
>> sstables for ...', I saw that in one case, which yielded far fewer
>> partition keys, only the tokens from -9223372036854770000 to
>> -594461978511041000 were included. In a case which yielded many more
>> partition keys, the entire token range did seem to be queried.
>>
>> To reiterate my initial questions: is this behavior to be expected? Am I
>> doing something wrong? Is there a workaround?
>>
>> Best regards,
>> Frens Jan
>>
>> On 4 March 2015 at 22:59, daemeon reiydelle <daemeonr@gmail.com> wrote:
>>
>>> What is the replication? Could you be serving stale data from a node
>>> that was not properly replicated (hints timeout exceeded by a node being
>>> down?)
>>>
>>>
>>>
>>> On Wed, Mar 4, 2015 at 11:03 AM, Jens Rantil <jens.rantil@tink.se>
>>> wrote:
>>>
>>>> Frens,
>>>>
>>>> What consistency are you querying with? It could be that you are simply
>>>> receiving results from different nodes each time.
>>>>
>>>> Jens
>>>>
>>>> –
>>>> Sent from Mailbox <https://www.dropbox.com/mailbox>
>>>>
>>>>
>>>> On Wed, Mar 4, 2015 at 7:08 PM, Mikhail Strebkov <strebkov@gmail.com>
>>>> wrote:
>>>>
>>>>> We have observed the same issue in our production Cassandra cluster (5
>>>>> nodes in one DC). We use Cassandra 2.1.3 (I joined the list too late to
>>>>> realize we shouldn’t use 2.1.x yet) on Amazon machines (created from
>>>>> community AMI).
>>>>>
>>>>> In addition to count variations of 5 to 10% we observe variations in
>>>>> the results of the query “select * from table1 where time > '$fromDate'
>>>>> and time < '$toDate' allow filtering”. We iterated through the results
>>>>> multiple times using the official Java driver. We used that query for a
>>>>> huge data migration and were unpleasantly surprised that it is
>>>>> unreliable. In our case “nodetool repair” didn’t fix the issue.
>>>>>
>>>>> So I echo Frens's questions.
>>>>>
>>>>> Thanks,
>>>>> Mikhail
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Mar 4, 2015 at 3:55 AM, Rumph, Frens Jan <mail@frensjan.nl>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Is it to be expected that select count(*) from ... and select
>>>>>> distinct partition-key-columns from ... yield inconsistent results
>>>>>> between executions even though the table at hand isn't written to?
>>>>>>
>>>>>> I have a table in a keyspace with replication_factor = 1 which is
>>>>>> something like:
>>>>>>
>>>>>>  CREATE TABLE tbl (
>>>>>>     id frozen<id_type>,
>>>>>>     bucket bigint,
>>>>>>     offset int,
>>>>>>     value double,
>>>>>>     PRIMARY KEY ((id, bucket), offset)
>>>>>> );
>>>>>>
>>>>>> The frozen udt is:
>>>>>>
>>>>>>  CREATE TYPE id_type (
>>>>>>     tags map<text, text>
>>>>>> );
>>>>>>
>>>>>> When I do select count(*) from tbl several times the actual count
>>>>>> varies by 5 to 10%. Also, when performing select distinct id, bucket
>>>>>> from tbl the results aren't consistent over several query executions.
>>>>>> The table was not being written to at the time I performed the queries.
>>>>>>
>>>>>> Is this to be expected? Or is this a bug? Is there an alternative
>>>>>> method / workaround?
>>>>>>
>>>>>> I'm using cqlsh 5.0.1 with Cassandra 2.1.2 on 64-bit Fedora 21 with
>>>>>> Oracle Java 1.8.0_31.
>>>>>>
>>>>>> Thanks in advance,
>>>>>> Frens Jan
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
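
Regarding the seq scan token bounds quoted above (-9223372036854770000 to -594461978511041000): here is a rough sketch of how much of the ring such a scan covers. It assumes the cluster uses Murmur3Partitioner, which the thread does not state, with tokens spanning -2^63 to 2^63 - 1. The names `MIN_TOKEN`, `MAX_TOKEN` and `ring_fraction` are made up for illustration.

```python
# Assumption: Murmur3Partitioner, whose tokens span -2**63 .. 2**63 - 1.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def ring_fraction(start_token: int, end_token: int) -> float:
    """Fraction of the full token ring lying between the two tokens."""
    if not MIN_TOKEN <= start_token <= end_token <= MAX_TOKEN:
        raise ValueError("tokens must be ordered and within the Murmur3 range")
    return (end_token - start_token) / (MAX_TOKEN - MIN_TOKEN)


# Token bounds as printed in the trace (they look rounded in the output).
scanned = ring_fraction(-9223372036854770000, -594461978511041000)
print(f"scanned about {scanned:.1%} of the ring")
```

Under that assumption the short scan covered somewhat less than half of the ring, which fits with that run returning far fewer partition keys.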

--089e016338064585ad0510f8c1e7--