cassandra-user mailing list archives

From daemeon reiydelle <daeme...@gmail.com>
Subject Re: Inconsistent count(*) and distinct results from Cassandra
Date Wed, 04 Mar 2015 21:59:28 GMT
What is the replication factor? Could you be serving stale data from a node
that did not receive all of its writes (for example, because a node was down
for longer than the hint window)?
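
A minimal cqlsh sketch of how the two questions raised in this thread (replication and read consistency) could be checked. The keyspace name `ks` is a placeholder and not taken from the original messages. With replication_factor = 1, as in the original report, every consistency level reads from the same single replica, so this check is mainly informative for clusters with a higher replication factor.

```cql
-- Show the keyspace definition, including its replication settings.
-- "ks" is a hypothetical keyspace name.
DESCRIBE KEYSPACE ks;

-- Require every replica to answer reads for the rest of this cqlsh session.
CONSISTENCY ALL;

-- Rerun the count several times and compare the results.
SELECT count(*) FROM ks.tbl;
```

If the counts stop varying at a higher consistency level, the replicas probably disagree with each other. If they still vary, the cause likely lies elsewhere.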



On Wed, Mar 4, 2015 at 11:03 AM, Jens Rantil <jens.rantil@tink.se> wrote:

> Frens,
>
> What consistency level are you querying with? It could be that you are
> simply receiving results from different nodes each time.
>
> Jens
>
> –
> Sent from Mailbox <https://www.dropbox.com/mailbox>
>
>
> On Wed, Mar 4, 2015 at 7:08 PM, Mikhail Strebkov <strebkov@gmail.com>
> wrote:
>
>> We have observed the same issue in our production Cassandra cluster (5
>> nodes in one DC). We use Cassandra 2.1.3 (I joined the list too late to
>> realize we shouldn’t use 2.1.x yet) on Amazon machines (created from
>> the community AMI).
>>
>> In addition to count variations of 5 to 10%, we observe variations in the
>> results of the query “select * from table1 where time > '$fromDate' and
>> time < '$toDate' allow filtering”. We iterated through the results
>> multiple times using the official Java driver. We used that query for a huge
>> data migration and were unpleasantly surprised that it is unreliable. In
>> our case “nodetool repair” didn’t fix the issue.
>>
>> So I echo Frens's questions.
>>
>> Thanks,
>> Mikhail
>>
>>
>>
>>
>> On Wed, Mar 4, 2015 at 3:55 AM, Rumph, Frens Jan <mail@frensjan.nl>
>> wrote:
>>
>>> Hi,
>>>
>>> Is it to be expected that select count(*) from ... and select distinct
>>> partition-key-columns from ... yield inconsistent results between
>>> executions even though the table at hand isn't being written to?
>>>
>>> I have a table in a keyspace with replication_factor = 1 which is
>>> something like:
>>>
>>> CREATE TABLE tbl (
>>>     id frozen<id_type>,
>>>     bucket bigint,
>>>     offset int,
>>>     value double,
>>>     PRIMARY KEY ((id, bucket), offset)
>>> );
>>>
>>> The frozen udt is:
>>>
>>> CREATE TYPE id_type (
>>>     tags map<text, text>
>>> );
>>>
>>> When I do select count(*) from tbl several times the actual count varies
>>> by 5 to 10%. Also when performing select distinct id, bucket from tbl the
>>> results aren't consistent over several query executions. The table is not
>>> being written to at the time I performed the queries.
>>>
>>> Is this to be expected? Or is this a bug? Is there an alternative method
>>> / workaround?
>>>
>>> I'm using cqlsh 5.0.1 with Cassandra 2.1.2 on 64-bit Fedora 21 with
>>> Oracle Java 1.8.0_31.
>>>
>>> Thanks in advance,
>>> Frens Jan
>>>
>>
>>
>
