cassandra-user mailing list archives

From "Mikhail Strebkov" <>
Subject Re: Inconsistent count(*) and distinct results from Cassandra
Date Wed, 04 Mar 2015 18:06:57 GMT
We have observed the same issue in our production Cassandra cluster (5 nodes in one DC). We
use Cassandra 2.1.3 (I joined the list too late to realize we shouldn’t use 2.1.x yet)
on Amazon machines (created from the community AMI).

In addition to count variations of 5 to 10%, we observe variations in the results of the query
“select * from table1 where time > '$fromDate' and time < '$toDate' allow filtering”.
We iterated through the results multiple times using the official Java driver. We used that
query for a large data migration and were unpleasantly surprised to find that it is unreliable.
In our case, “nodetool repair” didn’t fix the issue.
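To quantify run-to-run drift like the one described above, one could collect the keys returned by two executions of the same read-only query and diff them, alongside the relative spread of repeated count(*) results. The Java sketch below is illustrative only and does not touch the Cassandra driver: the class `RunDiff`, its methods, and the assumption that each row has already been reduced to a string key such as `id|bucket` are all hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical diagnostic: compares results of repeated executions of the same
// read-only query. Rows are assumed to be pre-reduced to string keys ("id|bucket").
public final class RunDiff {

    private RunDiff() {}

    // Keys returned by run `a` that are absent from run `b`.
    public static Set<String> missingFrom(Collection<String> a, Collection<String> b) {
        Set<String> result = new HashSet<>(a);
        result.removeAll(new HashSet<>(b)); // hash-based lookups regardless of b's type
        return Collections.unmodifiableSet(result);
    }

    // Relative spread (max - min) / max of repeated count(*) results.
    // Returns 0.0 when no counts are given or the maximum is zero.
    public static double relativeSpread(long... counts) {
        if (counts.length == 0) {
            return 0.0;
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long c : counts) {
            min = Math.min(min, c);
            max = Math.max(max, c);
        }
        return max == 0 ? 0.0 : (double) (max - min) / max;
    }

    public static void main(String[] args) {
        // Illustrative keys only; not data from the report.
        Set<String> run1 = new HashSet<>(Arrays.asList("a|1", "b|1", "c|2"));
        Set<String> run2 = new HashSet<>(Arrays.asList("a|1", "c|2", "d|3"));
        System.out.println("only in run 1: " + missingFrom(run1, run2)); // prints "only in run 1: [b|1]"
        System.out.println("only in run 2: " + missingFrom(run2, run1)); // prints "only in run 2: [d|3]"
        System.out.printf("count spread: %.1f%%%n", 100 * relativeSpread(1000, 950, 905));
    }
}
```

Feeding it the keys from two executions of `select distinct id, bucket from tbl` against an idle table would show exactly which partitions appeared in one run but not the other.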

So I echo Frens's questions.



On Wed, Mar 4, 2015 at 3:55 AM, Rumph, Frens Jan <> wrote:

> Hi,
> Is it to be expected that select count(*) from ... and select distinct
> partition-key-columns from ... yield inconsistent results between
> executions even though the table at hand isn't being written to?
> I have a table in a keyspace with replication_factor = 1 which is something
> like:
> CREATE TABLE tbl (
>     id frozen<id_type>,
>     bucket bigint,
>     offset int,
>     value double,
>     PRIMARY KEY ((id, bucket), offset)
> )
> The frozen UDT is:
> CREATE TYPE id_type (
>     tags map<text, text>
> );
> When I run select count(*) from tbl several times, the reported count varies
> by 5 to 10%. Also, when performing select distinct id, bucket from tbl, the
> results aren't consistent across query executions. The table was not
> being written to at the time I performed the queries.
> Is this to be expected? Or is this a bug? Is there an alternative method /
> workaround?
> I'm using cqlsh 5.0.1 with Cassandra 2.1.2 on 64-bit Fedora 21 with Oracle
> Java 1.8.0_31.
> Thanks in advance,
> Frens Jan