cassandra-commits mailing list archives

From "Benjamin Lerer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8940) Inconsistent select count and select distinct
Date Tue, 28 Apr 2015 10:49:06 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516797#comment-14516797 ]

Benjamin Lerer commented on CASSANDRA-8940:
-------------------------------------------

When performing a count operation, Cassandra requests all the needed data from the replicas
and counts the rows on the coordinator. As the count itself is a straightforward operation, my
guess was that the problem was coming from the returned data.
To verify that idea, I built a program that runs a select-all query and analyses the results,
along with some extra information.
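
For reference, the core of that check can be sketched roughly as below (Java driver 2.1). The
column names and the 0-999 offset range match the test schema; the contact point, keyspace
name and output format are assumptions on my side, not the exact program I ran:

{code}
import com.datastax.driver.core.*;

public class SelectAllCheck {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("192.168.33.11").build();
        Metadata metadata = cluster.getMetadata();
        System.out.printf("Connected to cluster: %s%n", metadata.getClusterName());
        for (Host host : metadata.getAllHosts())
            System.out.printf("Datacenter: %s; Host: %s; Rack: %s%n",
                              host.getDatacenter(), host.getAddress(), host.getRack());

        Session session = cluster.connect("ks"); // keyspace name is an assumption

        // Rows of one partition come back contiguously and ordered by the
        // clustering column, so a partition whose last offset is below 999
        // has lost its tail.
        ResultSet rs = session.execute("SELECT id, bucket, offset FROM tbl");
        long total = 0;
        UDTValue prevId = null;
        long prevBucket = -1;
        int lastOffset = -1;
        for (Row row : rs) {
            UDTValue id = row.getUDTValue("id");
            long bucket = row.getLong("bucket");
            int offset = row.getInt("offset");
            if (prevId != null && (!id.equals(prevId) || bucket != prevBucket)
                    && lastOffset != 999)
                System.out.printf("Missing range: [%d-999] for key: %s, bucket:%d with count: %d%n",
                                  lastOffset + 1, prevId, prevBucket, total);
            prevId = id;
            prevBucket = bucket;
            lastOffset = offset;
            total++;
        }
        // Also check the tail of the very last partition.
        if (prevId != null && lastOffset != 999)
            System.out.printf("Missing range: [%d-999] for key: %s, bucket:%d with count: %d%n",
                              lastOffset + 1, prevId, prevBucket, total);
        System.out.printf("total count: %d%n", total);
        cluster.close();
    }
}
{code}

The coordinator/replica annotations in the output below come from the query's execution info
and the cluster metadata; see the snippet further down for how they can be obtained.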

I did multiple runs to collect proper metrics.

Here is the output of my latest run with 5 ids, 100 buckets and 1000 offsets.

{code}
Connected to cluster: Test Cluster
Datatacenter: datacenter1; Host: /192.168.33.12; Rack: rack1
Datatacenter: datacenter1; Host: /192.168.33.11; Rack: rack1
----------------------------------------
Missing range: [479-999] for key: id:d, bucket:23 with count: 442479 (coordinator: /192.168.33.11:9042
replica: /192.168.33.11:9042)
Next row (coordinator: /192.168.33.11:9042 replica: /192.168.33.11:9042)
total count: 499479
----------------------------------------
Missing range: [340-999] for key: id:d, bucket:23 with count: 442340 (coordinator: /192.168.33.11:9042
replica: /192.168.33.11:9042)
Next row (coordinator: /192.168.33.11:9042 replica: /192.168.33.11:9042)
total count: 499340
----------------------------------------
total count: 500000
----------------------------------------
Missing range: [858-999] for key: id:a, bucket:44 with count: 254858 (coordinator: /192.168.33.12:9042
replica: /192.168.33.12:9042)
Next row (coordinator: /192.168.33.12:9042 replica: /192.168.33.12:9042)
Missing range: [885-999] for key: id:c, bucket:38 with count: 404743 (coordinator: /192.168.33.12:9042
replica: /192.168.33.12:9042)
Next row (coordinator: /192.168.33.12:9042 replica: /192.168.33.11:9042)
total count: 498743
----------------------------------------
Missing range: [408-999] for key: id:a, bucket:3 with count: 69408 (coordinator: /192.168.33.11:9042
replica: /192.168.33.11:9042)
Next row (coordinator: /192.168.33.11:9042 replica: /192.168.33.11:9042)
total count: 491408
----------------------------------------
total count: 500000
----------------------------------------
total count: 500000
----------------------------------------
total count: 500000
----------------------------------------
total count: 500000
----------------------------------------
total count: 500000
----------------------------------------
Missing range: [152-999] for key: id:d, bucket:68 with count: 154152 (coordinator: /192.168.33.12:9042
replica: /192.168.33.12:9042)
Next row (coordinator: /192.168.33.12:9042 replica: /192.168.33.11:9042)
Missing range: [1-999] for key: id:e, bucket:23 with count: 390153 (coordinator: /192.168.33.12:9042
replica: /192.168.33.12:9042)
Next row (coordinator: /192.168.33.12:9042 replica: /192.168.33.12:9042)
total count: 497153
----------------------------------------
total count: 500000
----------------------------------------
total count: 500000
----------------------------------------
Missing range: [905-999] for key: id:b, bucket:93 with count: 253905 (coordinator: /192.168.33.12:9042
replica: /192.168.33.12:9042)
Next row (coordinator: /192.168.33.12:9042 replica: /192.168.33.12:9042)
Missing range: [680-999] for key: id:a, bucket:44 with count: 254585 (coordinator: /192.168.33.12:9042
replica: /192.168.33.12:9042)
Next row (coordinator: /192.168.33.12:9042 replica: /192.168.33.12:9042)
total count: 498585
----------------------------------------
Missing range: [968-999] for key: id:a, bucket:42 with count: 268968 (coordinator: /192.168.33.11:9042
replica: /192.168.33.11:9042)
Next row (coordinator: /192.168.33.11:9042 replica: /192.168.33.11:9042)
total count: 499968
----------------------------------------
Missing range: [635-999] for key: id:c, bucket:75 with count: 59635 (coordinator: /192.168.33.11:9042
replica: /192.168.33.11:9042)
Next row (coordinator: /192.168.33.11:9042 replica: /192.168.33.12:9042)
total count: 496635
----------------------------------------
Missing range: [805-999] for key: id:a, bucket:44 with count: 254805 (coordinator: /192.168.33.12:9042
replica: /192.168.33.12:9042)
Next row (coordinator: /192.168.33.12:9042 replica: /192.168.33.12:9042)
total count: 498805
----------------------------------------
Missing range: [373-999] for key: id:d, bucket:68 with count: 154373 (coordinator: /192.168.33.12:9042
replica: /192.168.33.12:9042)
Next row (coordinator: /192.168.33.12:9042 replica: /192.168.33.11:9042)
total count: 499373
----------------------------------------
total count: 500000
----------------------------------------
total count: 500000
{code}

The results show that the missing data is always at the end of a partition (i.e. the partition
is truncated at a random position). In none of the experiments that I have run has Cassandra
lost data in the middle of a partition.

Every time some data was lost, that data was located on the coordinator node. The data of
the next partition came either from the same node or from a different one.
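
For completeness, this is how the owner of a row can be determined: the coordinator of a page
is exposed by the driver, and the replica for a partition key can be looked up in the cluster
metadata. A hedged sketch (the keyspace name, the UDT's tags content and the composite-key
encoding helper are my assumptions; with replication_factor = 1 there is exactly one replica
per key):

{code}
import java.nio.ByteBuffer;
import java.util.Collections;
import java.util.Set;
import com.datastax.driver.core.*;

public class ReplicaCheck {
    // Cassandra encodes a composite partition key as
    // <2-byte length><component bytes><0x00> per component.
    static ByteBuffer compose(ByteBuffer... components) {
        int size = 0;
        for (ByteBuffer c : components)
            size += 2 + c.remaining() + 1;
        ByteBuffer out = ByteBuffer.allocate(size);
        for (ByteBuffer c : components) {
            out.putShort((short) c.remaining());
            out.put(c.duplicate());
            out.put((byte) 0);
        }
        out.flip();
        return out;
    }

    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("192.168.33.11").build();
        Metadata metadata = cluster.getMetadata();
        Session session = cluster.connect("ks");

        // Coordinator of the (last fetched) page of a query:
        ResultSet rs = session.execute("SELECT id, bucket, offset FROM tbl LIMIT 1");
        System.out.println("coordinator: " + rs.getExecutionInfo().getQueriedHost());

        // Replica owning partition (id:d, bucket 23); the tags content of the
        // UDT is a guess, the lookup mechanism is what matters here.
        UserType idType = metadata.getKeyspace("ks").getUserType("id_type");
        UDTValue id = idType.newValue().setMap("tags", Collections.singletonMap("id", "d"));
        ByteBuffer key = compose(idType.serialize(id, ProtocolVersion.V3),
                                 DataType.bigint().serialize(23L, ProtocolVersion.V3));
        Set<Host> replicas = metadata.getReplicas("ks", key);
        System.out.println("replica: " + replicas);
        cluster.close();
    }
}
{code}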

There seems to be no link between the failure and automatic paging, as the problem did not
occur specifically at page boundaries.
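
A simple way to test the paging hypothesis is to vary the driver's fetch size and check
whether the truncation offsets move with the page size. A minimal sketch ({{session}} as in
the first snippet; the fetch sizes are arbitrary):

{code}
// If the lost ranges were a paging artifact, the truncation offsets should
// correlate with the page size; with random truncation points they do not.
Statement stmt = new SimpleStatement("SELECT id, bucket, offset FROM tbl");
for (int fetchSize : new int[] { 100, 1000, 5000 }) {
    stmt.setFetchSize(fetchSize);
    long count = 0;
    for (Row row : session.execute(stmt))
        count++;
    System.out.printf("fetchSize=%d -> total count: %d%n", fetchSize, count);
}
{code}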



> Inconsistent select count and select distinct
> ---------------------------------------------
>
>                 Key: CASSANDRA-8940
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8940
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: 2.1.2
>            Reporter: Frens Jan Rumph
>            Assignee: Benjamin Lerer
>         Attachments: 7b74fb00-e935-11e4-b10c-317579db7eb4.csv, 8d5899d0-e935-11e4-847b-2d06da75a6cd.csv,
Vagrantfile, install_cassandra.sh, setup_hosts.sh
>
>
> When performing {{select count( * ) from ...}} I expect the results to be consistent
over multiple query executions if the table at hand is not written to / deleted from in the
mean time. However, in my set-up it is not. The counts returned vary considerable (several
percent). The same holds for {{select distinct partition-key-columns from ...}}.
> I have a table in a keyspace with replication_factor = 1 which is something like:
> {code}
> CREATE TABLE tbl (
>     id frozen<id_type>,
>     bucket bigint,
>     offset int,
>     value double,
>     PRIMARY KEY ((id, bucket), offset)
> )
> {code}
> The frozen udt is:
> {code}
> CREATE TYPE id_type (
>     tags map<text, text>
> );
> {code}
> The table contains around 35k rows (I'm not trying to be funny here ...). The consistency
level for the queries was ONE.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
