cassandra-commits mailing list archives

From "Mircea Lemnaru (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CASSANDRA-11314) Inconsistent select count(*)
Date Tue, 08 Mar 2016 10:17:40 GMT
Mircea Lemnaru created CASSANDRA-11314:
------------------------------------------

             Summary: Inconsistent select count(*)
                 Key: CASSANDRA-11314
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11314
             Project: Cassandra
          Issue Type: Bug
          Components: Local Write-Read Paths
         Environment: Ubuntu 14.04 LTS
            Reporter: Mircea Lemnaru


Hello,

I currently have this setup: 

Cassandra 3.3 (Community Edition, downloaded from DataStax) installed on 3 nodes, and I have
created this table:

CREATE TABLE billing.collected_data_day (
    collection_day int,
    timestamp timestamp,
    record_id uuid,
    dimensions map<text, text>,
    entity_id text,
    measurements map<text, text>,
    source_id text,
    PRIMARY KEY (collection_day, timestamp, record_id)
) WITH CLUSTERING ORDER BY (timestamp ASC, record_id ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

As you can see, this table is partitioned by collection_day, because at the end of each
day we need fast access to all the data generated during that day. collection_day is the
number of days since 1970-01-01.
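The report only says that collection_day counts days from 1970. The sketch below assumes it is the number of whole days elapsed since the Unix epoch (1970-01-01), computed on UTC calendar dates. Under that assumption it shows how a partition key value maps to and from a date. The helper names to_collection_day and from_collection_day are hypothetical and are not part of the reporter's code.

```python
from datetime import date, timedelta

# Assumption: collection_day = whole days elapsed since 1970-01-01 (UTC dates).
EPOCH = date(1970, 1, 1)


def to_collection_day(d: date) -> int:
    """Hypothetical helper: the days-since-epoch partition key for a calendar date."""
    return (d - EPOCH).days


def from_collection_day(day: int) -> date:
    """Hypothetical helper: the calendar date a collection_day value represents."""
    return EPOCH + timedelta(days=day)


if __name__ == "__main__":
    # The partition queried in this report, under the assumption above.
    print(from_collection_day(16462))             # 2015-01-27
    print(to_collection_day(date(2015, 1, 27)))   # 16462
```

Because every row written on the same day shares one collection_day value, a whole day's data sits in a single partition. That is what makes the per-day query below a single-partition read.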

For testing purposes we inserted roughly 12 million rows into this table and then ran a
simple count. As you can see, the results vary:

cqlsh:billing> select count(*) from collected_data_day where collection_day=16462;

 count
-------
 55341

(1 rows)
cqlsh:billing> select count(*) from collected_data_day where collection_day=16462;

 count
-------
 55372

(1 rows)
cqlsh:billing> select count(*) from collected_data_day where collection_day=16462;

 count
-------
 55300

(1 rows)
cqlsh:billing> select count(*) from collected_data_day where collection_day=16462;

 count
-------
 55300

(1 rows)
cqlsh:billing> select count(*) from collected_data_day where collection_day=16462;

 count
-------
 55300

(1 rows)
cqlsh:billing> select count(*) from collected_data_day where collection_day=16462;

 count
-------
 55303

(1 rows)
cqlsh:billing> select count(*) from collected_data_day where collection_day=16462;

 count
-------
 55374

(1 rows)

I am running the query from the seed node of the Cassandra cluster. As you can see, the
results differ between runs (from 55300 to 55374), and I don't know why. Nothing is being
written to the cluster at this time; we are only querying it, and only through this cqlsh session.
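The small sketch below tabulates the seven count(*) results from the transcript above, which makes the variation concrete. No writes were in progress, so the reporter expects every run to return the same value. The CountSummary type and summarize function are hypothetical illustration names. The numbers are copied verbatim from the cqlsh output.

```python
from typing import NamedTuple, Sequence


class CountSummary(NamedTuple):
    runs: int       # number of times the query was executed
    distinct: int   # number of different results returned
    minimum: int
    maximum: int
    spread: int     # maximum - minimum


def summarize(counts: Sequence[int]) -> CountSummary:
    """Summarize repeated count(*) results for the same partition."""
    return CountSummary(
        runs=len(counts),
        distinct=len(set(counts)),
        minimum=min(counts),
        maximum=max(counts),
        spread=max(counts) - min(counts),
    )


# Results for collection_day=16462, in the order cqlsh returned them.
REPORTED_COUNTS = [55341, 55372, 55300, 55300, 55300, 55303, 55374]

if __name__ == "__main__":
    print(summarize(REPORTED_COUNTS))
    # CountSummary(runs=7, distinct=5, minimum=55300, maximum=55374, spread=74)
```

In a stable, fully consistent dataset, distinct would be 1 and spread would be 0.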

This looks very similar to CASSANDRA-8940, but that issue targets 2.1.x.

Could we be hitting the same issue in 3.3?

Please let me know what extra information I can provide.



