cassandra-commits mailing list archives

From "Bill Mitchell (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6825) COUNT(*) with WHERE not finding all the matching rows
Date Fri, 21 Mar 2014 21:31:50 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943586#comment-13943586 ]

Bill Mitchell commented on CASSANDRA-6825:
------------------------------------------

As it happens, I have that info handy as my JUnit testcase includes it in the log4j output:


CREATE TABLE testdb_1395374703023.sr (
    siteid text,
    listid bigint,
    partition int,
    createdate timestamp,
    emailcrypt text,
    emailaddr text,
    properties text,
    removedate timestamp,
    PRIMARY KEY ((siteid, listid, partition), createdate, emailcrypt)
) WITH CLUSTERING ORDER BY (createdate DESC, emailcrypt ASC)
   AND read_repair_chance = 0.1
   AND dclocal_read_repair_chance = 0.0
   AND replicate_on_write = true
   AND gc_grace_seconds = 864000
   AND bloom_filter_fp_chance = 0.01
   AND caching = 'KEYS_ONLY'
   AND comment = ''
   AND compaction = { 'class' : 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy' }
   AND compression = { 'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor' };

(siteid was a BIGINT until recently, when the schema was changed to TEXT to match the use of
siteID elsewhere in the product.  I had not thought to represent our Java String as a Cassandra
UUID.)

> COUNT(*) with WHERE not finding all the matching rows
> -----------------------------------------------------
>
>                 Key: CASSANDRA-6825
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6825
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: quad core Windows7 x64, single node cluster
> Cassandra 2.0.5
>            Reporter: Bill Mitchell
>            Assignee: Tyler Hobbs
>         Attachments: cassandra.log, selectpartitions.zip, selectrowcounts.txt, testdb_1395372407904.zip,
> testdb_1395372407904.zip
>
>
> Investigating another problem, I needed to do COUNT(*) on the several partitions of a
> table immediately after a test case ran, and I discovered that count(*) on the full table
> and on each of the partitions returned different counts.
> In this particular case, SELECT COUNT(*) FROM sr LIMIT 1000000; returned the expected count
> from the test, 99999 rows.  The composite primary key splits the logical row into six distinct
> partitions, and when I issue a query asking for the total across all six partitions, the returned
> result is only 83999.  Drilling down, I find that SELECT * from sr WHERE s = 5 AND l = 11
> AND partition = 0; returns 30,000 rows, but a SELECT COUNT(*) with the identical WHERE predicate
> reports only 14,000.
> This is failing immediately after running a single small test, such that there are only
> two SSTables, sr-jb-1 and sr-jb-2.  Compaction never needed to run.
> In selectrowcounts.txt is a copy of the cqlsh output showing the incorrect count(*) results.
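
The cross-check described in the report (COUNT(*) for a partition versus the number of rows
that SELECT * returns for the same predicate) could be sketched roughly as below. This is
only an illustration, not the reporter's JUnit test: the helper names count_query and
find_mismatches are hypothetical, the column names are taken from the CREATE TABLE statement
in the comment (the report abbreviates them as s and l), and the counts are assumed to have
been collected separately, for example from cqlsh.

```python
from typing import Dict, List, Tuple

# (siteid, listid, partition), matching the partition key of the sr table.
PartitionKey = Tuple[str, int, int]


def count_query(table: str, key: PartitionKey) -> str:
    """Build the per-partition COUNT(*) statement (hypothetical helper)."""
    siteid, listid, partition = key
    escaped = siteid.replace("'", "''")  # CQL escapes a single quote by doubling it
    return (
        f"SELECT COUNT(*) FROM {table} "
        f"WHERE siteid = '{escaped}' AND listid = {listid} AND partition = {partition} "
        f"LIMIT 1000000;"
    )


def find_mismatches(
    counted: Dict[PartitionKey, int],
    fetched: Dict[PartitionKey, int],
) -> List[Tuple[PartitionKey, int, int]]:
    """Return (key, counted, fetched) for partitions where COUNT(*) disagrees
    with the number of rows actually returned by SELECT *."""
    return [
        (key, counted[key], fetched[key])
        for key in sorted(counted)
        if counted[key] != fetched.get(key)
    ]


if __name__ == "__main__":
    key = ("5", 11, 0)
    print(count_query("sr", key))
    # Figures from the report: COUNT(*) said 14,000 where SELECT * returned 30,000 rows.
    print(find_mismatches({key: 14000}, {key: 30000}))
```

Any partition this reports is one where the two queries disagree, which is the symptom
described above for partition 0.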



--
This message was sent by Atlassian JIRA
(v6.2#6252)
