cassandra-commits mailing list archives

From "Tyler Hobbs (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6825) COUNT(*) with WHERE not finding all the matching rows
Date Tue, 01 Apr 2014 16:29:21 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13956721#comment-13956721 ]

Tyler Hobbs commented on CASSANDRA-6825:
----------------------------------------

[~slebresne] The logic is broken primarily because it keeps checking later components
after it already knows that the first component intersects.  For example, suppose you have a slice
of {{((1, 1), "")}}, min column names of {{(0, 2)}}, and max column names of {{(2, 3)}}.
The first component of the slice start falls within the min/max range; the second component
does not.  Although the slice _starts_ outside the min/max range for the second component,
it should be considered intersecting, because we'll accept other values for the second component
for higher values of the first component (a name such as {{(2, 2)}} comes after the slice start
and lies within both min/max ranges).  The current logic sees that the second component
doesn't fall within min/max and considers the slice non-intersecting.
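The conservative check described above can be sketched in Java as follows. This is a minimal illustration under a hypothetical, simplified model: clustering components are plain ints, an SSTable records per-component min/max values, and an empty bound means "unbounded". The names `SliceIntersection`, `comparePrefix` and `mayIntersect` are illustrative only; this is not Cassandra's `ColumnSlice` API or the committed fix. The key point is that the slice bounds are compared lexicographically, so the first differing component decides and later components are never consulted.

```java
/**
 * Hypothetical, simplified model (not Cassandra's real types): clustering
 * names are int tuples and an SSTable stores per-component min/max values.
 */
public class SliceIntersection {

    /**
     * Lexicographic comparison of two (possibly partial) names. The first
     * differing component decides the result; later components are ignored.
     * If one name is a prefix of the other they compare as equal (0), so an
     * empty bound acts as "unbounded" and the check stays conservative.
     */
    static int comparePrefix(int[] a, int[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i], b[i]);
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    /**
     * Every stored name lies lexicographically between minNames and maxNames,
     * so the slice [start, end] can only miss the SSTable if it starts after
     * maxNames or ends before minNames. Otherwise we conservatively say "yes".
     */
    static boolean mayIntersect(int[] start, int[] end, int[] minNames, int[] maxNames) {
        if (comparePrefix(start, maxNames) > 0) {
            return false; // slice begins after the largest possible name
        }
        if (comparePrefix(end, minNames) < 0) {
            return false; // slice ends before the smallest possible name
        }
        return true;
    }

    public static void main(String[] args) {
        int[] min = {0, 2};
        int[] max = {2, 3};
        // Slice ((1, 1), ""): the first component (1 < 2) already places the
        // start before max, so the second component is never examined.
        System.out.println(mayIntersect(new int[]{1, 1}, new int[]{}, min, max)); // true
        // A slice starting at (3, 0) begins after max (2, 3).
        System.out.println(mayIntersect(new int[]{3, 0}, new int[]{}, min, max)); // false
    }
}
```

Erring toward `true` is the safe direction: a false positive only costs reading one extra SSTable, whereas a false negative silently drops rows, which is the kind of undercount reported in this issue.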

> COUNT(*) with WHERE not finding all the matching rows
> -----------------------------------------------------
>
>                 Key: CASSANDRA-6825
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6825
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: quad core Windows7 x64, single node cluster
> Cassandra 2.0.5
>            Reporter: Bill Mitchell
>            Assignee: Tyler Hobbs
>             Fix For: 2.0.7, 2.1 beta2
>
>         Attachments: cassandra.log, selectpartitions.zip, selectrowcounts.txt, testdb_1395372407904.zip, testdb_1395372407904.zip
>
>
> Investigating another problem, I needed to run COUNT(*) on the several partitions of a
> table immediately after a test case ran, and I discovered that count(*) on the full table
> and on each of the partitions returned different counts.
> In this particular case, SELECT COUNT(*) FROM sr LIMIT 1000000; returned the expected count
> from the test, 99999 rows.  The composite primary key splits the logical row into six distinct
> partitions, and when I issue a query asking for the total across all six partitions, the returned
> result is only 83999.  Drilling down, I find that SELECT * from sr WHERE s = 5 AND l = 11
> AND partition = 0; returns 30,000 rows, but a SELECT COUNT(*) with the identical WHERE predicate
> reports only 14,000.
> This is failing immediately after running a single small test, such that there are only
> two SSTables, sr-jb-1 and sr-jb-2.  Compaction never needed to run.
> selectrowcounts.txt contains a copy of the cqlsh output showing the incorrect count(*) results.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
