cassandra-commits mailing list archives

From "Tyler Hobbs (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8490) DISTINCT queries with LIMITs or paging are incorrect when partitions are deleted
Date Wed, 17 Dec 2014 23:42:14 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250824#comment-14250824 ]

Tyler Hobbs commented on CASSANDRA-8490:
----------------------------------------

This is a tricky one.  It affects both DISTINCT queries with paging and DISTINCT queries with
limits.

The root of the problem is that ColumnFamilyStore.filter() counts tombstoned partitions towards
the row limit.  This is the correct behavior for Thrift compatibility, but it is not the correct
behavior for CQL3 queries.  Normal CQL3 queries don't have this problem, but DISTINCT queries
are treated differently: countCQL3Rows is false, and we use the row (partition) limit instead
of a CQL3 row limit (the "column" limit).
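
To make the effect concrete, here is a minimal sketch of the two counting behaviors. It is not the actual ColumnFamilyStore.filter() code; the class name, method name, and the boolean-array model of partitions are all hypothetical and only illustrate how one tombstoned partition among the first 100 leaves 99 live results.

```java
class LimitCountingSketch {
    // Hypothetical model: live[i] is false when partition i is tombstoned.
    // Returns how many live partitions are seen before the limit is reached.
    static int liveWithinLimit(boolean[] live, int limit, boolean countTombstoned) {
        int counted = 0;   // partitions charged against the limit
        int liveSeen = 0;  // partitions that would actually be returned to the client
        for (boolean isLive : live) {
            if (counted >= limit) {
                break;
            }
            if (isLive) {
                counted++;
                liveSeen++;
            } else if (countTombstoned) {
                counted++; // Thrift-style: a deleted partition still uses up a slot
            }
        }
        return liveSeen;
    }

    public static void main(String[] args) {
        boolean[] live = new boolean[1000];
        java.util.Arrays.fill(live, true);
        live[5] = false; // the sixth partition is deleted, as in the report

        System.out.println(liveWithinLimit(live, 100, true));  // prints 99
        System.out.println(liveWithinLimit(live, 100, false)); // prints 100
    }
}
```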

Due to the way that range commands are serialized (and versioned), we can't add a field or
flag within a minor Cassandra version, so there's not a great way to indicate to replicas
that tombstoned partitions should not count towards the limit (i.e. that we want CQL3 behavior).
That leaves us with a few choices:
* Never count tombstoned partitions towards the limit, and trim excess partitions on the coordinator.
This would be fine for CQL3, but for Thrift, replicas could end up sending the coordinator
a lot of extra partitions if there are a lot of partition tombstones.
* Add a new, optional flag to the range command serialization format (keeping in mind the
new {{countCQL3Rows}} flag in 2.1) or do something like use a special {{compositesToGroup()}}
value of -2 to signal that tombstoned partitions should not count towards the limit. The
main problem with this is upgrades to 2.1.  There would be a window of 2.1 versions where
upgrading from 2.0.12 would break these queries.  This is a pretty hacky option.
* Leave {{DISTINCT ... LIMIT}} queries broken, but partially fix the paging situation by only
considering the query exhausted when the count of all rows in the fetched page (not just live
ones) is less than the page size.
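
The third option can be sketched roughly as follows. This is an illustration only, not the AbstractQueryPager implementation; the class and method names are hypothetical, and it assumes the pager can see how many tombstoned rows were in the fetched page.

```java
class PagerExhaustionSketch {
    // Behavior implied by the debug log in the report: only live rows are compared
    // to the page size, so a single tombstone in a full page ends paging early.
    static boolean exhaustedByLiveCount(int liveRows, int pageSize) {
        return liveRows < pageSize;
    }

    // Proposed check: compare every row in the fetched page (live + tombstoned).
    static boolean exhaustedByTotalCount(int liveRows, int tombstonedRows, int pageSize) {
        return liveRows + tombstonedRows < pageSize;
    }

    public static void main(String[] args) {
        int pageSize = 100;
        int liveRows = 99;       // "Fetched 99 live rows" in the report
        int tombstonedRows = 1;  // the deleted partition

        System.out.println(exhaustedByLiveCount(liveRows, pageSize));                  // prints true
        System.out.println(exhaustedByTotalCount(liveRows, tombstonedRows, pageSize)); // prints false
    }
}
```

With the second check the pager would request another page instead of stopping at 99 results, although the page returned to the client would still contain fewer live rows than the page size.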

What do you think, [~slebresne]?  (P.S. I can't wait for your refactor :) ).

> DISTINCT queries with LIMITs or paging are incorrect when partitions are deleted
> --------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8490
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8490
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Driver version: 2.1.3.
> Cassandra version: 2.0.11/2.1.2.
>            Reporter: Frank Limstrand
>            Assignee: Tyler Hobbs
>             Fix For: 2.0.12, 2.1.3
>
>
> Using paging demo code from https://github.com/PatrickCallaghan/datastax-paging-demo
> The code creates and populates a table with 1000 entries and pages through them with setFetchSize set to 100. If we then delete one entry with 'cqlsh':
> {noformat}
> cqlsh:datastax_paging_demo> delete from datastax_paging_demo.products  where productId = 'P142'; (The specified productid is number 6 in the resultset.)
> {noformat}
> and run the same query ("Select * from") again we get:
> {noformat}
> [com.datastax.paging.Main.main()] INFO  com.datastax.paging.Main - Paging demo took 0 secs. Total Products : 999
> {noformat}
> which is what we would expect.
> If we then change the "select" statement in dao/ProductDao.java (line 70) from "Select * from " to "Select DISTINCT productid from " we get this result:
> {noformat}
> [com.datastax.paging.Main.main()] INFO  com.datastax.paging.Main - Paging demo took 0 secs. Total Products : 99
> {noformat}
> So it looks like the tombstone stops the paging behaviour. Is this a bug?
> {noformat}
> DEBUG [Native-Transport-Requests:788] 2014-12-16 10:09:13,431 Message.java (line 319) Received: QUERY Select DISTINCT productid from datastax_paging_demo.products, v=2
> DEBUG [Native-Transport-Requests:788] 2014-12-16 10:09:13,434 AbstractQueryPager.java (line 98) Fetched 99 live rows
> DEBUG [Native-Transport-Requests:788] 2014-12-16 10:09:13,434 AbstractQueryPager.java (line 115) Got result (99) smaller than page size (100), considering pager exhausted
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
