cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-7059) Range query with strict bound on clustering column can return less results than required for compact tables
Date Sat, 10 May 2014 22:11:08 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-7059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Sylvain Lebresne updated CASSANDRA-7059:
----------------------------------------

    Attachment: 7059.txt

As explained above, fixing this properly is actually pretty hard. Also, the cases affected
feels narrow enough that we might want to think twice before adding too much complexity to
solve this. But, without excluding any better solution later, I think that the first thing
we should do is start refusing queries that we know might return broken results: correction
should come first and we known we're not correct. So attaching a relatively trivial patch
that does just that. If we agree on that first step I'll make sure to put the proper warnings
in the NEWS file too.

> Range query with strict bound on clustering column can return less results than required
for compact tables
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7059
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7059
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sylvain Lebresne
>             Fix For: 2.0.9
>
>         Attachments: 7059.txt
>
>
> What's wrong:
> {noformat}
> CREATE TABLE test (
>     k int,
>     v int,
>     PRIMARY KEY (k, v)
> ) WITH COMPACT STORAGE;
> INSERT INTO test(k, v) VALUES (0, 0);
> INSERT INTO test(k, v) VALUES (0, 1);
> INSERT INTO test(k, v) VALUES (1, 0);
> INSERT INTO test(k, v) VALUES (1, 1);
> INSERT INTO test(k, v) VALUES (2, 0);
> INSERT INTO test(k, v) VALUES (2, 1);
> SELECT * FROM test WHERE v > 0 LIMIT 3 ALLOW FILTERING;
>  k | v
> ---+---
>  1 | 1
>  0 | 1
> {noformat}
> That last query should return 3 results.
> The problem lies into how we deal with 'strict greater than' ({{>}}) for "wide" compact
storage table. Namely, for those tables, we internally only support inclusive bounds (for
CQL3 tables this is not a problem as we deal with this using the 'end-of-component' of the
CompositeType encoding). So we "compensate" by asking one more result than asked by the user,
and we trim afterwards if that was unnecessary. This works fine for per-partition queries,
but don't for "range" queries since we potentially would have to ask for {{X}} more results
where {{X}} is the number of partition fetched, but we don't know {{X}} beforehand.
> I'll note that:
> * this has always be there
> * this only (potentially) affect compact tables
> * this only affect range queries that have a strict bound on the clustering column (this
means only {{ALLOW FILTERING}}) queries in particular.
> * this only matters if a {{LIMIT}} is set on the query.
> As for fixes, it's not entirely trivial. The "right" fix would probably be to start supporting
non-inclusive bound internally, but that's far from a small fix and is "at best" a 2.1 fix
(since we'll have to make a messaging protocol change to ship some additional info for SliceQueryFilter).
Also, this might be a lot of work for something that only affect some {{ALLOW FILTERING}}
queries on compact tables.
> Another (somewhat simpler) solution might be to detect when we have this kind of queries
and use a pager with no limit. We would then query a first page using the user limit (plus
some smudge factor to avoid being inefficient too often) and would continue paging unless
either we've exhausted all results or we can prove that post-processing we do have enough
results to satisfy the user limit.  This does mean in some case we might do 2 or more internal
queries, but in practice we can probably make that case very rare, and since the query is
an {{ALLOW FILTERING}} one, the user is somewhat warned that the query may not be terribly
efficient.
> Lastly, we could always start by disallowing the kind of query that is potentially problematic
(until we have a proper fix), knowing that users can work around that by either using non-strict
bounds or removing the {{LIMIT}}, whichever makes the most sense in their case. In 1.2 in
particular, we don't have the query pagers, so the previous solution I describe would be a
bit of a mess to implement.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message