cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-3982) Explore not returning range ghosts
Date Thu, 10 May 2012 14:31:51 GMT


Sylvain Lebresne updated CASSANDRA-3982:

    Attachment: 3982.txt

I think that what is currently returned by CQL3 is not consistent with respect to handling
"null" values. Let me illustrate 2 problems with examples. Consider the following table:
CREATE TABLE test (
    k int PRIMARY KEY,
    c1 int,
    c2 int
);
First, suppose it holds:
 k | c1 | c2
 0 | 0  | 0
 1 | 1  | 1
 2 | 2  | 2
then {{SELECT k FROM test}} will return:
But if we do
then this will still return the same result (i.e. 2 will show up). Of course this is just
the good ol' range ghost problem, but I want to illustrate that while this was "merely" unintuitive
in Thrift, this is imho just wrong in CQL. I think that we should define a (CQL) row as existing
only if it contains at least one non-primary-key column with a value. And of course, we shouldn't
return a value that doesn't exist.
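
To make the proposed definition concrete, here is a minimal Python sketch (not Cassandra code). It assumes that row 2 has had all of its non-primary-key columns deleted, which is modelled with None; the names `row_exists` and `select_keys` are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Optional

# A (CQL) row modelled as column name -> value; None means "no value".
Row = Dict[str, Optional[int]]

def row_exists(row: Row, primary_key: str = "k") -> bool:
    """A row exists only if at least one non-primary-key column has a value."""
    return any(value is not None
               for name, value in row.items()
               if name != primary_key)

def select_keys(rows: List[Row], primary_key: str = "k") -> List[Optional[int]]:
    """Model of SELECT k FROM test that skips range ghosts."""
    return [row[primary_key] for row in rows if row_exists(row, primary_key)]

# Assumed state: the non-key columns of row 2 were deleted (a range ghost).
table: List[Row] = [
    {"k": 0, "c1": 0, "c2": 0},
    {"k": 1, "c1": 1, "c2": 1},
    {"k": 2, "c1": None, "c2": None},
]

print(select_keys(table))
```

Under this definition the ghost row contributes nothing to the result, because only its key is left.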

The second problem we have is not due to range ghosts. Consider the same table and say that
it now contains:
 k | c1 | c2
 0 | 0  | 0
 1 | 1  |
 2 | 2  | 2
i.e. the second row has no value for c2. If we do
then currently this returns
The null returned here is because RangeSlice returns an empty ColumnFamily when the filter
matches nothing. However,
SELECT c2 FROM test WHERE k = 1
doesn't return anything because the filter selects only c2 and getSlice returns a null ColumnFamily
in that case. In particular, it *does not* return a single "null" result, which is inconsistent
with the result of the previous query.

Anyway, I think there are two possible approaches to unify this:
# take the "SQL" approach and say that a select returns every row that matches the WHERE clause,
independently of whether the selected columns exist or not. In that approach, the last request
above should include a null.
# define that a row is included in the result set only if it has at least one non-null value
in the *selected* columns. I.e. neither of the two requests above should include a null.
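
The difference between the two approaches can be sketched in a few lines of Python. This is an illustrative model only, not the attached patch: the function names `select_sql_style` and `select_non_null_only` are hypothetical, and a missing value is modelled as None.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, Optional[int]]
ResultRow = Tuple[Optional[int], ...]

# The second example table: the row with k = 1 has no value for c2.
TABLE: List[Row] = [
    {"k": 0, "c1": 0, "c2": 0},
    {"k": 1, "c1": 1, "c2": None},
    {"k": 2, "c1": 2, "c2": 2},
]

def select_sql_style(rows: List[Row], selected: List[str],
                     where: Callable[[Row], bool]) -> List[ResultRow]:
    """Approach 1: return every row matching the WHERE clause,
    whether or not the selected columns have a value."""
    return [tuple(r[c] for c in selected) for r in rows if where(r)]

def select_non_null_only(rows: List[Row], selected: List[str],
                         where: Callable[[Row], bool]) -> List[ResultRow]:
    """Approach 2: return a matching row only if at least one
    selected column has a non-null value."""
    return [tuple(r[c] for c in selected) for r in rows
            if where(r) and any(r[c] is not None for c in selected)]

by_key = lambda r: r["k"] == 1
print(select_sql_style(TABLE, ["c2"], by_key))      # one row holding a null
print(select_non_null_only(TABLE, ["c2"], by_key))  # no rows
```

In this model the two query forms (by key and over the whole range) agree with each other under either approach, which is the consistency being asked for; they differ only in whether the null row is reported.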

I actually think that we should pick the "SQL" approach because 1) doing otherwise would be
much too unintuitive to people coming from SQL and 2) it's better defined when you select
only primary key columns.

I note however that there is a downside to that solution: it means that when selecting one
column, internally we should query all the columns for the (CQL) row, not just the selected
ones, to know if the row exists. However, a solution to that will be to introduce 'IS NOT
NULL' (#3783).

Anyway, attaching a patch that 1) considers rows to exist only if they have at least one column
matching the where clause and 2) uses the "SQL" approach above. Tests have been pushed in the

> Explore not returning range ghosts
> ----------------------------------
>                 Key: CASSANDRA-3982
>                 URL:
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: API
>    Affects Versions: 1.1.0
>            Reporter: Sylvain Lebresne
>              Labels: cql3
>             Fix For: 1.1.1
>         Attachments: 3982.txt
> This ticket proposes to remove range ghosts in CQL3.
> The basic argument is that range ghosts confuse users a lot and don't add any value,
> since range ghosts don't make it possible to distinguish between the following two cases:
> * the row is deleted
> * the row is not deleted but doesn't have data for the provided filter

