cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-6588) Add a 'NO EMPTY RESULTS' filter to SELECT
Date Thu, 16 Jan 2014 09:39:21 GMT


Sylvain Lebresne commented on CASSANDRA-6588:

In terms of results returned, it's not. In terms of "how hard is it to implement in the code
today", it's a world of difference however (as in, I'be put 2.1 as target for this because
it's an improvement not a bug fix, but the complexity of code needed to add a simple filter
is really simple and pushing it in 2.0 if we wanted wouldn't be crazy; while supporting the
OR version likely means a 3.0 target at best). Also, supporting OR and 'not null' conditions
for this feels a bit overkill. As in, I'm not sure it's such a good idea to support both if
we're only gonna support them in this very specific use case. Plus, if you select 8 or 9 columns,
the conditions with OR becomes a bit of a pain to write when all you want to do is allow the
engine to optimize the query to not query the full CQL row.

Note that, if we have good reason later to add support for 'OR' and 'not null' (i.e. we support
it in a much more general way), then a 'NO EMPTY' filter could then just extend to the OR
query, at which point the 'NO EMPTY' will just be syntactic sugar for convenience, but I don't
think there will any harm in that.

> Add a 'NO EMPTY RESULTS' filter to SELECT
> -----------------------------------------
>                 Key: CASSANDRA-6588
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>             Fix For: 2.1
> It is the semantic of CQL that a (CQL) row exists as long as it has one non-null column
(including the PK columns, which, given that no PK columns can be null, means that it's enough
to have the PK set for a row to exist). This does means that the result to
> {noformat}
> CREATE TABLE test (k int PRIMARY KEY, v1 int, v2 int);
> INSERT INTO test(k, v1) VALUES (0, 4);
> SELECT v2 FROM test;
> {noformat}
> must be (and is)
> {noformat}
>  v2
> ------
>  null
> {noformat}
> That fact does mean however that when we only select a few columns of a row, we still
need to find out rows that exist but have no values for the selected columns. Long story short,
given how the storage engine works, this means we need to query full (CQL) rows even when
only some of the columns are selected because that's the only way to distinguish between "the
row exists but have no value for the selected columns" and "the row doesn't exist". I'll note
in particular that, due to CASSANDRA-5762, we can't unfortunately rely on the row marker to
optimize that out.
> Now, when you selects only a subsets of the columns of a row, there is many cases where
you don't care about rows that exists but have no value for the columns you requested and
are happy to filter those out. So, for those cases, we could provided a new SELECT filter.
Outside the potential convenience (not having to filter empty results client side), one interesting
part is that when this filter is provided, we could optimize a bit by only querying the columns
selected, since we wouldn't need to return rows that exists but have no values for the selected
> For the exact syntax, there is probably a bunch of options. For instance:
> * {{SELECT NON EMPTY(v2, v3) FROM test}}: the vague rational for putting it in the SELECT
part is that such filter is kind of in the spirit to DISTINCT.  Possibly a bit ugly outside
of that.
> * {{SELECT v2, v3 FROM test NO EMPTY RESULTS}} or {{SELECT v2, v3 FROM test NO EMPTY
ROWS}} or {{SELECT v2, v3 FROM test NO EMPTY}}: the last one is shorter but maybe a bit less
explicit. As for {{RESULTS}} versus {{ROWS}}, the only small object to {{NO EMPTY ROWS}} could
be that it might suggest it is filtering non existing rows (I mean, the fact we never ever
return non existing rows should hint that it's not what it does but well...) while we're just
filtering empty "resultSet rows".
> Of course, if there is a pre-existing SQL syntax for that, it's even better, though a
very quick search didn't turn anything. Other suggestions welcome too.

This message was sent by Atlassian JIRA

View raw message