cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "DOAN DuyHai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-4851) CQL3: improve support for paginating over composites
Date Thu, 29 Aug 2013 09:08:52 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-4851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753447#comment-13753447
] 

DOAN DuyHai commented on CASSANDRA-4851:
----------------------------------------

Big +10 for this feature

Right now I am preparing some slides for a talk and tutorial on Cassandra to convince people
switching from Thrift to CQL3. However I am facing serious issue because of the limitation
of CQL3 not being able to allow inequality on more than 1 clustered component at a time.

 My example is quite trivial

{code:sql}
 CREATE TABLE comment_index_by_rating
 (
    songId uuid,
    rating int,  // rating is integer from 1 to 10
    date uuid, // date of the comment
    comment text, //comment message itself 
    userLogin text, //login of the user who posts the comment
    PRIMARY KEY (songId,rating,date)
 )
{code}


 I would like to paginate over comment so the first query would be

{code:sql}
 SELECT * FROM comment_index_by_rating WHERE songId = .... ORDER BY rating DESC LIMIT 10;
// fetch first 10 comments
{code}

 The following queries would be:

{code:sql}
 SELECT * FROM comment_index_by_rating WHERE songId = .... AND rating <= {rating_of_last_comment_of_previous_batch}
AND date <= {date_of_last_comment_of_previous_batch}
{code}

 Right now it is just IMPOSSIBLE to paginate like this, which is PITA.

 I know that there is already jira https://issues.apache.org/jira/browse/CASSANDRA-4415  which
is a really good idea but the issue raised above is *beyond the scope of just paging data*.

 People are using more and more compound primary keys to model with Cassandra and they should
be able to do slice queries with inequality from all compound components.

 There are lots of use cases where such usage is required

For example indexing daily metrics
{code:sql}
CREATE TABLE daily_metrics
(
  day int, // day in YYYYMMDD format
  hour int, 
  minute int,
  second int,
  metrics blob, 
  PRIMARY KEY (day, hour, minute, second)
)
{code}

 I should be able to grep all metrics from a range of date

  // select all metrics from 8:30am to 10am
 {code:sql}
 SELECT metrics FROM daily_metrics WHERE day = 20130828 AND hour >= 8 AND minute >=
30 and hour <= 10
 {code}


                
> CQL3: improve support for paginating over composites
> ----------------------------------------------------
>
>                 Key: CASSANDRA-4851
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4851
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>
> Consider the following table:
> {noformat}
> CREATE TABLE test (
>     k int,
>     c1 int,
>     c2 int,
>     PRIMARY KEY (k, c1, c2)
> )
> {noformat}
> with the following data:
> {noformat}
> k | c1 | c2
> ------------
> 0 | 0  | 0
> 0 | 0  | 1
> 0 | 1  | 0
> 0 | 1  | 1
> {noformat}
> Currently, CQL3 allows to slice over either c1 or c2:
> {noformat}
> SELECT * FROM test WHERE k = 0 AND c1 > 0 AND c1 < 2
> SELECT * FROM test WHERE k = 0 AND c1 = 1 AND c2 > 0 AND c2 < 2
> {noformat}
> but you cannot express a query that return the 3 last records. Indeed, for that you would
need to do a query like say:
> {noformat}
> SELECT * FROM test WHERE k = 0 AND ((c1 = 0 AND c2 > 0) OR c2 > 0)
> {noformat}
> but we don't support that.
> This can make it hard to paginate over say all records for {{k = 0}} (I'm saying "can"
because if the value for c2 cannot be very large, an easy workaround could be to paginate
by entire value of c1, which you can do).
> For the case where you only paginate to avoid OOMing on a query, CASSANDRA-4415 will
that and is probably the best solution. However, there may be case where the pagination is
say user (as in, the user of your application) triggered.
> I note that one solution would be to add the OR support at least in case like the one
above. That's definitively doable but on the other side, we won't be able to support full-blown
OR, so it may not be very natural that we support seemingly random combination of OR and not
others.
> Another solution would be to allow the following syntax:
> {noformat}
> SELECT * FROM test WHERE k = 0 AND (c1, c2) > (0, 0)
> {noformat}
> which would literally mean that you want records where the values of c1 and c2 taken
as a tuple is lexicographically greater than the tuple (0, 0). This is less SQL-like (though
maybe some SQL store have that, it's a fairly thing to have imo?), but would be much simpler
to implement and probably to use too.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message