cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Eric Stevens <migh...@gmail.com>
Subject Re: Pagination support on Java Driver Query API
Date Thu, 12 Feb 2015 14:50:21 GMT
Your page state then needs to track the last ck1 and last ck2 you saw.
Pages 2+ will end up needing to be up to two queries if the first query
doesn't fill the page size.

CREATE TABLE foo (
  partitionkey int,
  ck1 int,
  ck2 int,
  col1 int,
  col2 int,
  PRIMARY KEY ((partitionkey), ck1, ck2)
) WITH CLUSTERING ORDER BY (ck1 asc, ck2 desc);

INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,1,1,1);
INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,2,2,2);
INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,3,3,3);
INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,1,4,4);
INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,2,5,5);
INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,3,6,6);

If you're pulling the whole of partition 1 and your page size is 2, your
first page looks like:

*PAGE 1*

SELECT * FROM foo WHERE partitionkey = 1 LIMIT 2;
 partitionkey | ck1 | ck2 | col1 | col2
--------------+-----+-----+------+------
            1 |   1 |   3 |    3 |    3
            1 |   1 |   2 |    2 |    2

You got enough rows to satisfy the page, Your page state is taken from the
last row: (ck1=1, ck2=2)


*PAGE 2*
Notice that you have a page state, and add some limiting clauses on the
statement:

SELECT * FROM foo WHERE partitionkey = 1 AND ck1 = 1 AND ck2 < 2 LIMIT 2;
 partitionkey | ck1 | ck2 | col1 | col2
--------------+-----+-----+------+------
            1 |   1 |   1 |    1 |    1

Oops, we didn't get enough rows to satisfy the page limit, so we need to
continue on, we just need one more:

SELECT * FROM foo WHERE partitionkey = 1 AND ck1 > 1 LIMIT 1;
 partitionkey | ck1 | ck2 | col1 | col2
--------------+-----+-----+------+------
            1 |   2 |   3 |    6 |    6

We have enough to satisfy page 2 now, our new page state: (ck1 = 2, ck2 =
3).


*PAGE 3*

SELECT * FROM foo WHERE partitionkey = 1 AND ck1 = 2 AND ck2 < 3 LIMIT 2;
 partitionkey | ck1 | ck2 | col1 | col2
--------------+-----+-----+------+------
            1 |   2 |   2 |    5 |    5
            1 |   2 |   1 |    4 |    4

Great, we satisfied this page with only one query, page state: (ck1 = 2,
ck2 = 1).


*PAGE 4*

SELECT * FROM foo WHERE partitionkey = 1 AND ck1 = 2 AND ck2 < 1 LIMIT 2;
(0 rows)

Oops, our initial query was on the boundary of ck1, but this looks like any
other time that the initial query returns < pageSize rows, we just move on
to the next page:

SELECT * FROM foo WHERE partitionkey = 1 AND ck1 > 2 LIMIT 2;
(0 rows)

Aha, we've exhausted ck1 as well, so there are no more pages, page 3
actually pulled the last possible value; page 4 is empty, and we're all
done.  Generally speaking you know you're done when your first clustering
key is the only non-equality operator in the statement, and you got no rows
back.






On Wed, Feb 11, 2015 at 10:55 AM, Ajay <ajay.garga@gmail.com> wrote:

> Basically I am trying different queries with your approach.
>
> One such query is like
>
> Select * from mycf where condition on partition key order by ck1 asc, ck2
> desc where ck1 and ck2 are clustering keys in that order.
>
> Here how do we achieve pagination support?
>
> Thanks
> Ajay
> On Feb 11, 2015 11:16 PM, "Ajay" <ajay.garga@gmail.com> wrote:
>
>>
>> Hi Eric,
>>
>> Thanks for your reply.
>>
>> I am using Cassandra 2.0.11 and in that I cannot append condition like
>> last clustering key column > value of the last row in the previous batch.
>> It fails Preceding column is either not restricted or by a non-EQ relation.
>> It means I need to specify equal  condition for all preceding clustering
>> key columns. With this I cannot get the pagination correct.
>>
>> Thanks
>> Ajay
>> > I can't believe that everyone read & process all rows at once (without
>> pagination).
>>
>> Probably not too many people try to read all rows in a table as a single
>> rolling operation with a standard client driver.  But those who do would
>> use token() to keep track of where they are and be able to resume with that
>> as well.
>>
>> But it sounds like you're talking about paginating a subset of data -
>> larger than you want to process as a unit, but prefiltered by some other
>> criteria which prevents you from being able to rely on token().  For this
>> there is no general purpose solution, but it typically involves you
>> maintaining your own paging state, typically keeping track of the last
>> partitioning and clustering key seen, and using that to construct your next
>> query.
>>
>> For example, we have client queries which can span several partitioning
>> keys.  We make sure that the List of partition keys generated by a given
>> client query List(Pq) is deterministic, then our paging state is the
>> index offset of the final Pq in the response, plus the value of the
>> final clustering column.  A query coming in with a paging state attached to
>> it starts the next set of queries from the provided Pq offset where
>> clusteringKey > the provided value.
>>
>> So if you can just track partition key offset (if spanning multiple
>> partitions), and clustering key offset, you can construct your next query
>> from those instead.
>>
>> On Tue, Feb 10, 2015 at 6:58 PM, Ajay <ajay.garga@gmail.com> wrote:
>>
>>> Thanks Alex.
>>>
>>> But is there any workaround possible?. I can't believe that everyone
>>> read & process all rows at once (without pagination).
>>>
>>> Thanks
>>> Ajay
>>> On Feb 10, 2015 11:46 PM, "Alex Popescu" <alexp@datastax.com> wrote:
>>>
>>>>
>>>> On Tue, Feb 10, 2015 at 4:59 AM, Ajay <ajay.garga@gmail.com> wrote:
>>>>
>>>>> 1) Java driver implicitly support Pagination in the ResultSet (using
>>>>> Iterator) which can be controlled through FetchSize. But it is limited
in a
>>>>> way that we cannot skip or go previous. The FetchState is not exposed.
>>>>
>>>>
>>>> Cassandra doesn't support skipping so this is not really a limitation
>>>> of the driver.
>>>>
>>>>
>>>> --
>>>>
>>>> [:>-a)
>>>>
>>>> Alex Popescu
>>>> Sen. Product Manager @ DataStax
>>>> @al3xandru
>>>>
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to java-driver-user+unsubscribe@lists.datastax.com.
>>>>
>>>
>>

Mime
View raw message