incubator-cassandra-user mailing list archives

From Richard Low <rich...@wentnet.com>
Subject Re: clarification of token() in CQL3
Date Tue, 06 Aug 2013 14:18:53 GMT
On 6 August 2013 15:12, Keith Freeman <8forty@gmail.com> wrote:

>  I've seen in several places the advice to use queries like this to page
> through lots of rows:
>
>
> select id from mytable where token(id) > token(last_id)
>
>
> But it's hard to find detailed information about how this works (at least
> that I can understand -- the description in the Cassandra manual is pretty
> brief).
>
> One thing I'd like to know is if new rows are always guaranteed to have
> token(new_id) > token(ids-of-all-previous-rows)?  E.g. if I have one
> process that adds rows to a table, and another that processes rows from the
> table, can the "processor" save the id of the last row processed and when
> he wakes up use:
>
> select * from mytable where token(id) > token(last_processed_id)
>
>
> to process only new rows?  Will this always work to get only new rows?
>

No, unfortunately not.  The tokens are generated by the partitioner - they
are the hash of the row key.  A new row's token can fall anywhere in the
token range, so you can't use token ordering to find new rows.
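
To make that concrete, here is a minimal sketch in plain Python. It does not talk to Cassandra; the token() function is a hypothetical stand-in that hashes the key with MD5, and the real token depends on whichever partitioner the cluster uses. The row keys are made up for illustration.

```python
import hashlib


def token(key: str) -> int:
    """Hypothetical stand-in for a partitioner: hash the row key to an integer."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


# Keys listed in the order they were inserted (illustrative names).
insert_order = [f"row-{i}" for i in range(8)]

# The order a token-range scan would return them in.
token_order = sorted(insert_order, key=token)

# Suppose the processor stopped after the fourth inserted row.
last_processed = insert_order[3]
newer = insert_order[4:]

# Rows inserted later whose token is below the saved row's token would be
# skipped by "token(id) > token(last_processed_id)".
missed = [k for k in newer if token(k) < token(last_processed)]
found = [k for k in newer if token(k) > token(last_processed)]

print("insert order:", insert_order)
print("token order: ", token_order)
print("newer rows the query would miss:", missed)
```

Any key that ends up in the missed list was inserted after the saved row but sorts before it by token, so the query would never return it.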

The query you suggest works to page through all the data in your column
family.  Rows will be returned regardless of when they were added (as long
as they were added before the query started).  Finding rows that have been
added since a certain time is hard in Cassandra since they are stored in
token order.  In general you have to read through all the data and work out
from e.g. a date field if they should be treated as new.
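
A rough sketch of that full-scan approach, again in plain Python rather than against a real cluster. Everything here is an assumption for illustration: the in-memory dict stands in for the table, page_after() mimics the token(id) > token(last_id) query with a LIMIT, and the stored datetime plays the role of the date field.

```python
import hashlib
from datetime import datetime, timezone


def token(key: str) -> int:
    """Hypothetical stand-in for a partitioner (MD5 of the key as an integer)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def page_after(rows, last_key, limit):
    """Mimic: SELECT id FROM mytable WHERE token(id) > token(last_key) LIMIT limit."""
    floor = -1 if last_key is None else token(last_key)  # -1 sorts before any token
    matching = sorted((k for k in rows if token(k) > floor), key=token)
    return matching[:limit]


def rows_added_since(rows, cutoff, page_size=3):
    """Page through every row in token order, keeping those dated after cutoff."""
    new_keys = []
    last_key = None
    while True:
        page = page_after(rows, last_key, page_size)
        if not page:
            break
        new_keys.extend(k for k in page if rows[k] > cutoff)
        last_key = page[-1]  # resume from the last key seen on this page
    return new_keys


# Illustrative data: key -> value of the row's date field.
rows = {f"row-{i}": datetime(2013, 8, i + 1, tzinfo=timezone.utc) for i in range(10)}
cutoff = datetime(2013, 8, 6, tzinfo=timezone.utc)

new_rows = rows_added_since(rows, cutoff)
print(sorted(new_rows))
```

The scan touches every row no matter how few are new, which is why this gets expensive as the table grows.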

Richard.
