cassandra-user mailing list archives

From Keith Freeman
Subject Re: clarification of token() in CQL3
Date Tue, 06 Aug 2013 15:56:02 GMT
Ok, I get that, I'll have to find another way to sort out new rows.

Your description makes me think that if new rows are added during the 
paging (i.e. between one select with token()'s and another), they might 
show up in the query results, right?  (because the hash of the new row 
keys might fall sequentially after token(last_processed_row))
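
A minimal sketch of that situation, as a pure in-memory simulation rather than a real Cassandra query. The `token` function below is a stand-in (an MD5-based hash chosen only for illustration, not the cluster's actual partitioner output), and `page`, `table`, and the row keys are hypothetical names. It suggests that a row written between two selects shows up in the later page only if its token happens to fall after token(last_processed_row):

```python
import hashlib

def token(key: str) -> int:
    # Stand-in for the partitioner's hash; NOT Cassandra's real token function.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big", signed=True)

def page(table, last_key, limit):
    # Emulates: select id from mytable where token(id) > token(last_key) limit N
    rows = sorted(table, key=token)  # rows come back in token order
    if last_key is not None:
        rows = [k for k in rows if token(k) > token(last_key)]
    return rows[:limit]

table = {f"row{i}" for i in range(10)}
first = page(table, None, 5)
last = first[-1]

# Rows written between the first select and the second one.
new_keys = {f"new{i}" for i in range(20)}
table |= new_keys

second = page(table, last, 1000)
seen_new = [k for k in second if k in new_keys]                  # token after `last`
missed_new = [k for k in new_keys if token(k) <= token(last)]   # token before `last`

print("new rows picked up by the second page:", sorted(seen_new))
print("new rows the paging already went past:", sorted(missed_new))
```

Every new key ends up in exactly one of the two lists, and which one depends only on its hash, not on when it was written.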

On 08/06/2013 08:18 AM, Richard Low wrote:
> On 6 August 2013 15:12, Keith Freeman wrote:
>     I've seen in several places the advice to use queries like this to
>     page through lots of rows:
>         select id from mytable where token(id) > token(last_id)
>     But it's hard to find detailed information about how this works
>     (at least that I can understand -- the description in the
>     Cassandra manual is pretty brief).
>     One thing I'd like to know is if new rows are always guaranteed to
>     have token(new_id) > token(ids-of-all-previous-rows)?  E.g. if I
>     have one process that adds rows to a table, and another that
>     processes rows from the table, can the "processor" save the id of
>     the last row processed and when he wakes up use:
>         select * from mytable where token(id) > token(last_processed_id)
>     to process only new rows?  Will this always work to get only new rows?
> No, unfortunately not.  The tokens are generated by the partitioner - 
> they are the hash of the row key.  New tokens could be anywhere in the 
> range of tokens so you can't use token ordering to find new rows.
> The query you suggest works to page through all the data in your 
> column family.  Rows will be returned regardless of when they were 
> added (as long as they were added before the query started).  Finding 
> rows that have been added since a certain time is hard in Cassandra 
> since they are stored in token order.  In general you have to read 
> through all the data and work out from e.g. a date field if they 
> should be treated as new.
> Richard.
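
A rough sketch of the scan-and-filter approach Richard describes, again as an in-memory simulation and not a real driver call. The `token` function is an illustrative MD5-based stand-in for the partitioner's hash, and `scan_new_rows`, `rows`, and the `created_at` timestamps are hypothetical. It pages through everything in token order and decides from a date field which rows count as new:

```python
import hashlib
from datetime import datetime, timedelta

def token(key: str) -> int:
    # Stand-in for the partitioner's hash; NOT Cassandra's real token function.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big", signed=True)

def scan_new_rows(rows, since, page_size=3):
    # rows: dict mapping id -> created_at (hypothetical date field).
    # Pages through all ids in token order and keeps those created after `since`.
    ordered = sorted(rows, key=token)
    new_ids = []
    last_token = None
    while True:
        # Emulates: select ... where token(id) > token(last_id) limit page_size
        page = [k for k in ordered
                if last_token is None or token(k) > last_token][:page_size]
        if not page:
            break
        new_ids.extend(k for k in page if rows[k] > since)
        last_token = token(page[-1])
    return new_ids

base = datetime(2013, 8, 6, 15, 0)
rows = {f"row{i}": base + timedelta(minutes=i) for i in range(10)}
since = base + timedelta(minutes=6)

new_ids = scan_new_rows(rows, since)
print(sorted(new_ids))
```

The scan still touches every row; the date field only decides which ones to keep, which matches the point that token order says nothing about insertion time.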
