incubator-cassandra-user mailing list archives

From Jonathan Rhone <jonat...@shareablee.com>
Subject Re: token(), limit and wide rows
Date Fri, 16 Aug 2013 16:16:28 GMT
Read

http://www.datastax.com/dev/blog/cql3-table-support-in-hadoop-pig-and-hive

And look at

http://fossies.org/dox/apache-cassandra-1.2.8-src/CqlPagingRecordReader_8java_source.html

- Jon
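One way to page through every row, including those in wide rows, is to remember the last (partition key, clustering key) pair seen rather than just the last token. The following is a minimal Python sketch of that idea. It is not the code of the CqlPagingRecordReader linked above. It assumes a hypothetical `users` table with partition key `name` and a hypothetical clustering column `c`, modeled in memory as (token, name, c) tuples with made-up integer tokens. The two helper functions are hypothetical stand-ins for the CQL queries named in their docstrings. Each round either drains the rest of the partition it stopped in or moves on to the next token.

```python
from typing import List, Optional, Tuple

# Hypothetical model of a CQL3 table with partition key `name` and a
# clustering column `c`: one tuple per CQL row, (token(name), name, c),
# sorted by token and then by clustering key.
Row = Tuple[int, str, int]

ROWS: List[Row] = sorted(
    [(10, "alice", c) for c in range(3)]
    + [(20, "bob", c) for c in range(5)]  # a wide partition
    + [(30, "carol", c) for c in range(2)]
)


def rest_of_partition(name: str, after_c: int, limit: int) -> List[Row]:
    """Stand-in for: SELECT * FROM users WHERE name = ? AND c > ? LIMIT n."""
    return [r for r in ROWS if r[1] == name and r[2] > after_c][:limit]


def next_partitions(after_token: Optional[int], limit: int) -> List[Row]:
    """Stand-in for: SELECT * FROM users WHERE token(name) > ? LIMIT n
    (with no WHERE clause at all on the first page)."""
    return [r for r in ROWS if after_token is None or r[0] > after_token][:limit]


def full_scan(limit: int) -> List[Row]:
    """Page through every row, resuming from the last (name, c) seen."""
    seen: List[Row] = []
    last: Optional[Row] = None
    while True:
        if last is not None:
            # Drain whatever is left of the partition the last page ended in.
            tail = rest_of_partition(last[1], last[2], limit)
            if tail:
                seen.extend(tail)
                last = tail[-1]
                continue
        # The current partition is finished: move on to the next token.
        page = next_partitions(None if last is None else last[0], limit)
        if not page:
            return seen
        seen.extend(page)
        last = page[-1]


print(full_scan(limit=4) == ROWS)  # prints: True
```

Because a partition is never resumed by token alone, no rows are skipped regardless of how wide a partition is. The price is one extra round trip, returning nothing, each time the partition being drained runs out.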

On Fri, Aug 16, 2013 at 12:08 PM, Keith Freeman <8forty@gmail.com> wrote:

> I've run into the same problem, and I'm surprised nobody has responded to
> you. Any time someone asks "how do I page through all the rows of a table in
> CQL3?", the standard answer is token() and LIMIT. But as you point out, this
> method will often miss some data from wide rows.
>
> Maybe a Cassandra expert will chime in if we're wrong.
>
> Your suggestion is possible if you know how to find the previous value of
> the 'name' field (and are willing to filter out repeated rows), but wouldn't
> that be difficult or impossible with some keys? So, is there a way to do
> paging queries that get ALL of the rows, even in wide rows?
>
>
>
> On 08/13/2013 02:46 PM, Jan Algermissen wrote:
>
>> Hi,
>>
>> OK, so I found token() [1] and learned that it is an option for paging
>> through randomly partitioned data.
>>
>> I take it that combining token() and LIMIT is the CQL3 idiom for paging
>> (setting aside the fact that one shouldn't really want to page when using C*).
>>
>> Now, when I page through a CF with wide rows, limiting each 'page' to,
>> for example, 100, I end up in situations where not all 'sub'rows that have
>> the same result for token() are returned, because LIMIT chops off the result
>> after 100 'sub'rows, not necessarily at the boundary to the next wide row.
>>
>> Obvious ... but inconvenient.
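A small in-memory simulation makes the effect concrete. This is a sketch under stated assumptions: a hypothetical `users` table is modeled as (token, name, c) tuples sorted by token and then by a hypothetical clustering column `c`, the tokens are made-up integers rather than real partitioner hashes, and `page_by_token` is a hypothetical stand-in for `SELECT * FROM users WHERE token(name) > ? LIMIT n`.

```python
from typing import List, Optional, Tuple

# Hypothetical model of a CQL3 table: one tuple per CQL row,
# (token(name), name, c), sorted by token and then by clustering key.
Row = Tuple[int, str, int]

ROWS: List[Row] = sorted(
    [(10, "alice", c) for c in range(3)]
    + [(20, "bob", c) for c in range(5)]  # a wide partition
    + [(30, "carol", c) for c in range(2)]
)


def page_by_token(after: Optional[int], limit: int) -> List[Row]:
    """Stand-in for: SELECT * FROM users WHERE token(name) > ? LIMIT n."""
    return [r for r in ROWS if after is None or r[0] > after][:limit]


def naive_scan(limit: int) -> List[Row]:
    """Resume each page after the token of the last row seen."""
    seen: List[Row] = []
    last: Optional[int] = None
    while True:
        page = page_by_token(last, limit)
        if not page:
            return seen
        seen.extend(page)
        last = page[-1][0]  # token of the last row on this page


rows = naive_scan(limit=4)
print(len(rows), "of", len(ROWS))  # prints: 6 of 10
```

With a page size of 4, the first page ends in the middle of the "bob" partition, and the next query starts after bob's token, so bob's remaining four rows are never returned.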
>>
>> The solution would be to throw away the last token returned (because its
>> wide row could have been chopped off) and run the next query starting from
>> the token before it.
>>
>> So instead of doing
>>
>>       SELECT * FROM users WHERE token(name) > token(last-name-of-prev-result) LIMIT 100;
>>
>> I'd be doing
>>
>>      SELECT * FROM users WHERE token(name) > token(one-before-the-last-name-of-prev-result) LIMIT 100;
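The workaround can be sketched the same way. Again this is a hypothetical in-memory model (made-up integer tokens, a hypothetical clustering column `c`), with `page_by_token` standing in for the `token(name) > ?` query above. The sketch also exposes a limitation: if one partition fills a whole page, dropping it leaves nothing, and re-running the query would return the same page forever, so the sketch raises an error instead of looping.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, str, int]  # (token(name), name, c), hypothetical model

ROWS: List[Row] = sorted(
    [(10, "alice", c) for c in range(3)]
    + [(20, "bob", c) for c in range(5)]  # a wide partition
    + [(30, "carol", c) for c in range(2)]
)


def page_by_token(after: Optional[int], limit: int) -> List[Row]:
    """Stand-in for: SELECT * FROM users WHERE token(name) > ? LIMIT n."""
    return [r for r in ROWS if after is None or r[0] > after][:limit]


def scan_dropping_last_partition(limit: int) -> List[Row]:
    """Discard the possibly truncated last partition and re-read it."""
    seen: List[Row] = []
    after: Optional[int] = None
    while True:
        page = page_by_token(after, limit)
        if len(page) < limit:
            # A short page was not cut off by LIMIT: keep it all and stop.
            seen.extend(page)
            return seen
        last_token = page[-1][0]
        complete = [r for r in page if r[0] != last_token]
        if not complete:
            # One partition filled the whole page; no progress is possible.
            raise RuntimeError(
                f"partition at token {last_token} fills a whole page of {limit} rows"
            )
        seen.extend(complete)
        after = complete[-1][0]  # token of "one before the last" partition


print(len(scan_dropping_last_partition(limit=6)))  # prints: 10
try:
    scan_dropping_last_partition(limit=4)
except RuntimeError as err:
    print(err)  # prints: partition at token 20 fills a whole page of 4 rows
```

The workaround is therefore only safe while every partition has fewer rows than LIMIT; for arbitrarily wide rows, the page has to be resumed inside the partition by its clustering key.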
>>
>>
>> Question: Is that what I have to do, or is there a way to make token() and
>> LIMIT work together to return complete wide rows?
>>
>>
>> Jan
>>
>>
>>
>> [1] token() and how it relates to paging is actually quite hard to grasp
>> from the docs.
>>
>
>
