incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: How can I get rows in groups?
Date Mon, 22 Nov 2010 09:38:25 GMT
If you are working inside the Cassandra code base, take a look at
org.apache.cassandra.hadoop.ColumnFamilyRecordReader. It reads all the rows in a CF using
tokens. I'm not sure that code cares too much about reading a row twice. AFAIK using tokens
for this is considered an internal feature.

WRT the start key / end key issue, why not take a look at how the pycassa, phpcassa or hector
libraries do it? 
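
As far as I understand it, the pattern those clients use is: leave end_key empty (which, AFAIK, the Thrift API treats as "no upper bound"), use the last key of the previous page as the next start_key, and since start_key is inclusive, ask for one extra row and drop the first one. A minimal sketch of that loop follows. It is not Cassandra code: `fake_get_range_slices` and `iter_rows` are hypothetical names, and the in-memory dict only stands in for a column family ordered by key, as with the OrderPreservingPartitioner.

```python
from bisect import bisect_left


def fake_get_range_slices(rows, start_key, end_key, count):
    """Hypothetical stand-in for get_range_slices().

    Returns up to `count` (key, value) pairs with start_key <= key <= end_key,
    in key order. An empty string means "unbounded" on that side (assumption).
    """
    keys = sorted(rows)
    i = bisect_left(keys, start_key) if start_key else 0
    out = []
    for k in keys[i:]:
        if end_key and k > end_key:
            break
        out.append((k, rows[k]))
        if len(out) == count:
            break
    return out


def iter_rows(rows, page_size):
    """Yield every (key, value) pair exactly once, page_size rows at a time."""
    start = ""
    first = True
    while True:
        # start_key is inclusive, so every page after the first begins with
        # a row we have already seen: fetch one extra and drop it.
        want = page_size if first else page_size + 1
        page = fake_get_range_slices(rows, start, "", want)
        if not first:
            page = page[1:]
        if not page:
            return
        for kv in page:
            yield kv
        if len(page) < page_size:
            return  # short page: the end of the range was reached
        start = page[-1][0]
        first = False


if __name__ == "__main__":
    data = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
    for key, value in iter_rows(data, page_size=2):
        print(key, value)
```

The group size n from the original question maps to `page_size` here. Dropping the duplicate first row is what avoids reading a row twice.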

Aaron


On 22 Nov 2010, at 22:10, altanis@ceid.upatras.gr wrote:

> I am not using any client, I am trying to extend Cassandra with a new API
> call so that a _node_ will do that on behalf of clients. Thank you for the
> answer, but it doesn't answer my question!
> 
> Alexander
> 
>> Most of the high level clients do this for you.
>> 
>> For example, pycassa and phpcassa both do this by returning an
>> iterator from get_range() and breaking it up behind the scenes.
>> 
>> Hector also has something similar, but I think it's in the examples
>> section.
>> 
>> What client are you using?
>> 
>> (By the way, beta1 is old and buggy! You should switch to beta3.)
>> 
>> - Tyler
>> 
>> On Fri, Nov 19, 2010 at 8:33 AM, <altanis@ceid.upatras.gr> wrote:
>> 
>>> Hello,
>>> 
>>> I would like one of the cluster's nodes to use get_range_slices() to
>>> retrieve the values of a specific column for the entire keyspace. I
>>> obviously don't want to do it for the whole keyspace at once, so I'd like
>>> to do it in groups of n, which should be configurable.
>>> 
>>> I get the first n values using a KeyRange with the current node's local
>>> token as start_token and end_token, which equals the whole keyspace.
>>> 
>>> After that, it makes sense to have a loop, and to use each time a new
>>> KeyRange with the largest key returned by the previous iteration as the
>>> start_key. However, I don't know what to use as end_key, and Cassandra
>>> complains that if one of (start_key, end_key) is not null, the other can't
>>> be either. What can I do?
>>> 
>>> Can I use tokens? I read that a KeyRange with tokens is end-inclusive, and
>>> can wrap, so I can just give the local node's token as the end_token all
>>> the time, so when the traversing reaches that node again, it will know the
>>> whole keyspace was traversed. Or are tokens different semantically?
>>> 
>>> I am using Cassandra 0.7.0 beta1, and the OrderPreservingPartitioner.
>>> 
>>> Alexander Altanis

