cassandra-user mailing list archives

From Tharindu Mathew <>
Subject Re: Efficient way of figuring out which nodes a set of keys belong to - Hadoop integration
Date Sat, 24 Sep 2011 07:34:55 GMT
Would really appreciate any help on this.

On Thu, Sep 22, 2011 at 11:34 PM, Tharindu Mathew <> wrote:

> Hi,
> I managed to modify the Hadoop-Cassandra integration to start from a column
> of a CF that is used for indexing. In the map phase, I take keys from
> different CFs and fetch the rows I need. This all works fine on a single
> node. :)
> I'd like to identify which nodes hold a given set of rows, so that I can
> pull those rows into Hadoop efficiently. My initial design is something
> like this.
> Add a new operation to the Thrift interface that allows us to call:
> Map<(CF+key), List<endpoint>> client.get_endpoints(List<(CF+key)>)
> The functionality would be similar to nodetool's getendpoints.
> Then, when processing, we can look up the endpoints for each CF and key
> from this map, without querying a node for each and every key.
> Only if a key is not found on its endpoint (because a node was added or
> moved while processing) do we calculate the relevant endpoint again.
> I'd like to ask the Cassandra devs whether this sounds like the best way
> to do this, and to point out any improvements to or flaws in my
> approach.
> Thanks in advance.
> --
> Regards,
> Tharindu
> blog:
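
The design in the quoted message (one batched lookup, then a recalculation only when a key is missing from its cached endpoint) could be sketched roughly as below. This is only an illustration under stated assumptions: `EndpointResolver`, `EndpointCache`, `fromMap`, and the "CF:key" string encoding are hypothetical names made up for this sketch. Cassandra's Thrift interface has no `get_endpoints` operation, so the resolver here is backed by an in-memory map that stands in for the ring.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EndpointLookupSketch {

    /** Hypothetical stand-in for the proposed get_endpoints call (not a real Thrift operation). */
    interface EndpointResolver {
        Map<String, List<String>> getEndpoints(List<String> cfAndKeys);
    }

    /** Toy resolver that reads a live in-memory map, used here in place of a real cluster. */
    static EndpointResolver fromMap(Map<String, List<String>> ring) {
        return keys -> {
            Map<String, List<String>> out = new HashMap<>();
            for (String k : keys) {
                if (ring.containsKey(k)) {
                    out.put(k, ring.get(k));
                }
            }
            return out;
        };
    }

    /** Resolves a whole batch once, then recomputes a single entry only on a miss. */
    static class EndpointCache {
        private final EndpointResolver resolver;
        private final Map<String, List<String>> cache = new HashMap<>();

        EndpointCache(EndpointResolver resolver, List<String> cfAndKeys) {
            this.resolver = resolver;
            // One call for the whole batch instead of one call per key.
            cache.putAll(resolver.getEndpoints(cfAndKeys));
        }

        /** Returns the cached endpoints, or an empty list if the key was never resolved. */
        List<String> endpointsFor(String cfAndKey) {
            return cache.getOrDefault(cfAndKey, Collections.emptyList());
        }

        /** Call this when a read against the cached endpoint did not find the key. */
        List<String> refresh(String cfAndKey) {
            Map<String, List<String>> fresh =
                    resolver.getEndpoints(Collections.singletonList(cfAndKey));
            List<String> endpoints = fresh.getOrDefault(cfAndKey, Collections.emptyList());
            cache.put(cfAndKey, endpoints);
            return endpoints;
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> ring = new HashMap<>();
        ring.put("cf1:k1", Arrays.asList("10.0.0.1"));

        EndpointCache cache = new EndpointCache(fromMap(ring), Arrays.asList("cf1:k1"));
        System.out.println(cache.endpointsFor("cf1:k1"));

        // Simulate the key moving to another node while the job is running.
        ring.put("cf1:k1", Arrays.asList("10.0.0.2"));
        System.out.println(cache.refresh("cf1:k1"));
    }
}
```

The cache is deliberately not refreshed on a timer: as in the original proposal, a stale entry is only corrected after a read against it fails, which keeps the common path to a single map lookup.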



