incubator-cassandra-dev mailing list archives

From Tharindu Mathew <mcclou...@gmail.com>
Subject Re: Efficient way of figuring out which nodes a set of keys belong to - Hadoop integration
Date Sat, 24 Sep 2011 07:34:55 GMT
Would really appreciate any help on this.

On Thu, Sep 22, 2011 at 11:34 PM, Tharindu Mathew <mccloud35@gmail.com> wrote:

> Hi,
>
> I managed to modify the Hadoop-Cassandra integration to start from a column
> of a CF used for indexing. In the map phase, I get keys from different CFs
> and fetch the row I need. This all works fine for a single node. :)
>
> I'd like to identify the set of nodes that own a set of rows and get those
> rows into Hadoop efficiently. My initial design is something like this:
>
> Add a new operation to the Thrift interface that allows us to do:
>
> Map<(CF+key), List<endpoints>> client.get_endpoints(List<CF+key>)
>
> The functionality would be similar to nodetool's getendpoints.
>
> Then, when processing, we can look up the endpoints for each CF and key
> from this map, without querying a node for each and every key. Only if a
> key is not found on its endpoint (due to a node being added or displaced
> while processing) do we calculate the relevant endpoint again.
>
> I'd like to ask the Cassandra devs whether this sounds like the best way
> to do this, and to point out any improvements to or flaws in my approach.
>
> Thanks in advance.
>
> --
> Regards,
>
> Tharindu
>
> blog: http://mackiemathew.com/
>
>
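
To make the design quoted above a bit more concrete, here is a minimal client-side sketch of the caching and fallback behaviour. Everything in it is an assumption for illustration: `EndpointCache`, `EndpointResolver`, `prefetch`, `endpointsFor` and `refresh` are hypothetical names, the "CF:key" string encoding is made up, and the resolver interface only stands in for the proposed `get_endpoints` operation, which is not part of the existing Thrift interface.

```java
import java.util.*;

// Hypothetical client-side cache; not part of Cassandra or its Hadoop integration.
class EndpointCache {

    /** Stand-in for the proposed batched lookup: "CF:key" -> list of endpoints. */
    interface EndpointResolver {
        Map<String, List<String>> getEndpoints(List<String> cfKeys);
    }

    private final EndpointResolver resolver;
    private final Map<String, List<String>> cache = new HashMap<>();

    EndpointCache(EndpointResolver resolver) {
        this.resolver = resolver;
    }

    /** Resolve a whole batch of keys with a single resolver call. */
    void prefetch(List<String> cfKeys) {
        cache.putAll(resolver.getEndpoints(cfKeys));
    }

    /** Return cached endpoints, resolving on demand if the key was never fetched. */
    List<String> endpointsFor(String cfKey) {
        List<String> endpoints = cache.get(cfKey);
        return endpoints != null ? endpoints : refresh(cfKey);
    }

    /**
     * Recalculate the endpoints for one key. Intended to be called only when
     * the key was not found on its cached endpoint (e.g. the ring changed).
     */
    List<String> refresh(String cfKey) {
        Map<String, List<String>> result =
                resolver.getEndpoints(Collections.singletonList(cfKey));
        List<String> endpoints = result.get(cfKey);
        if (endpoints == null) {
            endpoints = Collections.emptyList();
        }
        cache.put(cfKey, endpoints);
        return endpoints;
    }
}
```

A map task would call `prefetch` once for its batch of keys, read from `endpointsFor` while processing, and call `refresh` only after a miss on the cached endpoint.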


-- 
Regards,

Tharindu

blog: http://mackiemathew.com/
