This is the current flow for ColumnFamilyInputFormat. Please correct me if I'm wrong.

1) In ColumnFamilyInputFormat, get the token ranges of all nodes using client.describe_ring.
2) Get the CfSplits using client.describe_splits_ex with each token range.
3) Create a new ColumnFamilySplit with the start range, end range and endpoint.
4) In ColumnFamilyRecordReader, query client.get_range_slices with the start range and end range of the ColumnFamilySplit at the endpoint (datanode).

Suppose I use client.get_slice ( key ) instead. My row key is '20130314' from the Index Table.
Q1) How do I find out which token range and endpoint the row key '20130314' belongs to?
Q2) Even if I manage to find out the token range and endpoint, is there a Thrift API that I can pass ( ByteBuffer key, KeyRange range ) to, like a merge of client.get_slice and client.get_range_slices?


On Sat, Mar 30, 2013 at 7:53 AM, Edward Capriolo wrote:
You can use the output of describe_ring along with partitioner information to determine which nodes data lives on.
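
A rough sketch of that lookup, in case it helps. It assumes RandomPartitioner, whose token I believe is the absolute value of the key's MD5 digest read as a signed integer, and that each describe_ring entry covers (start_token, end_token] with a wrap-around range at the end of the ring. The Range class, the sampleRing() helper and the IP addresses are made up for illustration; in real code the tokens and endpoints would come from the TokenRange objects that describe_ring returns.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TokenRangeLookup {

    /** Hypothetical stand-in for one describe_ring entry: (start, end] plus its replicas. */
    static final class Range {
        final BigInteger start;
        final BigInteger end;
        final List<String> endpoints;

        Range(String start, String end, String... endpoints) {
            // describe_ring reports tokens as strings, so parse them here.
            this.start = new BigInteger(start);
            this.end = new BigInteger(end);
            this.endpoints = Arrays.asList(endpoints);
        }

        /** True if token is in (start, end]; a range with start >= end wraps around the ring. */
        boolean contains(BigInteger token) {
            if (start.compareTo(end) < 0) {
                return token.compareTo(start) > 0 && token.compareTo(end) <= 0;
            }
            return token.compareTo(start) > 0 || token.compareTo(end) <= 0;
        }
    }

    /** Assumed RandomPartitioner token: abs(MD5(key)) as a signed big integer. */
    static BigInteger tokenFor(String rowKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(rowKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Returns the ring entry whose token range owns the given row key. */
    static Range rangeFor(String rowKey, List<Range> ring) {
        BigInteger token = tokenFor(rowKey);
        for (Range range : ring) {
            if (range.contains(token)) {
                return range;
            }
        }
        throw new IllegalStateException("ring does not cover token " + token);
    }

    /** Made-up three-node ring that splits the 0..2^127 token space evenly. */
    static List<Range> sampleRing() {
        BigInteger third = BigInteger.ONE.shiftLeft(127).divide(BigInteger.valueOf(3));
        String t1 = "0";
        String t2 = third.toString();
        String t3 = third.multiply(BigInteger.valueOf(2)).toString();
        List<Range> ring = new ArrayList<>();
        ring.add(new Range(t3, t1, "10.0.0.1")); // wrap-around range
        ring.add(new Range(t1, t2, "10.0.0.2"));
        ring.add(new Range(t2, t3, "10.0.0.3"));
        return ring;
    }

    public static void main(String[] args) {
        Range owner = rangeFor("20130314", sampleRing());
        System.out.println("token     = " + tokenFor("20130314"));
        System.out.println("endpoints = " + owner.endpoints);
    }
}
```

The endpoints found this way are what a custom InputSplit could report as its locations, so that Hadoop tries to schedule the task near the data. With a different partitioner the tokenFor step would have to change.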

On Fri, Mar 29, 2013 at 12:33 PM, Alicia Leong wrote:
Hi All

I’m thinking of doing it this way:

1) get_slice ( YYYYMMDDHH ) from the Index Table.

2) Take the returned list of row keys.

3) Pass them to multiget_slice ( keys … ).


But my question is: how do I ensure data locality?

On Tue, Mar 19, 2013 at 3:33 PM, aaron morton wrote:
I would be looking at Hive or Pig, rather than writing the MapReduce job by hand.

There is an example in the Cassandra source distribution, or you can look at DataStax Enterprise to start playing with Hive.

Typically with Hadoop queries you want to query a lot of data. If you are only querying a few rows, consider writing the code in your favourite language.

Aaron Morton
Freelance Cassandra Consultant
New Zealand


On 18/03/2013, at 1:29 PM, Alicia Leong wrote:

Hi All

I have 2 tables

Data Table
RowKey: 1
=> (column=name, value=apple)
RowKey: 2
=> (column=name, value=orange)
RowKey: 3
=> (column=name, value=banana)
RowKey: 4
=> (column=name, value=mango)

Index Table (YYYYMMDDHH)
RowKey: 2013030114
=> (column=1, value=)
=> (column=2, value=)
=> (column=3, value=)
RowKey: 2013030115
=> (column=4, value=)

I would like to know how to implement the following in MapReduce:
1) first query the Index Table by RowKey: 2013030114
2) then pass the Index Table column names  (1,2,3) to query the Data Table
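
The two steps above can be sketched as follows. This is only an illustration of the lookup order, not a MapReduce job: the in-memory maps are hypothetical stand-ins for the two tables, indexColumns plays the role of get_slice on the Index Table, and fetchNames plays the role of multiget_slice on the Data Table.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IndexThenDataLookup {

    // Hypothetical stand-in for the Index Table: row key -> column names.
    static final Map<String, List<String>> INDEX_TABLE = new LinkedHashMap<>();
    // Hypothetical stand-in for the Data Table: row key -> value of the "name" column.
    static final Map<String, String> DATA_TABLE = new LinkedHashMap<>();

    static {
        INDEX_TABLE.put("2013030114", Arrays.asList("1", "2", "3"));
        INDEX_TABLE.put("2013030115", Arrays.asList("4"));
        DATA_TABLE.put("1", "apple");
        DATA_TABLE.put("2", "orange");
        DATA_TABLE.put("3", "banana");
        DATA_TABLE.put("4", "mango");
    }

    /** Step 1: read the column names of one Index Table row (the get_slice step). */
    static List<String> indexColumns(String indexKey) {
        List<String> columns = INDEX_TABLE.get(indexKey);
        return columns == null ? Collections.<String>emptyList() : columns;
    }

    /** Step 2: use those column names as Data Table row keys (the multiget_slice step). */
    static Map<String, String> fetchNames(List<String> dataKeys) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String key : dataKeys) {
            String name = DATA_TABLE.get(key);
            if (name != null) {
                result.put(key, name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> dataKeys = indexColumns("2013030114");
        System.out.println(fetchNames(dataKeys)); // prints {1=apple, 2=orange, 3=banana}
    }
}
```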

Thanks in advance.