You can use the output of describe_ring, together with the partitioner in use, to determine which nodes a given row's data lives on.
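
A rough sketch of that lookup, assuming the cluster uses RandomPartitioner (where a key's token is the absolute value of its MD5 digest read as a signed big-endian integer) and assuming the describe_ring output has already been flattened into (start_token, end_token, endpoints) tuples with integer tokens. The function names are hypothetical, and other partitioners compute tokens differently.

```python
import hashlib


def random_partitioner_token(key: bytes) -> int:
    """Token for a row key under RandomPartitioner (assumed here):
    abs() of the MD5 digest read as a signed big-endian integer."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def endpoints_for_token(ring, token):
    """Return the endpoints of the range that owns `token`.

    `ring` is a list of (start_token, end_token, endpoints) tuples, a
    hypothetical flattening of describe_ring's TokenRange entries.
    Each range covers (start, end]; a range whose start >= end wraps
    around the end of the ring.
    """
    for start, end, endpoints in ring:
        if start < end:
            if start < token <= end:
                return endpoints
        elif token > start or token <= end:
            return endpoints
    raise LookupError("no range owns token %d" % token)


def endpoints_for_key(ring, key: bytes):
    """Combine the two steps: key -> token -> owning endpoints."""
    return endpoints_for_token(ring, random_partitioner_token(key))
```

Calling endpoints_for_key(ring, b"1") then gives the replicas for row key 1, so a client or job can group keys by the node that holds them before issuing reads.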


On Fri, Mar 29, 2013 at 12:33 PM, Alicia Leong <lccalicia@gmail.com> wrote:
Hi All

I’m thinking of doing it this way:

1) get_slice ( YYYYMMDDHH ) from the Index Table.

2) Take the returned list of ROWKEYs.

3) Pass it to multiget_slice ( keys … )
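
The three steps above can be sketched as follows. This is only an in-memory illustration: the two dicts stand in for the Index Table and Data Table from the original question, and get_slice / multiget_slice here are hypothetical local stand-ins that mimic the shape of the Thrift calls, not the real client API.

```python
# In-memory stand-ins for the two column families (sample data from the thread).
INDEX_TABLE = {
    "2013030114": {"1": "", "2": "", "3": ""},
    "2013030115": {"4": ""},
}
DATA_TABLE = {
    "1": {"name": "apple"},
    "2": {"name": "orange"},
    "3": {"name": "banana"},
    "4": {"name": "mango"},
}


def get_slice(table, row_key):
    """Stand-in for get_slice: all columns of one row (empty if absent)."""
    return dict(table.get(row_key, {}))


def multiget_slice(table, row_keys):
    """Stand-in for multiget_slice: columns of each requested row that exists."""
    return {key: dict(table[key]) for key in row_keys if key in table}


def rows_for_hour(hour):
    # Step 1: read the index row for the hour bucket (YYYYMMDDHH).
    index_columns = get_slice(INDEX_TABLE, hour)
    # Step 2: the column names of the index row are the data row keys.
    row_keys = list(index_columns)
    # Step 3: fetch all of those data rows in one call.
    return multiget_slice(DATA_TABLE, row_keys)
```

With the sample data, rows_for_hour("2013030114") returns the rows for keys 1, 2 and 3, and an hour with no index row returns an empty dict.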

 

But my question is: how do I ensure data locality?



On Tue, Mar 19, 2013 at 3:33 PM, aaron morton <aaron@thelastpickle.com> wrote:
I would be looking at Hive or Pig, rather than writing the MapReduce job by hand.

There is an example in the Cassandra source distribution, or you can look at DataStax Enterprise to start playing with Hive.

Typically with Hadoop queries you want to query a lot of data; if you are only querying a few rows, consider writing the code in your favourite language.

Cheers
 
-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton

On 18/03/2013, at 1:29 PM, Alicia Leong <lccalicia@gmail.com> wrote:

Hi All

I have 2 tables


Data Table
-----------------
RowKey: 1
=> (column=name, value=apple)
RowKey: 2
=> (column=name, value=orange)
RowKey: 3
=> (column=name, value=banana)
RowKey: 4
=> (column=name, value=mango)


Index Table (YYYYMMDDHH)
------------------------------------------------
RowKey: 2013030114
=> (column=1, value=)
=> (column=2, value=)
=> (column=3, value=)
RowKey: 2013030115
=> (column=4, value=)


I would like to know how to implement the following in MapReduce:
1) first query the Index Table by RowKey: 2013030114
2) then pass the Index Table column names (1,2,3) to query the Data Table

Thanks in advance.