incubator-cassandra-user mailing list archives

From Anandha L Ranganathan <analog.s...@gmail.com>
Subject Validate if data fetched from cassandra to MapReduce job is local to that node.
Date Tue, 24 Jul 2012 02:13:54 GMT
My Cassandra setup is as follows:
  RF is set to 3 and the replication strategy is SimpleStrategy.
  Cluster size: 5 nodes.

MapReduce job:

1) All my TaskTrackers run on the same nodes as Cassandra.
2) I wrote a simple MR job that reads data from Cassandra.
3) The job works fine.

Before moving to production, we want to verify that the data each mapper reads
from Cassandra is local to the node the mapper runs on.
When the JobTracker creates a task, it should schedule the mapper on the same
node where the data is located.
How can I verify that the data retrieved is local to that node?
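
One way to check this from inside the job is to compare the hosts a mapper's input split prefers with the addresses of the machine the mapper actually runs on. In the Hadoop mapreduce API the split is available as context.getInputSplit(), and InputSplit.getLocations() returns its preferred hosts. The sketch below is a minimal, hypothetical helper (SplitLocality is not part of Hadoop or Cassandra); it assumes the split locations and the local addresses are written in the same form (both IPs or both host names), which may not hold in every cluster.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.Set;

final class SplitLocality {
    // Collect the IP addresses and host names of every network interface on this machine.
    static Set<String> localAddresses() throws SocketException {
        Set<String> names = new HashSet<>();
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            return names; // no interfaces reported
        }
        for (NetworkInterface nic : Collections.list(nics)) {
            for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                names.add(addr.getHostAddress());
                names.add(addr.getHostName()); // may trigger a reverse DNS lookup
            }
        }
        return names;
    }

    // True if any of the split's preferred locations names this machine.
    static boolean isLocal(String[] splitLocations, Set<String> localNames) {
        for (String location : splitLocations) {
            if (localNames.contains(location)) {
                return true;
            }
        }
        return false;
    }
}
```

In a mapper's setup(), one could pass context.getInputSplit().getLocations() and localAddresses() to isLocal and log or count the result for each task.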

Here is the code I wrote to compute the token for the column key:

    // Compute the partitioner token for the timestamp-based key, as a string.
    long tsForToken = getColKeyForTime(context.getConfiguration(), temp);
    String pToken = partitioner.getTokenFactory().toString(
            partitioner.getToken(ByteBufferUtil.bytes(tsForToken)));
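
For reference, here is a standalone sketch of what that token computation amounts to, assuming the cluster uses RandomPartitioner (the default in Cassandra 1.1). Under that assumption a token is the absolute value of the key's MD5 digest read as a signed BigInteger, and pToken is its decimal string. ByteBufferUtil.bytes(long) encodes the long as 8 big-endian bytes, which longToBytes reproduces. The class name is hypothetical; with a different partitioner (for example ByteOrderedPartitioner) tokens are computed differently.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class RandomPartitionerToken {
    // Same encoding as ByteBufferUtil.bytes(long): 8 bytes, big-endian.
    static byte[] longToBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Assumes RandomPartitioner: token = abs(MD5(key) interpreted as a signed BigInteger).
    static BigInteger token(byte[] key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(key);
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

Because tokens are numbers, comparisons against a node's tokens should be done on the BigInteger values rather than on the decimal strings.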


If pToken falls within the token range owned by a node, the data is local to
that node.
But because my RF is 3, copies also live on two other nodes, so this check
alone is not enough.
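
Comparing tokens needs care because the ring wraps around: a node owns the half-open range (previous node's token, its own token], and the range of the node with the lowest token wraps past the end of the ring. A minimal sketch of that check (a hypothetical helper, assuming numeric BigInteger tokens such as RandomPartitioner produces):

```java
import java.math.BigInteger;

final class TokenRangeCheck {
    // A node owns (previousToken, nodeToken]. When previousToken >= nodeToken,
    // the range wraps past the end of the ring (this also covers a one-node ring).
    static boolean inPrimaryRange(BigInteger previousToken, BigInteger nodeToken, BigInteger token) {
        if (previousToken.compareTo(nodeToken) < 0) {
            return token.compareTo(previousToken) > 0 && token.compareTo(nodeToken) <= 0;
        }
        // Wrapping range: anything after previousToken, or up to and including nodeToken.
        return token.compareTo(previousToken) > 0 || token.compareTo(nodeToken) <= 0;
    }
}
```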

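When data is stored with RF > 1, the first replica goes to the node with the
smallest initial_token that is greater than or equal to pToken (wrapping
around the ring), and SimpleStrategy places the remaining replicas on the
next nodes around the ring.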

One way to validate data locality is to pass in pToken and get back all the
nodes that store that data, using:

    // org.apache.cassandra.locator.SimpleStrategy
    public List<InetAddress> calculateNaturalEndpoints(Token token, TokenMetadata metadata)
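
If instantiating Cassandra's own classes in the mapper proves awkward, the placement rule SimpleStrategy applies can be reproduced directly: take the first node whose token is greater than or equal to the key's token (wrapping to the lowest token), then walk clockwise around the ring until RF nodes have been collected. The sketch below is a hypothetical reimplementation, not Cassandra's code; it assumes one token per node and a ring supplied as a TreeMap from each node's initial_token to its address.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class SimpleReplicaPlacement {
    // Hypothetical SimpleStrategy-style placement.
    // ring: each node's initial_token mapped to its address (one token per node).
    static List<String> replicasFor(TreeMap<BigInteger, String> ring, BigInteger token, int rf) {
        if (ring.isEmpty()) {
            return new ArrayList<>();
        }
        // Primary replica: first node whose token is >= the key's token,
        // wrapping to the lowest token if the key sorts past the last node.
        BigInteger primary = ring.ceilingKey(token);
        if (primary == null) {
            primary = ring.firstKey();
        }
        // Walk the ring clockwise starting at the primary replica.
        List<String> clockwise = new ArrayList<>(ring.tailMap(primary, true).values());
        clockwise.addAll(ring.headMap(primary, false).values());
        // Keep the first rf nodes (fewer if the ring has fewer than rf nodes).
        return new ArrayList<>(clockwise.subList(0, Math.min(rf, clockwise.size())));
    }
}
```

With five nodes and RF 3, for example, a key whose token falls between the second and third nodes' tokens maps to the third, fourth and fifth nodes, so a mapper on any of those three would be reading local data.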

But I am having difficulty getting instances of SimpleStrategy and
TokenMetadata in the mapper at runtime.
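
A possible way around this is to ask the cluster for the ring instead of building TokenMetadata locally. Cassandra's Thrift interface has describe_ring(keyspace), which returns a list of TokenRange objects whose start_token, end_token and endpoints fields describe each range (start exclusive, end inclusive) and the nodes replicating it. The sketch below assumes those results have already been fetched (for example once in the mapper's setup()) and copied into a hypothetical RingRange class; it then looks up the endpoints for a given pToken. Parsing tokens as BigInteger assumes RandomPartitioner.

```java
import java.math.BigInteger;
import java.util.Collections;
import java.util.List;

final class RingLookup {
    // Hypothetical holder mirroring the Thrift TokenRange fields used here.
    static final class RingRange {
        final BigInteger startToken; // exclusive
        final BigInteger endToken;   // inclusive
        final List<String> endpoints;

        RingRange(String startToken, String endToken, List<String> endpoints) {
            this.startToken = new BigInteger(startToken);
            this.endToken = new BigInteger(endToken);
            this.endpoints = endpoints;
        }

        boolean contains(BigInteger token) {
            if (startToken.compareTo(endToken) < 0) {
                return token.compareTo(startToken) > 0 && token.compareTo(endToken) <= 0;
            }
            // This range wraps around the end of the ring.
            return token.compareTo(startToken) > 0 || token.compareTo(endToken) <= 0;
        }
    }

    // Returns the replica endpoints of the range containing pToken, or an empty list.
    static List<String> endpointsFor(List<RingRange> ring, String pToken) {
        BigInteger token = new BigInteger(pToken);
        for (RingRange range : ring) {
            if (range.contains(token)) {
                return range.endpoints;
            }
        }
        return Collections.emptyList();
    }
}
```

The mapper could then check whether one of the returned endpoints is its own address, which covers all RF replicas rather than only the primary range.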

Can someone help me with this?

-Anand
