My Cassandra setup is as follows.
RF is set to 3 and the replication strategy is SimpleStrategy.
Cluster size: 5 nodes.
MapReduce job:
1) All my TaskTrackers run on the same nodes as Cassandra.
2) I wrote a simple MR job to read data from Cassandra.
3) It works fine and I have no problems with it.
Before migrating to production, we want to validate that the data each mapper retrieves from Cassandra is local to the node the mapper runs on.
When the JobTracker creates a task, it should place the mapper on the same node where the data is located.
How do I validate that the retrieved data is local to that node?
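One possible check, sketched here only as an illustration, is to compare the host a mapper runs on against the locations advertised by its input split. The class name SplitLocalityCheck and its method are hypothetical, and the sketch assumes the input format reports replica hosts as split locations, either as host names or as IP addresses.

```java
import java.util.Locale;

public class SplitLocalityCheck {
    // Returns true if any advertised split location matches this host's
    // name (case-insensitive) or its IP address.
    public static boolean isLocal(String[] splitLocations, String hostName, String hostAddress) {
        if (splitLocations == null || hostName == null || hostAddress == null) {
            return false;
        }
        String name = hostName.toLowerCase(Locale.ROOT);
        for (String location : splitLocations) {
            if (location == null) {
                continue;
            }
            String loc = location.trim().toLowerCase(Locale.ROOT);
            if (loc.equals(name) || loc.equals(hostAddress)) {
                return true;
            }
        }
        return false;
    }
}
```

Inside a mapper, the locations could come from context.getInputSplit().getLocations() and the host details from InetAddress.getLocalHost(); the result could then be recorded in a Hadoop counter so that local and non-local tasks can be compared after the job finishes.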
Here is the code I wrote to find the token of the column name.
long tsForToken = getColKeyForTime(context.getConfiguration(),temp);
String pToken = partitioner.getTokenFactory().toString(partitioner.getToken(ByteBufferUtil.bytes(tsForToken)));
If pToken lies between the startToken and endToken of a node, then the data is local to that node.
But since my RF is 3, this check alone may not be enough.
With RF > 1, the data is stored on the node where pToken < initial_token, and also on additional replica nodes.
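To make the RF > 1 case concrete, here is a minimal sketch of ring-based replica placement: the first node whose token is at or after the key's token, followed by the next nodes clockwise until RF replicas are collected. It is a simplified model and not Cassandra's own code; the class RingReplicaSketch, the node names, and the token values are hypothetical, and it assumes one token per node.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class RingReplicaSketch {
    // Hypothetical ring: initial token -> node name, kept sorted by token.
    private final TreeMap<BigInteger, String> ring = new TreeMap<>();
    private final int replicationFactor;

    public RingReplicaSketch(int replicationFactor) {
        this.replicationFactor = replicationFactor;
    }

    public void addNode(String node, BigInteger initialToken) {
        ring.put(initialToken, node);
    }

    // Returns the primary replica followed by the next nodes clockwise.
    public List<String> replicasFor(BigInteger keyToken) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        // First node whose token is >= keyToken; wrap around if there is none.
        BigInteger start = ring.ceilingKey(keyToken);
        if (start == null) {
            start = ring.firstKey();
        }
        int wanted = Math.min(replicationFactor, ring.size());
        // Walk from the start token to the end of the ring...
        for (String node : ring.tailMap(start, true).values()) {
            if (replicas.size() == wanted) break;
            replicas.add(node);
        }
        // ...then continue from the beginning of the ring.
        for (String node : ring.headMap(start, false).values()) {
            if (replicas.size() == wanted) break;
            replicas.add(node);
        }
        return replicas;
    }

    // A key counts as local to a node if that node is any of its replicas.
    public boolean isLocal(BigInteger keyToken, String node) {
        return replicasFor(keyToken).contains(node);
    }
}
```

With five nodes n1 to n5 at tokens 0, 100, 200, 300, 400 and RF 3, replicasFor(150) returns [n3, n4, n5], so a mapper on n4 or n5 would also be reading local data even though the token falls in n3's primary range.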
One way to validate data locality is to pass pToken to the replication strategy and get all the nodes storing that data.
public List<InetAddress> SimpleStrategy.calculateNaturalEndpoints(Token token, TokenMetadata metadata)
I am having difficulty getting instances of SimpleStrategy and TokenMetadata in the mapper at runtime.
Can someone help me with this issue?