Date: Mon, 23 Jul 2012 19:13:54 -0700
Subject: Validate if data fetched from cassandra to MapReduce job is local to that node.
From: Anandha L Ranganathan
To: user@cassandra.apache.org

My Cassandra setup is like this.
  RF is set to 3 and Strategy is SimpleStrategy.
  Cluster Size: 5 Nodes.

Map Reduce Job.

1) All my TaskTrackers run on the same nodes as Cassandra.
2) I wrote a simple MR job to retrieve data from Cassandra.
3) The job works fine and I have no problems with it.

Before migrating to production, we want to validate that the data retrieved from Cassandra is local to the node running the mapper.
When a task is created by the JobTracker (JT), it should create the mapper on the same node where the data is located.
How do I validate that the data retrieved is local to that node?
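
One possible check from inside a mapper, sketched under assumptions: Hadoop's InputSplit.getLocations() returns the hosts that a split reports as holding its data, so a mapper could compare that list with its own host name. The class SplitLocality and the method isLocal are hypothetical helpers. In a real job the arguments would come from context.getInputSplit().getLocations() and InetAddress.getLocalHost().getHostName(). Whether the split reports host names or IP addresses depends on the input format, so the comparison may need adjusting.

```java
import java.util.Locale;

// Hypothetical helper, not part of Hadoop or Cassandra.
final class SplitLocality {
    /** Returns true if localHost matches one of the hosts reported for the split. */
    static boolean isLocal(String[] splitLocations, String localHost) {
        if (splitLocations == null || localHost == null) {
            return false;
        }
        String wanted = localHost.trim().toLowerCase(Locale.ROOT);
        for (String host : splitLocations) {
            // Compare case-insensitively, ignoring surrounding whitespace.
            if (host != null && host.trim().toLowerCase(Locale.ROOT).equals(wanted)) {
                return true;
            }
        }
        return false;
    }
}
```

Logging the result of isLocal once per mapper (or incrementing a job counter) would give a rough count of local versus non-local tasks.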

Here is the code I wrote to find the token of the column name.

    long tsForToken = getColKeyForTime(context.getConfiguration(), temp);
    String pToken = partitioner.getTokenFactory().toString(partitioner.getToken(ByteBufferUtil.bytes(tsForToken)));
If pToken is between the startToken and endToken for that node, then it is local to that node.
But my RF is 3, so this may not be the only case.
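
The range test described above could be sketched as follows. This is a minimal model and not Cassandra's own code: it assumes tokens are BigInteger values (as with RandomPartitioner), that a range is exclusive at the start and inclusive at the end, and that a range whose start is not below its end wraps around the ring. The class name TokenRangeCheck is hypothetical.

```java
import java.math.BigInteger;

// Hypothetical helper modeling a (start, end] token range on a circular ring.
final class TokenRangeCheck {
    /** Returns true if token lies in (start, end], allowing the range to wrap. */
    static boolean contains(BigInteger start, BigInteger end, BigInteger token) {
        if (start.compareTo(end) < 0) {
            // Ordinary range: start < token <= end.
            return token.compareTo(start) > 0 && token.compareTo(end) <= 0;
        }
        // Wrapping range; start == end is treated here as the whole ring.
        return token.compareTo(start) > 0 || token.compareTo(end) <= 0;
    }
}
```

With RF = 3 this test only covers the range a node owns directly; data from ranges it holds as a replica would also be local.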

When storing data with RF > 1, if pToken < initial_token then the data is stored on that node.

One way to validate data locality is to pass pToken and get all the nodes storing that data.

    public List<InetAddress> SimpleStrategy.calculateNaturalEndpoints(Token token, TokenMetadata metadata)
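
As a rough illustration of what such a lookup computes, here is a standalone sketch that does not use any Cassandra classes. It assumes the ring is known as a token-to-host map (for example, copied from nodetool ring output) and that placement walks clockwise from the first node whose token is at or after the key's token, collecting distinct hosts until RF is reached. The class SimpleRingModel and its methods are hypothetical and only approximate SimpleStrategy.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of a token ring; not Cassandra's SimpleStrategy itself.
final class SimpleRingModel {
    private final TreeMap<BigInteger, String> ring = new TreeMap<>();

    void addNode(BigInteger token, String host) {
        ring.put(token, host);
    }

    /** Returns up to rf distinct hosts, walking clockwise from the key's token. */
    List<String> replicasFor(BigInteger keyToken, int rf) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        // First node token >= keyToken; wrap to the lowest token if none exists.
        BigInteger start = ring.ceilingKey(keyToken);
        if (start == null) {
            start = ring.firstKey();
        }
        // Hosts in clockwise order beginning at the start token.
        List<String> ordered = new ArrayList<>(ring.tailMap(start, true).values());
        ordered.addAll(ring.headMap(start, false).values());
        for (String host : ordered) {
            if (replicas.size() == rf) {
                break;
            }
            if (!replicas.contains(host)) {
                replicas.add(host);
            }
        }
        return replicas;
    }
}
```

A mapper could then treat its data as local if its own host appears in replicasFor(pToken, 3), given a ring map that matches the live cluster.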

I have difficulty getting instances of SimpleStrategy and TokenMetadata in the mapper at runtime.

Can someone help me with this issue?

-Anand