Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
MIME-Version: 1.0
Date: Wed, 10 Nov 2010 07:05:48 -0800
Message-ID: <AANLkTikK+wiY=BV+dcK6RoPsKUUp6eF30ba7K1gq-n3q@mail.gmail.com>
Subject: Range queries using token instead of key
From: Anand Somani <meatforums@gmail.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=0016363b85c443d5580494b431ad

--0016363b85c443d5580494b431ad
Content-Type: text/plain; charset=ISO-8859-1

Hi,

I am trying to iterate over the entire dataset to calculate some
information. My approach is to go directly to the node that owns each
data range; here are the steps I am following:

   - Get the token ranges using describe_ring.
   - For each TokenRange, pick a node and read all of its data by talking
   directly to that node (so the reads are local), using get_range_slices()
   with a KeyRange that has a start and end token. I want to fetch about
   N keys at a time.
   - I want to page through each range, but I cannot find a way to get the
   token of my last KeySlice; the only thing I get back is the key. Is
   there a way to get the token for a given key? Someone suggested taking
   the MD5 of the last key and using that as the starting token for the
   next query. Would that work?
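A minimal Python sketch of the MD5 idea, assuming RandomPartitioner derives a token as the absolute value of the key's MD5 digest read as a signed, big-endian 128-bit integer, and that the key is UTF-8 encoded before hashing. Both assumptions should be checked against the RandomPartitioner source for 0.6.6; the key value below is hypothetical.

```python
import hashlib


def random_partitioner_token(key: str) -> int:
    """Token for a key, ASSUMING abs(signed big-endian MD5) over UTF-8 bytes."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Read the 16-byte digest as a signed integer, then take the absolute
    # value, so the result lies in [0, 2**127].
    return abs(int.from_bytes(digest, byteorder="big", signed=True))


if __name__ == "__main__":
    last_key = "user:12345"  # hypothetical key from the last KeySlice
    # KeyRange tokens are strings, so the token is printed in decimal.
    print(str(random_partitioner_token(last_key)))
```

If the assumed formula matches the server's, this decimal string could be used as the next start_token.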

Also, is there a better way of doing this? The data per row is very small.
This looks like a Hadoop job, but I am trying to avoid Hadoop since I have
no other use for it and this operation will be infrequent.

I am using 0.6.6, RandomPartitioner.
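For RandomPartitioner, the per-range paging from the steps listed earlier could be sketched as below. It assumes token ranges are start-exclusive and end-inclusive, (start, end], so the token of the last key returned can be passed back as the next start_token without repeating that row; this should be verified for 0.6.6. The token formula assumes abs() of the key's MD5 digest read as a signed 128-bit integer. `FetchPage` and `make_fake_fetch` are hypothetical in-memory stand-ins for a get_range_slices call with a KeyRange carrying start_token, end_token and count.

```python
import hashlib
from typing import Callable, Iterator, List

# Assumption: RandomPartitioner tokens are abs(signed 128-bit MD5), i.e. in [0, 2**127].
RING_MAX = 2 ** 127

# Hypothetical shape of one get_range_slices call: (start_token, end_token, count) -> keys.
FetchPage = Callable[[int, int, int], List[str]]


def token_of(key: str) -> int:
    """Token for a key, assuming UTF-8 encoding and abs(signed MD5)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def in_range(token: int, start: int, end: int) -> bool:
    """True if token lies in (start, end]; start >= end means the range wraps past zero."""
    if start < end:
        return start < token <= end
    return token > start or token <= end


def scan_range(fetch: FetchPage, start: int, end: int, page_size: int) -> Iterator[str]:
    """Yield every key in (start, end], fetching page_size keys per call."""
    cursor = start
    while True:
        page = fetch(cursor, end, page_size)
        yield from page
        if len(page) < page_size:
            return  # short page: the range is exhausted
        cursor = token_of(page[-1])  # exclusive start, so the last key is not repeated
        if cursor == end:
            return  # last key sat exactly on the end token; (end, end] would mean the whole ring


def make_fake_fetch(keys: List[str]) -> FetchPage:
    """Hypothetical in-memory stand-in for get_range_slices against one node."""
    def fetch(start: int, end: int, count: int) -> List[str]:
        hits = [k for k in keys if in_range(token_of(k), start, end)]
        # Ring order: tokens after `start` first, then tokens that wrapped past zero.
        hits.sort(key=lambda k: (token_of(k) <= start, token_of(k)))
        return hits[:count]
    return fetch


if __name__ == "__main__":
    keys = [f"row{i}" for i in range(100)]
    fetch = make_fake_fetch(keys)
    mid = RING_MAX // 2
    # Two ranges covering the whole ring, the second one wrapping past zero,
    # similar in shape to what describe_ring might return for two nodes.
    ranges = [(0, mid), (mid, 0)]
    seen = [k for s, e in ranges for k in scan_range(fetch, s, e, page_size=7)]
    print(len(seen), len(set(seen)))
```

With real Thrift calls, the fake fetch would be replaced by get_range_slices against the node picked for each range, presumably passing the tokens as decimal strings.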

Thanks
Anand

--0016363b85c443d5580494b431ad--