Date: Wed, 10 Nov 2010 07:05:48 -0800
Subject: Range queries using token instead of key
From: Anand Somani
To: user@cassandra.apache.org

Hi,

I am trying to iterate over the entire dataset to calculate some information. My plan is to go directly to the node that owns each data range, so here is the route I am following:
  • Get the TokenRanges using describe_ring.
  • For each TokenRange, pick a node and get all the data from that node (that is, talk directly to that node for its local data) using get_range_slices() with a KeyRange that has a start and end token. I want to get about N tokens at a time.
  • I want to use a paging approach for this, but I cannot find a way to get the token for my last KeySlice. The only thing I can find is the key. Is there a way to get the token for a given key? As per some suggestions, I can take the MD5 of the last key and use that as the starting token for the next query. Would that work?
Also, is there a better way of doing this? The data per row is very small. This looks like a Hadoop kind of job, but I am trying to avoid Hadoop since I have no other use for it and this operation will be infrequent.
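
To make the MD5 idea concrete, here is a rough Python sketch of what I have in mind. It rests on assumptions I have not confirmed: that RandomPartitioner's token is the absolute value of the key's MD5 digest read as a signed big-endian integer, and that a token range excludes its start token and includes its end token. The `fetch` callable is a hypothetical stand-in for a get_range_slices() call against one node; it is not a real client API.

```python
import hashlib
from typing import Callable, Iterator, List, Tuple

# Hypothetical fetch signature: (start_token, end_token, count) -> [(key, row), ...]
# In practice this would wrap a get_range_slices() call sent to a single node.
Fetch = Callable[[int, int, int], List[Tuple[bytes, dict]]]


def key_to_token(key: bytes) -> int:
    """Assumed RandomPartitioner token: abs() of the MD5 digest of the key,
    with the digest interpreted as a signed big-endian integer."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def page_token_range(fetch: Fetch, start_token: int, end_token: int,
                     page_size: int) -> Iterator[Tuple[bytes, dict]]:
    """Yield every row in (start_token, end_token], page_size rows per call."""
    current = start_token
    while True:
        rows = fetch(current, end_token, page_size)
        yield from rows
        if len(rows) < page_size:
            return  # a short page means the range is exhausted
        # Resume after the last key seen by recomputing its token.
        current = key_to_token(rows[-1][0])
        if current == end_token:
            return  # avoid a start == end query, which may mean "whole ring"
```

The loop never needs the server to return a token: it recomputes one from the last key of each page, which is the part I would like confirmed.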

I am using 0.6.6, RandomPartitioner.

Thanks
Anand