From: Jonathan Rhone
To: user@cassandra.apache.org
Date: Fri, 16 Aug 2013 12:16:28 -0400
Subject: Re: token(), limit and wide rows

Read http://www.datastax.com/dev/blog/cql3-table-support-in-hadoop-pig-and-hive

And look at http://fossies.org/dox/apache-cassandra-1.2.8-src/CqlPagingRecordReader_8java_source.html

- Jon

On Fri, Aug 16, 2013 at 12:08 PM, Keith Freeman <8forty@gmail.com> wrote:

> I've run into the same problem, and I'm surprised nobody has responded to
> you. Any time someone asks "how do I page through all the rows of a table
> in CQL3?", the standard answer is token() and LIMIT. But as you point out,
> this method will often miss some data from wide rows.
>
> Maybe a Cassandra expert will chime in if we're wrong.
>
> Your suggestion is possible if you know how to find the previous value of
> the 'name' field (and are willing to filter out repeated rows), but
> wouldn't that be difficult or impossible with some keys? So then, is there
> a way to do paging queries that get ALL of the rows, even in wide rows?
>
> On 08/13/2013 02:46 PM, Jan Algermissen wrote:
>
>> Hi,
>>
>> OK, so I found token() [1], and that it is an option for paging through
>> randomly partitioned data.
>>
>> I take it that combining token() and LIMIT is the CQL3 idiom for paging
>> (setting aside the fact that one shouldn't really want to page and use C*).
>>
>> Now, when I page through a CF with wide rows, limiting each 'page' to,
>> for example, 100, I end up in situations where not all 'sub'rows that have
>> the same result for token() are returned, because LIMIT chops off the
>> result after 100 'sub'rows, not necessarily at the boundary to the next
>> wide row.
>>
>> Obvious ... but inconvenient.
>>
>> The solution would be to throw away the last token returned (because its
>> wide row could have been chopped off) and do the next query with the token
>> before it.
>>
>> So instead of doing
>>
>>     SELECT * FROM users WHERE token(name) > token(last-name-of-prev-result)
>>     LIMIT 100;
>>
>> I'd be doing
>>
>>     SELECT * FROM users WHERE token(name) >
>>     token(one-before-the-last-name-of-prev-result) LIMIT 100;
>>
>> Question: Is that what I have to do, or is there a way to make token() and
>> LIMIT work together to return complete wide rows?
>>
>> Jan
>>
>> [1] token() and how it relates to paging is actually quite hard to grasp
>> from the docs.
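For illustration, one way to page without losing the tail of a wide row is to finish the current partition first (same token, clustering column greater than the last one seen) and only then move on to the next token. The sketch below is not code from anyone in this thread and does not talk to Cassandra: it models the table in memory so the two strategies can be compared. Assumptions are labeled in the comments: the clustering column `col` is hypothetical (the thread only names the `users` table and its `name` key), and `token()` here is a CRC32 stand-in, not the partitioner's real hash.

```python
import zlib


def token(name):
    # Stand-in for the partitioner hash. NOT Cassandra's real token function.
    return zlib.crc32(name.encode("utf-8"))


# Hypothetical table: partition key `name`, clustering column `col` (assumed).
# Rows are stored as (token, name, col), ordered the way a token scan returns them.
ROWS = sorted(
    (token(name), name, col)
    for name, width in [("alice", 5), ("bob", 250), ("carol", 120)]
    for col in range(width)
)


def select_token_gt(last_token, limit):
    # Models: SELECT * FROM users WHERE token(name) > token(?) LIMIT ?
    # A last_token of None models the very first query with no WHERE clause.
    matching = [r for r in ROWS if last_token is None or r[0] > last_token]
    return matching[:limit]


def select_same_partition(last_token, last_col, limit):
    # Models: SELECT * FROM users WHERE token(name) = token(?) AND col > ? LIMIT ?
    matching = [r for r in ROWS if r[0] == last_token and r[2] > last_col]
    return matching[:limit]


def naive_pages(limit):
    # token() > last token with LIMIT only: skips the rest of a chopped wide row.
    seen, last_token = [], None
    while True:
        page = select_token_gt(last_token, limit)
        if not page:
            return seen
        seen.extend(page)
        last_token = page[-1][0]


def complete_pages(limit):
    # Drain the current partition before advancing to the next token.
    seen, last_token, last_col = [], None, None
    while True:
        page = []
        if last_token is not None:
            page = select_same_partition(last_token, last_col, limit)
        if not page:
            page = select_token_gt(last_token, limit)
        if not page:
            return seen
        seen.extend(page)
        last_token, last_col = page[-1][0], page[-1][2]


if __name__ == "__main__":
    print("rows in table:  ", len(ROWS))
    print("naive paging:   ", len(naive_pages(100)))
    print("complete paging:", len(complete_pages(100)))
```

With a page size of 100, the naive loop returns fewer rows than the table holds, while the second loop returns every row exactly once. The cost is an extra query each time a partition is exhausted, and the approach needs a clustering column to resume from.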