Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: error (nike.apache.org: local policy)
MIME-Version: 1.0
In-Reply-To: <520E4E87.4010001@gmail.com>
References: <039D88CC-8E05-4FA0-AB81-011FF0DBB417@nordsc.com>
	<520E4E87.4010001@gmail.com>
Date: Fri, 16 Aug 2013 12:16:28 -0400
Message-ID: 
 <CA+DN8p9cCvJKUA=gS3kRz3SJbq-W=p0JfYWuPzQtM4q72c6N=Q@mail.gmail.com>
Subject: Re: token(), limit and wide rows
From: Jonathan Rhone <jonathan@shareablee.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=bcaec548a7afbd4ad604e412e9bf

--bcaec548a7afbd4ad604e412e9bf
Content-Type: text/plain; charset=ISO-8859-1

Read

http://www.datastax.com/dev/blog/cql3-table-support-in-hadoop-pig-and-hive

And look at the CqlPagingRecordReader source:

http://fossies.org/dox/apache-cassandra-1.2.8-src/CqlPagingRecordReader_8java_source.html
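
One way to read that code (an assumption; check the linked source for its exact queries) is as two-phase paging. First, finish the partition that LIMIT may have cut off by fixing the partition key and paging on the clustering column, roughly SELECT * FROM users WHERE name = ? AND ck > ? LIMIT 100. Only when that comes back empty, move on with token(name) > token(?). The Python below is a minimal in-memory sketch of that loop, not Cassandra client code. The ck clustering column, the helper names and the CRC32 stand-in for the partitioner's token() are all hypothetical. It also includes the plain token()/LIMIT loop so the dropped rows are visible.

```python
import zlib


def token(partition_key):
    # Hypothetical stand-in for the partitioner's token(): Murmur3Partitioner
    # and RandomPartitioner use different hashes, but any deterministic hash
    # reproduces the "random order" that makes paging by key impossible.
    return zlib.crc32(partition_key.encode("utf-8"))


def storage_order(rows):
    # Rows are hypothetical (name, ck, value) tuples; ck stands in for a
    # clustering column. Order them by partition token, then clustering key.
    return sorted(rows, key=lambda r: (token(r[0]), r[0], r[1]))


def select_token_gt(rows, after_token, limit):
    """Simulates SELECT * FROM users WHERE token(name) > ? LIMIT n
    (no token restriction at all when after_token is None)."""
    hits = [r for r in storage_order(rows)
            if after_token is None or token(r[0]) > after_token]
    return hits[:limit]


def select_partition_ck_gt(rows, name, after_ck, limit):
    """Simulates SELECT * FROM users WHERE name = ? AND ck > ? LIMIT n."""
    hits = [r for r in storage_order(rows) if r[0] == name and r[1] > after_ck]
    return hits[:limit]


def naive_page_all(rows, limit):
    """token() + LIMIT only: loses the rest of any partition LIMIT cuts off."""
    out, after = [], None
    while True:
        page = select_token_gt(rows, after, limit)
        if not page:
            return out
        out.extend(page)
        after = token(page[-1][0])


def page_all(rows, limit):
    """Two-phase paging: finish the current partition, then advance the token."""
    out, last = [], None  # last = (name, ck) of the last row returned
    while True:
        if last is None:
            page = select_token_gt(rows, None, limit)
        else:
            name, ck = last
            # Phase 1: the rest of the partition LIMIT may have chopped off.
            page = select_partition_ck_gt(rows, name, ck, limit)
            if not page:
                # Phase 2: that partition is exhausted; move to the next token.
                page = select_token_gt(rows, token(name), limit)
        if not page:
            return out
        out.extend(page)
        last = (page[-1][0], page[-1][1])


if __name__ == "__main__":
    # "alice" is a wide row: 5 'sub'rows against a LIMIT of 2.
    rows = ([("alice", i, "a%d" % i) for i in range(5)]
            + [("bob", 0, "b0")]
            + [("carol", i, "c%d" % i) for i in range(3)])
    print(len(page_all(rows, 2)), "of", len(rows))        # every row
    print(len(naive_page_all(rows, 2)), "of", len(rows))  # fewer rows
```

With a partition wider than LIMIT, page_all returns every row exactly once, while naive_page_all loses the tail of that partition. The cost is an extra, empty phase-1 query whenever the previous page happened to end on a complete partition.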

- Jon

On Fri, Aug 16, 2013 at 12:08 PM, Keith Freeman <8forty@gmail.com> wrote:

> I've run into the same problem, and I'm surprised nobody has responded to
> you.  Any time someone asks "how do I page through all the rows of a table
> in CQL3?", the standard answer is token() and LIMIT.  But as you point out,
> this method will often miss some data from wide rows.
>
> Maybe a Cassandra expert will chime in if we're wrong.
>
> Your suggestion is possible if you know how to find the previous value of
> the 'name' field (and are willing to filter out repeated rows), but wouldn't
> that be difficult or impossible with some keys?  So then, is there a way to
> do paging queries that get ALL of the rows, even in wide rows?
>
>
>
> On 08/13/2013 02:46 PM, Jan Algermissen wrote:
>
>> Hi,
>>
>> OK, so I found token() [1] and learned that it is an option for paging
>> through randomly partitioned data.
>>
>> I take it that combining token() and LIMIT is the CQL3 idiom for paging
>> (setting aside the fact that one shouldn't really want to page when using C*).
>>
>> Now, when I page through a CF with wide rows, limiting each 'page' to,
>> for example, 100, I end up in situations where not all 'sub'rows that have
>> the same result for token() are returned, because LIMIT chops off the
>> result after 100 'sub'rows, not necessarily at the boundary to the next
>> wide row.
>>
>> Obvious ... but inconvenient.
>>
>> The solution would be to throw away the last token returned (because its
>> wide row could have been chopped off) and do the next query with the token
>> before it.
>>
>> So instead of doing
>>
>>       SELECT * FROM users WHERE token(name) > token(last-name-of-prev-result) LIMIT 100;
>>
>> I'd be doing
>>
>>       SELECT * FROM users WHERE token(name) > token(one-before-the-last-name-of-prev-result) LIMIT 100;
>>
>>
>> Question: Is that what I have to do, or is there a way to make token() and
>> LIMIT work together to return complete wide rows?
>>
>>
>> Jan
>>
>>
>>
>> [1] token() and how it relates to paging is actually quite hard to grasp
>> from the docs.
>>
>
>
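
For comparison, here is a minimal in-memory sketch of the workaround proposed in the original message. It uses a hypothetical setup: Python, a CRC32 stand-in for token(), rows as (name, ck, value) tuples, and made-up helper names. The rule is simple: when a page comes back full, keep everything except the last partition and restart from token(one-before-the-last-name). No rows are repeated, because the dropped partition is re-read from its first row on the next page. The sketch returns every row as long as each partition has fewer rows than LIMIT. Once a single partition can fill a page on its own, there is no "one before the last" name, and the loop cannot make progress.

```python
import zlib


def token(partition_key):
    # Hypothetical stand-in for the partitioner's token(), not Murmur3.
    return zlib.crc32(partition_key.encode("utf-8"))


def select_token_gt(rows, after_token, limit):
    """Simulates SELECT * FROM users WHERE token(name) > ? LIMIT n.
    Rows are hypothetical (name, ck, value) tuples, returned in token order."""
    ordered = sorted(rows, key=lambda r: (token(r[0]), r[0], r[1]))
    hits = [r for r in ordered
            if after_token is None or token(r[0]) > after_token]
    return hits[:limit]


class NoProgress(Exception):
    """One partition filled the whole page, so there is no earlier name to restart from."""


def drop_last_partition_page_all(rows, limit):
    out, after = [], None
    while True:
        page = select_token_gt(rows, after, limit)
        if len(page) < limit:
            # Short page: LIMIT cut nothing off, so keep every row and stop.
            out.extend(page)
            return out
        last_name = page[-1][0]
        # The last partition may have been chopped by LIMIT: drop its rows.
        keep = [r for r in page if r[0] != last_name]
        if not keep:
            raise NoProgress(last_name)
        out.extend(keep)
        # Restart after the "one before the last" name; the next page
        # re-reads the dropped partition from its first row.
        after = token(keep[-1][0])
```

Called on a table with a partition wider than LIMIT, drop_last_partition_page_all raises NoProgress instead of looping forever on the same token.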

--bcaec548a7afbd4ad604e412e9bf--