From: David McNelis <dmcnelis@gmail.com>
To: user@cassandra.apache.org
Date: Tue, 14 May 2013 13:42:27 -0500
Subject: Re: Iterating through large numbers of rows with JDBC

Another thing to keep in mind when doing this with CQL is to take into
account the ordering partitioner you may or may not be using. If you're
using one, be aware that if you have more rows for a partition key than
your query limit, you can end up stuck in a loop.

On Tue, May 14, 2013 at 1:39 PM, aaron morton <aaron@thelastpickle.com> wrote:
> You can iterate over them, just make sure to set a sensible row count to
> chunk things up. See
> http://www.datastax.com/docs/1.2/cql_cli/using/paging#non-ordered-partitioner-paging
>
> You can also break up the processing so only one worker reads the token
> ranges for a node. That lets you process the rows in parallel and avoids
> workers processing the same rows.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 13/05/2013, at 2:51 AM, Robert Wille <rwille@footnote.com> wrote:
>
> Iterating through lots of records is not a primary use of my data.
> However, there are a number of scenarios where scanning the entire
> contents of a column family is an interesting and useful exercise. Here
> are a few: removal of orphaned records, checking the integrity of a data
> set, and analytics.
>
> On 5/12/13 3:41 AM, "Oleg Dulin" <oleg.dulin@gmail.com> wrote:
>
> On 2013-05-11 14:42:32 +0000, Robert Wille said:
>
> I'm using the JDBC driver to access Cassandra. I'm wondering if it's
> possible to iterate through a large number of records (e.g. to perform
> maintenance on a large column family). I tried calling
> Connection.createStatement(ResultSet.TYPE_FORWARD_ONLY,
> ResultSet.CONCUR_READ_ONLY), but it times out, so I'm guessing that
> cursors aren't supported. Is there another way to do this, or do I need
> to use a different API?
>
> Thanks in advance
>
> Robert
>
> If you feel that you need to iterate through a large number of rows,
> then you are probably not using the correct data model.
>
> Can you describe your use case?
>
> --
> Regards,
> Oleg Dulin
> NYC Java Big Data Engineer
> http://www.olegdulin.com/
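The paging approach Aaron links to works by taking the token of the last
row seen and starting the next page just after it. Below is a minimal,
self-contained sketch of that loop in plain Java, using an in-memory
`TreeMap` as a stand-in for rows ordered by token; with a real connection
each `page(...)` call would correspond to a query along the lines of
`SELECT k, v FROM cf WHERE token(k) > ? LIMIT ?`. The class and method
names here are illustrative only, not part of any driver API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Sketch of non-ordered-partitioner paging: visit rows in token order,
// starting each page just after the token of the last row already seen.
public class TokenPager {

    // Stand-in for the cluster: rows keyed by a unique token. With a
    // real driver this would be the query
    //   SELECT k, v FROM cf WHERE token(k) > ? LIMIT ?
    static List<Long> page(TreeMap<Long, String> rows, long afterToken, int limit) {
        List<Long> out = new ArrayList<>();
        for (Long token : rows.tailMap(afterToken, false).keySet()) {
            if (out.size() == limit) break;
            out.add(token);
        }
        return out;
    }

    // Scan the whole table one page at a time. Note David's caveat: if
    // many rows share one partition key (and therefore one token), a
    // LIMIT smaller than that row count can leave you re-reading the
    // same page forever, so the page size must exceed the widest partition.
    static List<Long> scanAll(TreeMap<Long, String> rows, int pageSize) {
        List<Long> seen = new ArrayList<>();
        long after = Long.MIN_VALUE;   // a row at exactly MIN_VALUE is not
        while (true) {                 // expected in practice
            List<Long> page = page(rows, after, pageSize);
            if (page.isEmpty()) break;
            seen.addAll(page);
            after = page.get(page.size() - 1); // resume after last token
            if (page.size() < pageSize) break; // short page => done
        }
        return seen;
    }
}
```

The short-page check at the end avoids one extra empty round trip once
the table is exhausted.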
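Aaron's second suggestion (one worker per token range) can be driven by
splitting the partitioner's full token range into contiguous slices. The
sketch below assumes the Murmur3Partitioner range of -2^63 .. 2^63-1 (the
Cassandra 1.2 default) and uses `BigInteger` so the span arithmetic cannot
overflow; each worker would then scan only
`WHERE token(k) > start AND token(k) <= end` for its slice. This is
illustrative arithmetic, not a driver API.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Split the full Murmur3 token range into contiguous, non-overlapping
// slices so each worker can scan its own slice with
//   SELECT ... WHERE token(k) > ? AND token(k) <= ?
public class TokenRangeSplitter {

    public static final BigInteger MIN = BigInteger.valueOf(Long.MIN_VALUE);
    public static final BigInteger MAX = BigInteger.valueOf(Long.MAX_VALUE);

    /** Returns {start, end} pairs; each slice covers (start, end]. */
    static List<BigInteger[]> split(int workers) {
        BigInteger span = MAX.subtract(MIN);               // 2^64 - 1
        BigInteger step = span.divide(BigInteger.valueOf(workers));
        List<BigInteger[]> slices = new ArrayList<>();
        BigInteger start = MIN;
        for (int i = 0; i < workers; i++) {
            // The last slice absorbs the rounding remainder so the union
            // of all slices covers the whole ring. (The single token equal
            // to MIN is excluded by the > bound; use >= on the first slice
            // if that matters for your data.)
            BigInteger end = (i == workers - 1) ? MAX : start.add(step);
            slices.add(new BigInteger[] { start, end });
            start = end;
        }
        return slices;
    }
}
```

Because the slices are disjoint and adjacent, no two workers read the
same rows, which is exactly the property Aaron describes.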