From: David McNelis <dmcnelis@gmail.com>
To: user@cassandra.apache.org
Date: Tue, 14 May 2013 13:42:27 -0500
Subject: Re: Iterating through large numbers of rows with JDBC

Another thing to keep in mind when doing this with CQL is to take into
account the ordering partitioner you may or may not be using. If you're
using one, be aware that if you have more rows for a partition key than
your query limit, you can end up stuck in a loop.

On Tue, May 14, 2013 at 1:39 PM, aaron morton <aaron@thelastpickle.com> wrote:
> You can iterate over them, just make sure to set a sensible row count to
> chunk things up. See
> http://www.datastax.com/docs/1.2/cql_cli/using/paging#non-ordered-partitioner-paging
>
> You can also break up the processing so only one worker reads the token
> ranges for a node. That lets you process the rows in parallel and avoids
> workers processing the same rows.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 13/05/2013, at 2:51 AM, Robert Wille <rwille@footnote.com> wrote:
>
> Iterating through lots of records is not a primary use of my data.
> However, there are a number of scenarios where scanning the entire
> contents of a column family is an interesting and useful exercise. Here
> are a few: removal of orphaned records, checking the integrity of a data
> set, and analytics.
>
> On 5/12/13 3:41 AM, "Oleg Dulin" <oleg.dulin@gmail.com> wrote:
>
> On 2013-05-11 14:42:32 +0000, Robert Wille said:
>
> I'm using the JDBC driver to access Cassandra. I'm wondering if it's
> possible to iterate through a large number of records (e.g. to perform
> maintenance on a large column family). I tried calling
> Connection.createStatement(ResultSet.TYPE_FORWARD_ONLY,
> ResultSet.CONCUR_READ_ONLY), but it times out, so I'm guessing that
> cursors aren't supported. Is there another way to do this, or do I need
> to use a different API?
>
> Thanks in advance
>
> Robert
>
> If you feel that you need to iterate through a large number of rows,
> then you are probably not using the correct data model.
>
> Can you describe your use case?
>
> --
> Regards,
> Oleg Dulin
> NYC Java Big Data Engineer
> http://www.olegdulin.com/
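The paging approach Aaron links to works by taking the token of the last
row seen and starting the next page just after it. Below is a minimal,
self-contained sketch of that loop in plain Java, using an in-memory
`TreeMap` as a stand-in for rows ordered by token; with a real connection
each `page(...)` call would correspond to a query along the lines of
`SELECT k, v FROM cf WHERE token(k) > ? LIMIT ?`. The class and method
names here are illustrative only, not part of any driver API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Sketch of non-ordered-partitioner paging: visit rows in token order,
// starting each page just after the token of the last row already seen.
public class TokenPager {

    // Stand-in for the cluster: rows keyed by a unique token. With a
    // real driver this would be the query
    //   SELECT k, v FROM cf WHERE token(k) > ? LIMIT ?
    static List<Long> page(TreeMap<Long, String> rows, long afterToken, int limit) {
        List<Long> out = new ArrayList<>();
        for (Long token : rows.tailMap(afterToken, false).keySet()) {
            if (out.size() == limit) break;
            out.add(token);
        }
        return out;
    }

    // Scan the whole table one page at a time. Note David's caveat: if
    // many rows share one partition key (and therefore one token), a
    // LIMIT smaller than that row count can leave you re-reading the
    // same page forever, so the page size must exceed the widest partition.
    static List<Long> scanAll(TreeMap<Long, String> rows, int pageSize) {
        List<Long> seen = new ArrayList<>();
        long after = Long.MIN_VALUE;   // a row at exactly MIN_VALUE is not
        while (true) {                 // expected in practice
            List<Long> page = page(rows, after, pageSize);
            if (page.isEmpty()) break;
            seen.addAll(page);
            after = page.get(page.size() - 1); // resume after last token
            if (page.size() < pageSize) break; // short page => done
        }
        return seen;
    }
}
```

The short-page check at the end avoids one extra empty round trip once
the table is exhausted.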
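Aaron's second suggestion (one worker per token range) can be driven by
splitting the partitioner's full token range into contiguous slices. The
sketch below assumes the Murmur3Partitioner range of -2^63 .. 2^63-1 (the
Cassandra 1.2 default) and uses `BigInteger` so the span arithmetic cannot
overflow; each worker would then scan only
`WHERE token(k) > start AND token(k) <= end` for its slice. This is
illustrative arithmetic, not a driver API.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Split the full Murmur3 token range into contiguous, non-overlapping
// slices so each worker can scan its own slice with
//   SELECT ... WHERE token(k) > ? AND token(k) <= ?
public class TokenRangeSplitter {

    public static final BigInteger MIN = BigInteger.valueOf(Long.MIN_VALUE);
    public static final BigInteger MAX = BigInteger.valueOf(Long.MAX_VALUE);

    /** Returns {start, end} pairs; each slice covers (start, end]. */
    static List<BigInteger[]> split(int workers) {
        BigInteger span = MAX.subtract(MIN);               // 2^64 - 1
        BigInteger step = span.divide(BigInteger.valueOf(workers));
        List<BigInteger[]> slices = new ArrayList<>();
        BigInteger start = MIN;
        for (int i = 0; i < workers; i++) {
            // The last slice absorbs the rounding remainder so the union
            // of all slices covers the whole ring. (The single token equal
            // to MIN is excluded by the > bound; use >= on the first slice
            // if that matters for your data.)
            BigInteger end = (i == workers - 1) ? MAX : start.add(step);
            slices.add(new BigInteger[] { start, end });
            start = end;
        }
        return slices;
    }
}
```

Because the slices are disjoint and adjacent, no two workers read the
same rows, which is exactly the property Aaron describes.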