Subject: Re: pig counting question
From: Jeremy Hanna
Date: Thu, 24 Mar 2011 18:00:31 -0500
To: user@cassandra.apache.org

And if you download the 0.7 branch and build the cassandra_storage.jar in the contrib/pig section with that update, you should be able to use it with your 0.7.3 cluster. Those changes are typically independent of the Cassandra version.

On Mar 24, 2011, at 5:49 PM, Jeremy Hanna wrote:

> Hmmm, for wide rows, you can page it with I believe some changes on 0.7 branch that made it in as part of https://issues.apache.org/jira/browse/CASSANDRA-1618 recently. Specifically, using the 0.7 branch version of CassandraStorage, you can specify it using this basic template:
> cassandra:///[?slice_start=&slice_end=<end>[&reversed=true][&limit=1]]
> That goes in your pig LOAD block.
> So it's a pain to do what you're doing I would imagine but it's possible to page in the latest on 0.7 branch.
>
> On Mar 24, 2011, at 3:57 PM, Jeffrey Wang wrote:
>
>> It looks like this functionality is not in the 0.7.3 version of CassandraStorage. I tried to add the constructor which takes the limit to the class, but I ran into some Pig parsing errors, so I had to make the parameter a string. How did you get around this for the version of CassandraStorage in trunk? I'm running Pig 0.8.0.
>>
>> Also, when I bump the limit up very high (e.g. 1M columns), my Cassandra starts eating up huge amounts of memory, maxing out my 16GB heap size. I suspect this is because of the get_range_slices() call from ColumnFamilyRecordReader. Are there plans to make this streaming/paged?
>>
>> -Jeffrey
>>
>> -----Original Message-----
>> From: Jeremy Hanna [mailto:jeremy.hanna1234@gmail.com]
>> Sent: Thursday, March 24, 2011 11:34 AM
>> To: user@cassandra.apache.org
>> Subject: Re: pig counting question
>>
>> The limit defaults to 1024 but you can set it when you use CassandraStorage in pig, like so:
>> rows = LOAD 'cassandra://Keyspace/ColumnFamily' USING CassandraStorage(4096);
>> or whatever value you wish.
>>
>> Give that a try and see if it gives you more of what you're looking for.
>>
>> On Mar 24, 2011, at 1:16 PM, Jeffrey Wang wrote:
>>
>>> Hey all,
>>>
>>> I'm trying to run a very simple Pig script against my Cassandra cluster (5 nodes, 0.7.3). I've gotten it all set up and working, but the script is giving me some strange results. Here is my script:
>>>
>>> rows = LOAD 'cassandra://Keyspace/ColumnFamily' USING CassandraStorage();
>>> rowct = FOREACH rows GENERATE $0, COUNT($1);
>>> dump rowct;
>>>
>>> If I understand Pig correctly, this should output (row name, column count) tuples, but I'm always seeing 1024 for the column count even though the rows have highly variable number of columns. Am I missing something? Thanks.
>>>
>>> -Jeffrey
>>>
>>
>