To: user@cassandra.apache.org
From: Aaron Griffith
Subject: Re: Help with Pig Script
Date: Thu, 17 Nov 2011 19:44:17 +0000 (UTC)
References: <745DD529-5825-47E4-88F4-5CAAF707E3AD@gmail.com>

Jeremy Hanna gmail.com> writes:

> If you are only interested in loading one row, why do you need to use Pig? Is it an extremely wide row?
>
> Unless you are using an ordered partitioner, you currently can't limit the rows you mapreduce over - you have to mapreduce over the whole column family. That will probably change in 1.1. However, again, if you're only after one row, why not just use a regular Cassandra client to get that row and operate on it that way?
>
> I suppose you *could* use Pig and filter by the ID or something. If you *do* have an ordered partitioner in your cluster, it's just a matter of specifying the key range.
>
> On Nov 17, 2011, at 11:16 AM, Aaron Griffith wrote:
>
> > I am trying to do the following with a Pig script and am having trouble finding the correct syntax.
> >
> > - I want to use the LOAD function to load a single key/value "row" into a Pig object.
> > - The contents of that row are then flattened into a list of keys.
> > - I then want to use that list of keys in another LOAD function to select the key/value pairs from another column family.
> >
> > The only way I can get this to work is by using a generic load function, then applying filters to get at the data I want, and then joining the two Pig objects together to filter the second column family.
> >
> > I want to avoid pulling the entire column families into Pig; it is way too much data.
> >
> > Any suggestions?
> >
> > Thanks!
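For reference, the filter-by-ID approach Jeremy suggests might look roughly like this. The keyspace, column family, and key names here are made up, and CassandraStorage's exact output schema varies between Cassandra versions, so treat this as a sketch rather than copy-paste syntax:

```pig
-- Load the whole column family. Without an ordered partitioner this
-- still maps over every row; the FILTER only trims the output.
rows = LOAD 'cassandra://MyKeyspace/MyCF' USING CassandraStorage()
       AS (key: chararray, columns: bag {T: tuple(name, value)});

-- Keep only the single row of interest.
one_row = FILTER rows BY key == 'the_row_key';
```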
It is a very wide row, with nested keys to another column family. Pig makes it easy to convert it into a list of keys, and it also makes it easy to write the results out to Hadoop. I then want to take that list of keys and go get rows from whatever column family they are for. Thanks for your response.
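The flatten-then-join workflow described above could be sketched in Pig Latin like this. All keyspace and column-family names are hypothetical, and the CassandraStorage tuple layout shown is an assumption (it differs across Cassandra releases):

```pig
-- Load the index column family and isolate the one wide row.
index = LOAD 'cassandra://MyKeyspace/IndexCF' USING CassandraStorage()
        AS (key: chararray, columns: bag {T: tuple(name, value)});
one_row = FILTER index BY key == 'the_index_key';

-- Flatten the wide row's column names into a list of child keys.
child_keys = FOREACH one_row GENERATE FLATTEN(columns.name) AS child_key;

-- Load the data column family and join on those keys.
-- Note: this still scans all of DataCF; the JOIN only trims the result.
data = LOAD 'cassandra://MyKeyspace/DataCF' USING CassandraStorage()
       AS (key: chararray, columns: bag {T: tuple(name, value)});
wanted = JOIN child_keys BY child_key, data BY key;

-- Write the joined rows out to HDFS.
STORE wanted INTO '/tmp/wanted_rows';
```

As the thread notes, without an ordered partitioner both LOADs still read their entire column families; the filtering and joining happen afterward in the mapreduce job, which is exactly the cost the original poster is trying to avoid.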