cassandra-user mailing list archives

From Jeremy Hanna <>
Subject Re: Help with Pig Script
Date Thu, 17 Nov 2011 21:45:54 GMT

On Nov 17, 2011, at 1:44 PM, Aaron Griffith wrote:

> Jeremy Hanna <jeremy.hanna1234 <at>> writes:
>> If you are only interested in loading one row, why do you need to use Pig?
>> Is it an extremely wide row?
>> Unless you are using an ordered partitioner, you can't currently limit the
>> rows you mapreduce over - you have to mapreduce over the whole column
>> family.  That will probably change in 1.1.  However, again, if you're only
>> after 1 row, why don't you just use a regular cassandra client to get that
>> row and operate on it that way?
>> I suppose you *could* use pig and filter by the ID or something.  If you
>> *do* have an ordered partitioner in your cluster, it's just a matter of
>> specifying the key range.
>> On Nov 17, 2011, at 11:16 AM, Aaron Griffith wrote:
>>> I am trying to do the following with a Pig script and am having trouble
>>> finding the correct syntax.
>>> - I want to use the LOAD function to load a single key/value "row" into a
>>> pig object.
>>> - The contents of that row are then flattened into a list of keys.
>>> - I then want to use that list of keys in another load function to select
>>> the key/value pairs from another column family.
>>> The only way I can get this to work is by using a generic load function,
>>> then applying filters to get at the data I want, then joining the two pig
>>> objects together to filter the second column family.
>>> I want to avoid having to pull the entire column families into Pig; it is
>>> way too much data.
>>> Any suggestions?
>>> Thanks!
> It is a very wide row, with nested keys to another column family.  Pig makes
> it easy to convert it into a list of keys.  It also makes it easy to write
> out the results into Hadoop.
> I then want to take that list of keys and go get rows from whatever column
> family they are for.
> Thanks for your response.

Okay.  Makes sense.  There is work being done to support wide rows with mapreduce
as part of transposition.  Transposition would turn each wide row into several
transposed rows - (key, column, value) combinations.
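A minimal sketch (Python, purely illustrative - not an API from any Cassandra release) of what that transposition means: one wide row, modeled here as a dict of columns, becomes many independent (key, column, value) records that a mapreduce job can split across tasks:

```python
# Hypothetical illustration of transposition: a single wide row becomes
# a list of narrow (key, column, value) rows.
def transpose(key, wide_row):
    """Turn one wide row (a dict of column -> value) into (key, column, value) tuples."""
    return [(key, column, value) for column, value in wide_row.items()]

wide_row = {"col1": "a", "col2": "b", "col3": "c"}
rows = transpose("rowkey1", wide_row)
# Each element of rows is now an independent (key, column, value) record.
```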

I think the easiest way to do what you're trying to do is to use a client to page
through the row and get the whole thing; then you can copy that up to HDFS or do
whatever else you want with it.