cassandra-user mailing list archives

From Aaron Griffith <>
Subject Re: Help with Pig Script
Date Thu, 17 Nov 2011 19:44:17 GMT
Jeremy Hanna <jeremy.hanna1234 <at>> writes:

> If you are only interested in loading one row, why do you need to use Pig?
> Is it an extremely wide row?
> Unless you are using an ordered partitioner, you can't limit the rows you
> mapreduce over currently - you have to mapreduce over the whole column
> family.  That will change probably in 1.1.  However, again, if you're only
> after 1 row, why don't you just use a regular cassandra client and get that
> row and operate on it that way?
> I suppose you *could* use pig and filter by the ID or something.  If you *do*
> have an ordered partitioner in your cluster, it's just a matter of specifying
> the key range.
> On Nov 17, 2011, at 11:16 AM, Aaron Griffith wrote:
> > I am trying to do the following with a PIG script and am having trouble 
> > with the correct syntax.
> > 
> > - I want to use the LOAD function to load a single key/value "row" into a 
> > Pig object.
> > - The contents of that row are then flattened into a list of keys.
> > - I then want to use that list of keys for another load function to select 
> > key/value pairs from another column family.
> > 
> > The only way I can get this to work is by using a generic load function then 
> > applying filters to get at the data I want. Then joining the two pig objects 
> > together to filter the second column family.
> > 
> > I want to avoid having to pull the entire column families into Pig; it is 
> > way too much data.
> > 
> > Any suggestions?
> > 
> > Thanks!
> > 

It is a very wide row, with nested keys to another column family.  Pig makes it 
easy to convert it into a list of keys.

It also makes it easy to write out the results into Hadoop.

I then want to take that list of keys to go get rows from whatever column family 
they are for.
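
The two steps described above (flatten one wide row into a list of keys, then 
fetch only those rows from another column family) can be sketched as below. 
This is only an illustration: plain Python dicts stand in for the two column 
families, the data and the names index_cf, target_cf, keys_from_wide_row and 
fetch_rows are hypothetical, and it assumes the nested keys are stored as the 
column names of the wide row. No Pig or Cassandra client API is used.

```python
# Plain dicts stand in for two Cassandra column families (hypothetical data).
# Assumption: the wide row stores the nested keys as its column names.
index_cf = {
    "index-row": {"user:1": "", "user:7": "", "user:9": ""},
}
target_cf = {
    "user:1": {"name": "alice"},
    "user:2": {"name": "bob"},
    "user:7": {"name": "carol"},
}


def keys_from_wide_row(column_family, row_key):
    """Flatten one wide row into a list of its column names."""
    return list(column_family.get(row_key, {}))


def fetch_rows(column_family, keys):
    """Fetch only the requested rows, skipping keys that do not exist."""
    return {k: column_family[k] for k in keys if k in column_family}


keys = keys_from_wide_row(index_cf, "index-row")
rows = fetch_rows(target_cf, keys)
```

The point of the sketch is that only the rows named in the key list are read 
from the second column family, instead of scanning all of it and filtering.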

Thanks for your response.
