incubator-cassandra-user mailing list archives

From Jeremy Hanna <jeremy.hanna1...@gmail.com>
Subject Re: Multiple input column families in Cassandra Hadoop mapreduce
Date Fri, 15 Jul 2011 22:35:16 GMT
+1. We do a lot of this with Pig, joining over several column families. Pig makes it just
work. I think Hive does something similar. Unless you really need that much control over
your process, I would use one of those two.
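
For anyone who does want that control in raw MapReduce, a join like this is usually done in a single job as a reduce-side join, which is roughly what Pig's default JOIN does: each mapper tags every row with the column family it came from, the shuffle groups rows by key, and the reducer pairs up the tagged values. Below is a minimal in-memory sketch of that pattern in plain Java. It assumes each column family has already been read into a map of row key to value, it does not use the Hadoop or Cassandra APIs, and the class name and the column family names "Users" and "Visits" are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSideJoin {

    // One intermediate record: row key, source column family (the "tag"), value.
    static final class Tagged {
        final String key;
        final String source;
        final String value;

        Tagged(String key, String source, String value) {
            this.key = key;
            this.source = source;
            this.value = value;
        }
    }

    // "Map" phase: tag every row of one column family with its source name.
    static List<Tagged> map(String source, Map<String, String> rows) {
        List<Tagged> out = new ArrayList<>();
        for (Map.Entry<String, String> row : rows.entrySet()) {
            out.add(new Tagged(row.getKey(), source, row.getValue()));
        }
        return out;
    }

    // "Shuffle": group intermediate records by row key, keys kept sorted.
    static Map<String, List<Tagged>> shuffle(List<Tagged> records) {
        Map<String, List<Tagged>> groups = new TreeMap<>();
        for (Tagged t : records) {
            groups.computeIfAbsent(t.key, k -> new ArrayList<>()).add(t);
        }
        return groups;
    }

    // "Reduce" phase: split one key's values by tag and emit their cross product
    // (an inner join; a key seen in only one source emits nothing).
    static List<String> reduce(String key, List<Tagged> values, String left, String right) {
        List<String> lefts = new ArrayList<>();
        List<String> rights = new ArrayList<>();
        for (Tagged t : values) {
            if (t.source.equals(left)) {
                lefts.add(t.value);
            } else if (t.source.equals(right)) {
                rights.add(t.value);
            }
        }
        List<String> joined = new ArrayList<>();
        for (String l : lefts) {
            for (String r : rights) {
                joined.add(key + "\t" + l + "\t" + r);
            }
        }
        return joined;
    }

    // Runs map, shuffle and reduce over two column families in one pass.
    static List<String> join(String leftName, Map<String, String> leftRows,
                             String rightName, Map<String, String> rightRows) {
        List<Tagged> intermediate = new ArrayList<>();
        intermediate.addAll(map(leftName, leftRows));
        intermediate.addAll(map(rightName, rightRows));
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> group : shuffle(intermediate).entrySet()) {
            result.addAll(reduce(group.getKey(), group.getValue(), leftName, rightName));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> users = new TreeMap<>();
        users.put("u1", "alice");
        users.put("u2", "bob");
        Map<String, String> visits = new TreeMap<>();
        visits.put("u1", "42");
        visits.put("u3", "7");
        for (String line : join("Users", users, "Visits", visits)) {
            System.out.println(line);
        }
    }
}
```

Running `main` prints only the joined row for `u1`, because `u2` and `u3` each appear in just one of the two column families.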

On Jul 15, 2011, at 5:28 PM, Jonathan Ellis wrote:

> The easy answer is "use something like Pig or Hive that does these
> joins for you under the hood."
> 
> Not actually sure what the hard answer is. :)
> 
> On Fri, Jul 15, 2011 at 1:34 AM, Markus Mock <markus.mock@gmail.com> wrote:
>> Hello,
>> With org.apache.cassandra.hadoop.ConfigHelper.setInputColumnFamily I can set
>> up the map phase to read from one column family. Is it possible to have
>> multiple mapper classes, each mapping over its own column family, so that
>> data from multiple column families can be "joined" in the reduce phase? I
>> didn't find any documentation on how to do that.
>> One workaround I see is to run several MR jobs that write the data from the
>> different column families into a single helper column family, and then do
>> the desired computation over that, but I am trying to avoid that if
>> possible. Any suggestions on how to read from multiple column families in
>> one go instead of running multiple MR jobs?
>> Thanks.
>>   -- Markus
>> 
> 
> 
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com

