cassandra-user mailing list archives

From Jeremy Hanna <>
Subject Re: Multiple input column families in Cassandra Hadoop mapreduce
Date Fri, 15 Jul 2011 22:35:16 GMT
+1. We do a lot of this with Pig, joining over several column families, and Pig makes it just
work. I think Hive does something similar. Unless you really need fine-grained control over
the process, I would use one of those two.

On Jul 15, 2011, at 5:28 PM, Jonathan Ellis wrote:

> The easy answer is "use something like Pig or Hive that does these
> joins for you under the hood."
> Not actually sure what the hard answer is. :)
> On Fri, Jul 15, 2011 at 1:34 AM, Markus Mock <> wrote:
>> Hello,
>> with org.apache.cassandra.hadoop.ConfigHelper.setInputColumnFamily I can set
>> up the map phase to read from one column family. Is it possible to have
>> multiple mapper classes each mapping over their own column family so that
>> data from multiple column families can be "joined" in the reduce phase? I
>> didn't find any documentation on how to do that.
>> One workaround I see is to run several MRs that write the data from the different
>> column families into a single helper column family and then do the desired
>> computation, but I am trying to avoid that if possible. Any suggestions on
>> how to do this without running multiple MRs, reading from multiple
>> column families in one go instead?
>> Thanks.
>>   -- Markus
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
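For readers wondering what "joined in the reduce phase" means concretely, here is a minimal sketch of a reduce-side join, under loud assumptions: it is plain Java, not Cassandra's or Hadoop's API. Each "column family" is modeled as an in-memory map of row key to value, the column family names `users` and `orders` and the class `ReduceSideJoinSketch` are hypothetical, and a `TreeMap` stands in for Hadoop's shuffle. How to make a single Hadoop job read both column families is exactly the open question in this thread, so the sketch does not attempt it; it only shows the tag-then-join logic the mappers and reducer would need.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, self-contained sketch of a reduce-side join. It does NOT use
// Cassandra's or Hadoop's API: each "column family" is an in-memory map of
// row key -> value, and a TreeMap stands in for Hadoop's shuffle/sort.
public class ReduceSideJoinSketch {

    // A value tagged with the (hypothetical) column family it came from.
    static final class Tagged {
        final String source;
        final String value;

        Tagged(String source, String value) {
            this.source = source;
            this.value = value;
        }
    }

    // "Map" phase for one column family: emit (rowKey, tagged value).
    static void map(String source, Map<String, String> rows,
                    Map<String, List<Tagged>> shuffle) {
        for (Map.Entry<String, String> row : rows.entrySet()) {
            shuffle.computeIfAbsent(row.getKey(), k -> new ArrayList<>())
                   .add(new Tagged(source, row.getValue()));
        }
    }

    // "Reduce" phase for one key: inner join of left-source and right-source values.
    static List<String> reduce(String key, List<Tagged> values,
                               String leftSource, String rightSource) {
        List<String> left = new ArrayList<>();
        List<String> right = new ArrayList<>();
        for (Tagged t : values) {
            if (t.source.equals(leftSource)) {
                left.add(t.value);
            } else if (t.source.equals(rightSource)) {
                right.add(t.value);
            }
        }
        List<String> joined = new ArrayList<>();
        for (String l : left) {
            for (String r : right) {
                joined.add(key + "\t" + l + "\t" + r);
            }
        }
        return joined; // empty if the key is missing from either side
    }

    // Runs one "mapper" per column family, then the "reducer" once per key.
    public static List<String> join(Map<String, String> users,
                                    Map<String, String> orders) {
        Map<String, List<Tagged>> shuffle = new TreeMap<>();
        map("users", users, shuffle);
        map("orders", orders, shuffle);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> e : shuffle.entrySet()) {
            out.addAll(reduce(e.getKey(), e.getValue(), "users", "orders"));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> users = Map.of("k1", "alice", "k2", "bob");
        Map<String, String> orders = Map.of("k1", "order-17", "k3", "order-42");
        // Only k1 appears in both maps, so a single joined line is printed.
        join(users, orders).forEach(System.out::println);
    }
}
```

The key design point is the tag: because both mappers emit the same row key, the reducer sees values from both column families together and must know which side each came from. In a real job that tag would have to be derived from whichever input a record was read from; the per-key cross product is the part that Pig or Hive would generate for you.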
