cassandra-user mailing list archives

From Jonathan Ellis
Subject Re: Multiple input column families in Cassandra Hadoop mapreduce
Date Fri, 15 Jul 2011 22:28:36 GMT
The easy answer is "use something like Pig or Hive that does these
joins for you under the hood."

Not actually sure what the hard answer is. :)
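
For reference, the pattern usually meant by "joining in the reduce phase" is a reduce-side join: each mapper tags its values with the source they came from, the shuffle groups everything by row key, and the reducer pairs up values from the different sources. Below is a minimal sketch of that idea in plain Java. It uses neither the Hadoop nor the Cassandra API, and the class name, method names, and the "users"/"orders" column family names are all hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSideJoinSketch {
    // A value tagged with the (hypothetical) column family it was read from.
    static final class Tagged {
        final String source;
        final String value;
        Tagged(String source, String value) {
            this.source = source;
            this.value = value;
        }
    }

    // "Map" step: tag every row of one source and group it by row key.
    // The shared map stands in for the shuffle between map and reduce.
    static void mapSource(String source, Map<String, String> rows,
                          Map<String, List<Tagged>> shuffled) {
        for (Map.Entry<String, String> row : rows.entrySet()) {
            shuffled.computeIfAbsent(row.getKey(), k -> new ArrayList<>())
                    .add(new Tagged(source, row.getValue()));
        }
    }

    // "Reduce" step: for one key, pair each left value with each right value.
    // Keys present in only one source produce no output (inner join).
    static List<String> reduce(String key, List<Tagged> values,
                               String left, String right) {
        List<String> lefts = new ArrayList<>();
        List<String> rights = new ArrayList<>();
        for (Tagged t : values) {
            if (t.source.equals(left)) {
                lefts.add(t.value);
            } else if (t.source.equals(right)) {
                rights.add(t.value);
            }
        }
        List<String> out = new ArrayList<>();
        for (String l : lefts) {
            for (String r : rights) {
                out.add(key + "," + l + "," + r);
            }
        }
        return out;
    }

    // Runs both "map" steps, then reduces each key in sorted order.
    static List<String> join(Map<String, String> users, Map<String, String> orders) {
        Map<String, List<Tagged>> shuffled = new TreeMap<>();
        mapSource("users", users, shuffled);
        mapSource("orders", orders, shuffled);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> e : shuffled.entrySet()) {
            out.addAll(reduce(e.getKey(), e.getValue(), "users", "orders"));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> users = Map.of("k1", "alice", "k2", "bob");
        Map<String, String> orders = Map.of("k1", "book", "k3", "pen");
        for (String line : join(users, orders)) {
            System.out.println(line);
        }
    }
}
```

In a real job the tagging would happen in the mappers and the pairing in the reducer; how to feed more than one column family into a single job is exactly the part this sketch leaves open.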

On Fri, Jul 15, 2011 at 1:34 AM, Markus Mock wrote:
> Hello,
> with org.apache.cassandra.hadoop.ConfigHelper.setInputColumnFamily I can set
> up the map phase to read from one column family. Is it possible to have
> multiple mapper classes each mapping over their own column family so that
> data from multiple column families can be "joined" in the reduce phase? I
> didn't find any documentation on how to do that.
> One workaround I see is to run several MR jobs that write the data from the
> different column families into a single helper column family and then do the
> desired computation, but I am trying to avoid that if possible. Any suggestions
> on how to do this without running multiple MR jobs, reading from multiple
> column families in one go instead?
> Thanks.
>   -- Markus

Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
