I haven't used that in particular, but it's pretty trivial to do with Pig, and I would
imagine it just does the right thing under the covers. In Pig it's a simple join.
We use pygmalion to pull named columns out of the bag that CassandraStorage returns. A simple example would be:
DEFINE FromCassandraBag org.pygmalion.udf.FromCassandraBag();

-- Load billing_account: each row is a key plus a bag of (name, value) columns.
raw_billing_account = LOAD 'cassandra://voltron/billing_account' USING org.apache.cassandra.hadoop.pig.CassandraStorage()
    AS (id:chararray, columns:bag {column:tuple (name, value)});
-- Pull the named columns out of the bag into flat fields.
billing_account = FOREACH raw_billing_account GENERATE
    id,
    FLATTEN(FromCassandraBag('name, age, address, city, state, zip', columns)) AS (
        name: chararray,
        age: chararray,
        address: chararray,
        city: chararray,
        state: chararray,
        zip: chararray
    );

-- Same pattern for the game_account column family.
raw_game_account = LOAD 'cassandra://voltron/game_account' USING org.apache.cassandra.hadoop.pig.CassandraStorage()
    AS (id:chararray, columns:bag {column:tuple (name, value)});
game_account = FOREACH raw_game_account GENERATE
    id,
    FLATTEN(FromCassandraBag('username, level, experience_points, super_powers, vehicles', columns)) AS (
        username: chararray,
        level: chararray,
        experience_points: chararray,
        super_powers: chararray,
        vehicles: chararray
    );

-- Inner join the two relations on the row key and keep a few fields from each.
composite_relation = FOREACH
    (JOIN billing_account BY id, game_account BY id)
    GENERATE
        billing_account::id AS id,
        name,
        username,
        level,
        super_powers;

Anyway - not sure if that's what you're looking for but that's what we do a lot of with Pig
- joins on any attribute or group bys or things like that.
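
For instance, a group-by on the relations above could look roughly like this. This is only a sketch: it assumes the game_account relation defined earlier, and the names accounts_by_level, level_counts and num_accounts are made up for illustration.

```pig
-- Hypothetical follow-up: count game accounts per level.
-- Assumes the game_account relation defined earlier in this message.
accounts_by_level = GROUP game_account BY level;

-- After GROUP, each row holds the grouping value ("group") and a bag
-- named game_account containing the matching rows.
level_counts = FOREACH accounts_by_level GENERATE
    group AS level,
    COUNT(game_account) AS num_accounts;
```

You would then STORE or DUMP level_counts like any other relation.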
On Mar 1, 2012, at 4:45 AM, Benoit Mathieu wrote:
> Hi all,
>
> I want to write a MapReduce job with a Map task taking its data from 2
> CFs. Those 2 CFs have the same row keys and are in same keyspace, so
> they are partionned the same way across my cluster and it would be
> nice that the Map task reads the both column families locally.
>
> In hadoop package org.apache.hadoop.mapred.join, there is a
> CompositeInputFormat class, which seems to do what I want, but it
> seems related to HDFS files as the "compose" method takes "Path" args.
>
> Does anyone have ever wrote a CompositeColumnFamilyInputFormat ? or
> have any insight about it ?
>
> Cheers,
>
> Benoit