hadoop-mapreduce-user mailing list archives

From Paul Houle <ontolo...@gmail.com>
Subject Real Multiple Outputs for Hadoop -- is this implementation correct?
Date Fri, 13 Sep 2013 19:23:59 GMT
Hey guys, I spent some time last week thinking about Hadoop before I wrote
my own class, RealMultipleOutputs, that does something like what
MultipleOutputs does, except that you can specify different HDFS paths for
the different output streams. My pals were telling me to use Cascading or
Pig if I want this functionality, but otherwise I was happy writing plain
M/R jars.

I wrote up the implementation here:

https://github.com/paulhoule/infovore/wiki/Real-Multiple-Outputs-in-Hadoop

This works hand in hand with an abstraction layer that supports unit
testing with Mockito:

https://github.com/paulhoule/infovore/wiki/Unit-Testing-Hadoop-Mappers-and-Reducers

Anyway, I'd appreciate anybody looking at this code and trying to poke
holes in it. It runs OK on my tiny dev cluster on Hadoop 1.0.4 and 1.1.2
and on Amazon EMR, but I am wondering if I missed something.
