hadoop-common-user mailing list archives

From Mice <mice1...@gmail.com>
Subject Re: hdfs output for both mapper and reducer
Date Fri, 26 Sep 2008 02:39:24 GMT
I think you can try org.apache.hadoop.mapred.lib.MultipleOutputs. It
will be released in 0.19, but you can apply the patch now.

This is just an idea; I am not sure whether it is efficient.
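
To illustrate the idea (write each map record to a side destination in the same pass that feeds the reducer, so the framework runs only once), here is a minimal plain-Java sketch. It is a simulation, not Hadoop code: SideOutputSketch, Sink, map and run are hypothetical names, the word count is only a placeholder job, and a TreeMap stands in for the framework's sort. In a real job the side sink's role would be played by a MultipleOutputs named output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SideOutputSketch {
    // Hypothetical stand-in for a destination of (key, value) records.
    interface Sink {
        void collect(String key, int value);
    }

    // Map step of a placeholder word count: every record is sent both to the
    // shuffle (which feeds the reducer) and to the side sink (the saved map output).
    static void map(String line, Sink shuffle, Sink side) {
        for (String word : line.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // skip blank lines
            }
            shuffle.collect(word, 1);
            side.collect(word, 1);
        }
    }

    // Runs map and reduce in a single pass over the input. The map output is
    // appended to mapOutput; the reduce output is returned.
    static Map<String, Integer> run(List<String> lines, List<String> mapOutput) {
        // The TreeMap simulates the sort and group-by-key between map and reduce.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        Sink shuffle = (k, v) -> grouped.computeIfAbsent(k, x -> new ArrayList<>()).add(v);
        Sink side = (k, v) -> mapOutput.add(k + "\t" + v);

        for (String line : lines) {
            map(line, shuffle, side);
        }

        // Reduce step: sum the values collected for each key.
        Map<String, Integer> reduced = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            reduced.put(e.getKey(), sum);
        }
        return reduced;
    }

    public static void main(String[] args) {
        List<String> mapOutput = new ArrayList<>();
        Map<String, Integer> reduced = run(Arrays.asList("a b a", "b c"), mapOutput);
        System.out.println("map output: " + mapOutput);
        System.out.println("reduce output: " + reduced);
    }
}
```

Both outputs come from one pass over the input, which is the point of the suggestion: no second job with an identity mapper or identity reducer is needed. Whether the extra write is cheap enough on a real cluster is not something this sketch can show.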

2008/9/25 Christian Ulrik Søttrup <soettrup@nbi.dk>:
> Hi all,
> I am interested in saving the output of both the mapper and the reducer
> in HDFS. Is there an efficient way of doing this?
> Of course I could just run the mapper followed by the identity reducer, and
> then an identity mapper with my reducer. However,
> it seems like a waste to run the framework twice. Is the sort between the
> mapper and reducer efficient if it receives already sorted data?
> cheers,
> Christian
