hadoop-common-user mailing list archives

From Tarandeep Singh <tarand...@gmail.com>
Subject Re: How to retrieve the reducer output file names?
Date Sat, 12 Sep 2009 21:20:18 GMT
The output of the mappers is partitioned. Each partition is given a number
starting from 0, and each reducer works on exactly one of these partitions.
In the configure method of your reducer, you can get the partition number with:
jobConf.getInt("mapred.task.partition", 0);
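
As a rough sketch of where that call goes, assuming the old
org.apache.hadoop.mapred API (the one that has a configure method) and a
word-count style job: the class name PartitionAwareReducer and the
Text/IntWritable key and value types are made up for illustration, and the
Hadoop jars need to be on the classpath.

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical reducer: sums counts per key and remembers its partition number.
public class PartitionAwareReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

    // Partition handled by this reduce task, read once per task.
    private int partition;

    @Override
    public void configure(JobConf job) {
        // Same lookup as in the message; 0 is only the fallback default.
        partition = job.getInt("mapred.task.partition", 0);
    }

    @Override
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
        // Surface the partition number in the task status for debugging.
        reporter.setStatus("reducing partition " + partition);
    }
}
```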

If you use the default output format, then the reducer working on partition
0 writes part-00000, the reducer working on partition 1 writes part-00001,
and so on.
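
To go from a partition number to the expected file name outside of Hadoop,
something like the following plain Java helper could work. It is only a
sketch: the class and method names are invented here, and it assumes the
default "part-" prefix with the number zero-padded to five digits, as in the
names above.

```java
import java.util.Locale;

// Hypothetical helper, not part of Hadoop: maps a partition number to the
// default-style output file name (part-00000, part-00001, ...).
final class PartFileNames {

    private PartFileNames() {
        // Static helper only; no instances.
    }

    static String forPartition(int partition) {
        if (partition < 0) {
            throw new IllegalArgumentException("partition must be >= 0: " + partition);
        }
        // Zero-pad to at least five digits; Locale.ROOT keeps ASCII digits.
        return String.format(Locale.ROOT, "part-%05d", partition);
    }

    public static void main(String[] args) {
        // Print the expected file names for the first three reducers.
        for (int p = 0; p < 3; p++) {
            System.out.println("reducer " + p + " -> " + forPartition(p));
        }
    }
}
```

With the job's output directory in hand, the file for reducer N would then
be that directory plus forPartition(N).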

You can extend TextOutputFormat or SequenceFileOutputFormat (depending upon
which output format you are using) and change the file name from part-xxxxx
to something else.
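
A minimal sketch of such a subclass, again assuming the old
org.apache.hadoop.mapred API with the Hadoop jars on the classpath. The
class name RenamedTextOutputFormat and the "reducer-" prefix are
placeholders chosen for illustration, not anything Hadoop defines.

```java
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordWriter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.util.Progressable;

// Hypothetical output format: writes reducer-00000, reducer-00001, ...
// instead of the default part-xxxxx names.
public class RenamedTextOutputFormat<K, V> extends TextOutputFormat<K, V> {

    @Override
    public RecordWriter<K, V> getRecordWriter(FileSystem ignored, JobConf job,
                                              String name, Progressable progress)
            throws IOException {
        // 'name' is the default file name; replace it with one derived
        // from the partition number of the current task.
        int partition = job.getInt("mapred.task.partition", 0);
        String renamed = String.format("reducer-%05d", partition);
        return super.getRecordWriter(ignored, job, renamed, progress);
    }
}
```

The job would then select it with
jobConf.setOutputFormat(RenamedTextOutputFormat.class).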

Hope this helps,
Tarandeep


On Sat, Sep 12, 2009 at 1:39 PM, Richard G <gladiatorcn@hotmail.com> wrote:

>
> Hi,
>
> For my application, I need to retrieve the output file name for each
> reducer. But is there any convenient way to do that? I also want to know
> which file is coming from which reducer. So simple enumeration in output
> directory doesn't work for me.
>
> Thank you!
> --
> View this message in context:
> http://www.nabble.com/How-to-retrieve-the-reducer-output-file-names--tp25418039p25418039.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>
