hadoop-common-user mailing list archives

From tienduc_dinh <tienduc_d...@yahoo.com>
Subject Re: Merging reducer outputs into a single part-00000 file
Date Sun, 11 Jan 2009 13:23:58 GMT

Seeing only part-00000 means there is only one reduce task in your configuration. Hadoop writes one part-NNNNN file per reduce task and does not merge them afterwards.
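To illustrate the point, here is a rough plain-Java sketch (no Hadoop dependency) of how records map to output files. The class and method names are made up for illustration. The partition formula is the one used by Hadoop's default HashPartitioner, but plain String keys stand in for Hadoop key types here, so the actual assignment in a real job may differ.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class, not part of Hadoop.
public class PartFileSketch {
    // Non-negative hash modulo the number of reduce tasks (R),
    // the same formula the default HashPartitioner applies to a key.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Each reduce task writes its own file: part-NNNNN, where NNNNN is
    // the zero-padded index of the reduce task.
    static String partFileName(int partition) {
        return String.format(Locale.ROOT, "part-%05d", partition);
    }

    // Count how many keys end up in each output file for a given R.
    static Map<String, Integer> filesFor(String[] keys, int numReduceTasks) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String key : keys) {
            String file = partFileName(partitionFor(key, numReduceTasks));
            counts.merge(file, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] keys = {"apple", "banana", "cherry", "date", "elderberry", "fig"};
        // With one reduce task every key lands in part-00000.
        System.out.println("R=1: " + filesFor(keys, 1));
        // With four reduce tasks the keys spread over up to four files.
        System.out.println("R=4: " + filesFor(keys, 4));
    }
}
```

In a real job the number of reduce tasks is set on the job configuration, for example with JobConf.setNumReduceTasks(int); the default is 1, which is why only part-00000 shows up.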

Hope this helps.

Tien Duc Dinh

Jim Twensky wrote:
> Hello,
> The original map-reduce paper states: "After successful completion, the
> output of the map-reduce execution is available in the R output files (one
> per reduce task, with file names as specified by the user)." However, when
> using Hadoop's TextOutputFormat, all the reducer outputs are combined in a
> single file called part-00000. I was wondering how and when this merging
> process is done. When the reducer calls output.collect(key,value), is this
> record written to a local temporary output file in the reducer's disk and
> then these local files (a total of R) are later merged into one single file
> with a final thread or is it directly written to the final output file
> (part-00000)? I am asking this because I'd like to get an ordered sample of
> the final output data, i.e. one record per every 1000 records or something
> similar and I don't want to run a serial process that iterates on the final
> output file.
> Thanks,
> Jim
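One possible way to get such a sample without a serial pass over the final file is to keep a counter inside the reducer and emit only every 1000th record. Since a reducer receives its keys in sorted order, the sample stays ordered within that reducer's output file. Below is a minimal plain-Java sketch of the counting logic only. EveryNthSampler and its methods are hypothetical names, not Hadoop classes, and in a real reducer the accept() check would guard the call to output.collect(key, value).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: keeps one record out of every `interval`.
public class EveryNthSampler {
    private final int interval;
    private long seen = 0;

    EveryNthSampler(int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.interval = interval;
    }

    // Returns true for the 1st, (interval+1)th, (2*interval+1)th ... record seen.
    boolean accept() {
        boolean keep = (seen % interval == 0);
        seen++;
        return keep;
    }

    // Applies the sampler to records that are already in sorted order,
    // as a reducer's input would be; the relative order is preserved.
    static List<String> sample(Iterable<String> sortedRecords, int interval) {
        EveryNthSampler sampler = new EveryNthSampler(interval);
        List<String> kept = new ArrayList<>();
        for (String record : sortedRecords) {
            if (sampler.accept()) {
                kept.add(record);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            records.add("record-" + i);
        }
        // Keeps record-0, record-1000 and record-2000.
        System.out.println(sample(records, 1000));
    }
}
```

With more than one reduce task, each output file gets its own sample; the samples are only globally ordered if the partitioning itself preserves key order across reducers.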

View this message in context: http://www.nabble.com/Merging-reducer-outputs-into-a-single-part-00000-file-tp21396867p21399089.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
