hadoop-common-user mailing list archives

From "Jim Twensky" <jim.twen...@gmail.com>
Subject Re: Merging reducer outputs into a single part-00000 file
Date Thu, 15 Jan 2009 00:33:55 GMT
Owen and Rasit,

Thank you for the responses. I figured out that mapred.reduce.tasks was set
to 1 in my hadoop-default.xml and I hadn't overridden it in my
hadoop-site.xml configuration file.

Jim

On Wed, Jan 14, 2009 at 11:23 AM, Owen O'Malley <omalley@apache.org> wrote:

> On Jan 14, 2009, at 12:46 AM, Rasit OZDAS wrote:
>
>> Jim,
>>
>> As far as I know, there is no operation done after Reducer.
>>
>
> Correct, other than output promotion, which moves the output file to the
> final filename.
>
>> But if you are a little experienced, you already know these.
>> Ordered list means one final file, or am I missing something?
>>
>
> There is no value and a lot of cost associated with creating a single file
> for the output. The question is how you want the keys divided between the
> reduces (and therefore output files). The default partitioner hashes the key
> and mods by the number of reduces, which "stripes" the keys across the
> output files. You can use the mapred.lib.InputSampler to generate good
> partition keys and mapred.lib.TotalOrderPartitioner to get completely sorted
> output based on the partition keys. With the total order partitioner, each
> reduce gets an increasing range of keys and thus has all of the nice
> properties of a single file without the costs.
>
> -- Owen
>
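
To make the two partitioning schemes Owen describes concrete, here is a minimal, self-contained sketch in plain Java. It does not use Hadoop's classes: PartitionSketch, hashPartition, rangePartition and the split points "g" and "p" are hypothetical names and values chosen for illustration. In a real job the split points would come from sampling the input rather than being hard-coded.

```java
import java.util.Arrays;

// Illustrative only: mimics the idea of hash vs. total-order partitioning
// without depending on any Hadoop classes.
class PartitionSketch {

    // Hash partitioning: clear the sign bit of the hash, then take it
    // modulo the number of reduces. Neighbouring keys land in different
    // partitions, which "stripes" them across the output files.
    static int hashPartition(String key, int numReduces) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduces;
    }

    // Range partitioning: splitPoints must be sorted and hold
    // (numReduces - 1) keys. Partition i receives an increasing key range,
    // so reading the part files in order yields globally sorted output.
    static int rangePartition(String key, String[] splitPoints) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // When the key is absent, binarySearch returns -(insertionPoint) - 1.
        // A key equal to a split point goes to the partition on its right.
        return pos >= 0 ? pos + 1 : -pos - 1;
    }

    public static void main(String[] args) {
        String[] splits = {"g", "p"}; // assumed split points for 3 reduces
        for (String key : new String[] {"apple", "kiwi", "zebra"}) {
            System.out.println(key
                + " hash=" + hashPartition(key, 3)
                + " range=" + rangePartition(key, splits));
        }
    }
}
```

With range partitioning, every key in part-00000 sorts before every key in part-00001, and so on, which is why the part files can be read in order as if they were one sorted file.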
