hadoop-common-user mailing list archives

From George Porter <gmpor...@gmail.com>
Subject Re: How to speed up the copy phrase?
Date Fri, 28 Aug 2009 00:12:10 GMT
Interesting.  In this case, how does Jetty dole out the proper
partitions of the intermediate data to the appropriate reducers if
they are located in the same files?


On Thu, Aug 27, 2009 at 11:31 AM, Arun C Murthy<acm@yahoo-inc.com> wrote:
> On Aug 24, 2009, at 5:49 PM, Aaron Kimball wrote:
>> If you've got 20 nodes, then you want to have 20-ish reduce tasks. Maybe 40
>> if you want it to run in two waves. (Assuming 1 core/node. Multiply by N for
>> N cores...) As it is, each node has 500-ish map tasks that it has to read
>> from and for each of these, it needs to generate 500 separate reduce task
>> output files.  That's going to take Hadoop a long time to do.
> Maps do not produce one output file per reduce, the entire map-output is in
> a single file.
> Arun
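
For readers following the thread: one way a single map-output file can still serve each reducer only its own partition is to keep a small index of byte ranges next to the data. The sketch below is only an illustration of that idea, not Hadoop's or Jetty's actual code. The function names, the index layout (two 64-bit integers per reducer), and the sample records are all assumptions made up for this example.

```python
import io
import struct

# Hypothetical layout: all partitions are written back to back into one
# data blob, and a separate index holds one (offset, length) pair per reducer.
INDEX_ENTRY = struct.Struct(">qq")  # two big-endian signed 64-bit integers


def write_map_output(partitions):
    """Concatenate per-reducer byte strings; return (data, index) as bytes."""
    data = io.BytesIO()
    index = io.BytesIO()
    for payload in partitions:
        # Record where this partition starts and how long it is.
        index.write(INDEX_ENTRY.pack(data.tell(), len(payload)))
        data.write(payload)
    return data.getvalue(), index.getvalue()


def read_partition(data, index, reducer_id):
    """Return only the bytes destined for the given reducer."""
    offset, length = INDEX_ENTRY.unpack_from(index, reducer_id * INDEX_ENTRY.size)
    return data[offset:offset + length]


# Made-up sample: three reducers, the second one receives nothing.
data, index = write_map_output([b"apple\t1\n", b"", b"zebra\t3\nzoo\t1\n"])
```

With a layout like this, a server answering a request for reducer 2 would look up one index entry and return just that slice, for example `read_partition(data, index, 2)`, without creating a separate file per reducer.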
