hadoop-common-user mailing list archives

From "GOEKE, MATTHEW (AG/1000)" <matthew.go...@monsanto.com>
Subject RE: Poor scalability with map reduce application
Date Tue, 21 Jun 2011 17:16:44 GMT

Could mapred.reduce.slowstart.completed.maps even play a significant role in this? The only
benefit he would get from tweaking it for his problem would be to spread the shuffle's network
traffic over a longer period of time, at the cost of having the reducers hold resources
earlier. Either way, he would see this effect across both sets of runs if he is using the
default parameters. I guess it all depends on what kind of network layout the cluster
is on.
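
To make the trade-off concrete, here is a rough sketch of how the slow-start fraction translates into a number of completed maps. The helper name is hypothetical and not part of Hadoop, and rounding up is my assumption about how the threshold is computed; the 0.05 default is the 5% figure mentioned in the quoted message below.

```python
import math

# Hypothetical helper, not a Hadoop API: models how the value of
# mapred.reduce.slowstart.completed.maps maps to a count of finished maps.
DEFAULT_SLOWSTART = 0.05  # 5% of maps, the default cited in this thread


def maps_needed_before_reducers(total_maps, slowstart=DEFAULT_SLOWSTART):
    """Return how many maps must finish before reducers may be scheduled.

    Assumption: the fraction is multiplied by the number of map tasks
    and rounded up.
    """
    if total_maps < 0:
        raise ValueError("total_maps must be non-negative")
    if not 0.0 <= slowstart <= 1.0:
        raise ValueError("slowstart must be a fraction between 0.0 and 1.0")
    return math.ceil(slowstart * total_maps)


if __name__ == "__main__":
    # Compare a few candidate settings for a job with 200 map tasks.
    for fraction in (0.0, 0.5, 1.0):
        needed = maps_needed_before_reducers(200, fraction)
        print(f"slowstart={fraction}: reducers may start after {needed} maps")
```

With the fraction at 1.0 the reducers are not scheduled until every map is done, so the shuffle is compressed into the end of the job; lower values start the fetch earlier and tie up reduce slots for longer.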


-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com] 
Sent: Tuesday, June 21, 2011 12:09 PM
To: common-user@hadoop.apache.org
Subject: Re: Poor scalability with map reduce application


On Tue, Jun 21, 2011 at 10:27 PM, Alberto Andreotti
<albertoandreotti@gmail.com> wrote:
> I don't know if speculative maps are on, I'll check it. One thing I
> observed is that reduces begin before all maps have finished. Let me check
> also if the difference is on the map side or in the reduce. I believe it's
> balanced, both are slower when adding more nodes, but I'll confirm that.

Maps and reduces are speculative by default, so it must have been ON. Could
you also post general input vs. output record counts and similar
statistics from your job runs, to correlate?

The reducers get scheduled early but do not actually call "reduce()" until
all maps are done. Until then they just keep fetching map outputs. Their
scheduling can be controlled with some configurations (say, to start only
after X% of maps are done -- by default they start up when 5% of maps are
done).

Harsh J