hadoop-common-user mailing list archives

From "Juan P." <gordoslo...@gmail.com>
Subject Re: Performance Tunning
Date Mon, 27 Jun 2011 22:33:33 GMT
Thanks for your help!
I think I get it now, but this part is a bit confusing:
*so: tasktracker/datanode and 6 slots left. How you break it up from there
is your call but I would suggest either 4 mappers / 2 reducers or 5 mappers
/ 1 reducer.*
If it's 2 processes per core, then it's: 4 Nodes * 4 Cores/Node * 2
Processes/Core = 32 Processes Total
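
To sanity-check that arithmetic, here is a minimal Python sketch of the rule of thumb from the quoted reply. It assumes 2 of the processes on each node are the tasktracker and datanode daemons (as Matt's "6 slots left" suggests); the function names are made up for illustration and are not Hadoop APIs.

```python
# Rough slot budget under the "2 processes per core" rule of thumb.
# Assumption: each node runs 2 daemons (tasktracker + datanode) that
# count against the per-node process budget.

def slots_per_node(cores, processes_per_core=2, daemons=2):
    """Task slots left on one node after reserving room for the daemons."""
    return cores * processes_per_core - daemons


def cluster_slots(nodes, cores, processes_per_core=2, daemons=2):
    """Total task slots across the whole cluster."""
    return nodes * slots_per_node(cores, processes_per_core, daemons)


if __name__ == "__main__":
    nodes, cores = 4, 4
    per_node = slots_per_node(cores)
    total = cluster_slots(nodes, cores)
    print(f"processes per node: {cores * 2}")
    print(f"task slots per node: {per_node}")
    print(f"task slots in cluster: {total}")
    # One of the suggested per-node splits: 4 mappers / 2 reducers.
    mappers, reducers = 4, 2
    assert mappers + reducers == per_node
    print(f"cluster-wide: {nodes * mappers} map slots, {nodes * reducers} reduce slots")
```

Under those assumptions the 32 processes include 8 daemon processes, so the number of task slots is smaller than 32.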

So my mapred-site.xml configuration should include these properties:

  <property>
    <name>mapred.map.tasks</name>
    <value>28</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>4</value>
  </property>

Is that correct?

On Mon, Jun 27, 2011 at 4:59 PM, GOEKE, MATTHEW (AG/1000) <
matthew.goeke@monsanto.com> wrote:

> If you are running default configurations then you are only getting 2
> mappers and 1 reducer per node. The rule of thumb I have gone on (and backed
> up by the Definitive Guide) is 2 processes per core, i.e. 8 per node here,
> so: tasktracker/datanode and 6 slots left. How you break it up from there is
> your call but I would suggest either 4 mappers / 2 reducers or 5 mappers /
> 1 reducer.
> Check out the configs below for details on what you are *most likely*
> running currently:
> http://hadoop.apache.org/common/docs/r0.20.2/mapred-default.html
> http://hadoop.apache.org/common/docs/r0.20.2/hdfs-default.html
> http://hadoop.apache.org/common/docs/r0.20.2/core-default.html
> HTH,
> Matt
> -----Original Message-----
> From: Juan P. [mailto:gordoslocos@gmail.com]
> Sent: Monday, June 27, 2011 2:50 PM
> To: common-user@hadoop.apache.org
> Subject: Performance Tunning
> I'm trying to run a MapReduce task against a cluster of 4 DataNodes with 4
> cores each.
> My input data is 4GB in size and it's split into 100MB files. Current
> configuration is default so block size is 64MB.
> If I understand it correctly Hadoop should be running 64 Mappers to process
> the data.
> I'm running a simple data-counting MapReduce job and it's taking about
> 30 minutes to complete. This seems like way too much, doesn't it?
> Is there any tuning you guys would recommend to try and see an improvement
> in performance?
> Thanks,
> Pony
