hadoop-common-user mailing list archives

From C G <parallel...@yahoo.com>
Subject RE: Questions regarding configuration parameters...
Date Fri, 22 Feb 2008 05:51:07 GMT
My performance problems fall into 2 categories:
  1.  Extremely slow reduce phases - our map phases march along at impressive speed, but during
reduce phases most nodes go idle...the active machines mostly clunk along at 10-30% CPU. 
Compare this to the map phase where I get all grid nodes cranking away at > 100% CPU. 
This is a vague explanation I realize.
  2.  Pregnant pauses during dfs -copyToLocal and -cat operations.  Frequently I'll be iterating
over a list of HDFS files cat-ing them into one file to bulk load into a database.  Many times
I'll see one of the copies/cats sit for anywhere from 2-5 minutes.  During that time no data
is transferred, all nodes are idle, and absolutely nothing is written to any of the logs.
 The file sizes being copied are relatively small...less than 1G each in most cases.
  Both of these issues persist in 0.16.0 and definitely have me puzzled.  I'm sure that I'm
doing something wrong/non-optimal w/r/t slow reduce phases, but the long pauses during a dfs
command line operation seem like a bug to me.  Unfortunately I've not seen anybody else report it.
  Any thoughts/ideas most welcome...
  C G
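
One way to narrow down the pauses described in item 2 is to time each cat individually, so that a stalled file can be matched against the namenode and datanode logs afterwards. What follows is only a minimal Python sketch, assuming the files are fetched with `hadoop dfs -cat <path>`. The function name `cat_with_timings`, its parameters, and the paths in the usage comment are hypothetical and do not come from the original message.

```python
import subprocess
import time


def cat_with_timings(paths, out_path, cat_cmd=("hadoop", "dfs", "-cat")):
    """Concatenate each path into out_path and record how long each cat took.

    cat_cmd is the command prefix run once per path (assumed here to be
    `hadoop dfs -cat`). Returns a list of (path, seconds) tuples in the
    order the paths were processed.
    """
    timings = []
    with open(out_path, "wb") as out:
        for path in paths:
            start = time.monotonic()
            # The child process writes straight to the output file;
            # check=True raises CalledProcessError if the command fails.
            subprocess.run([*cat_cmd, path], stdout=out, check=True)
            timings.append((path, time.monotonic() - start))
    return timings


# Hypothetical usage (the paths are placeholders):
#   timings = cat_with_timings(["/data/part-00000", "/data/part-00001"],
#                              "bulk_load.txt")
#   for path, seconds in timings:
#       if seconds > 60:
#           print("slow:", path, round(seconds, 1))
```

Recording wall-clock time per file does not explain a pause, but it does show whether the stalls follow particular files or occur at random.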

Joydeep Sen Sarma <jssarma@facebook.com> wrote:
> The default values are 2, so you might only see 2 cores used by Hadoop per
> node/host.

that's 2 each for map and reduce. so theoretically - one could fully utilize a 4 core box
with this setting. in practice - a little bit of oversubscription (3 each on a 4 core) seems
to be working out well for us (maybe overlapping some compute and io - but mostly we are trading
off for higher # concurrent jobs against per job latency).

unlikely that these settings are causing slowness in processing small amounts of data. send
more details - what's slow (map/shuffle/reduce)? check cpu consumption when map task is running
.. etc.
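
The slot arithmetic in this reply can be sketched as follows. This is a rough Python illustration, not anything from the thread. It assumes the two per-tasktracker limits under discussion are the 0.16 properties `mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum` (the names do not survive in this archived copy), and the function name `node_load` is hypothetical.

```python
def node_load(cores, map_max=2, reduce_max=2):
    """Return (total task slots, slots per core) for one tasktracker host.

    map_max and reduce_max stand for the per-tasktracker limits on
    simultaneous map and reduce tasks; both default to 2, as in the thread.
    """
    slots = map_max + reduce_max
    return slots, slots / cores


print(node_load(4))        # defaults on a 4-core box: (4, 1.0)
print(node_load(4, 3, 3))  # 3 each on a 4-core box:   (6, 1.5)
```

A ratio of 1.0 means one slot per core when map and reduce tasks overlap; 1.5 is the mild oversubscription described above.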

-----Original Message-----
From: Andy Li [mailto:annndy.lee@gmail.com]
Sent: Thu 2/21/2008 2:36 PM
To: core-user@hadoop.apache.org
Subject: Re: Questions regarding configuration parameters...

Try the 2 parameters to utilize all the cores per node/host.

The maximum number of map tasks that will be run
simultaneously by a task tracker.

The maximum number of reduce tasks that will be run
simultaneously by a task tracker.

The default values are 2, so you might only see 2 cores used by Hadoop per
node/host.
If each system/machine has 4 cores (dual dual-core), then you can change
them to 3.

Hope this works for you.


On Wed, Feb 20, 2008 at 9:30 AM, C G wrote:

> Hi All:
> The documentation for the configuration parameters mapred.map.tasks and
> mapred.reduce.tasks discusses these values in terms of "number of available
> hosts" in the grid. This description strikes me as a bit odd given that a
> "host" could be anything from a uniprocessor to an N-way box, where values
> for N could vary from 2..16 or more. The documentation is also vague about
> computing the actual value. For example, for mapred.map.tasks the doc
> says "...a prime number several times greater...". I'm curious about how people
> are interpreting the descriptions and what values people are using.
> Specifically, I'm wondering if I should be using "core count" instead of
> "host count" to set these values.
> In the specific case of my system, we have 24 hosts where each host is a
> 4-way system (i.e. 96 cores total). For mapred.map.tasks I chose the
> value 173, as that is a prime number which is near 7*24. For
> mapred.reduce.tasks I chose 23 since that is a prime number close to 24.
> Is this what was intended?
> Beyond curiosity, I'm concerned about setting these values and other
> configuration parameters correctly because I am pursuing some performance
> issues where it is taking a very long time to process small amounts of data.
> I am hoping that some amount of tuning will resolve the problems.
> Any thoughts and insights most appreciated.
> Thanks,
> C G
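
The numbers in the question above can be checked with a few lines of arithmetic. This is a rough Python sketch under stated assumptions: every host offers the same number of slots per task type, and the configured task counts are taken as exact, although the framework may derive the real number of map tasks from the input splits. The helper names `is_prime`, `waves`, and `peak_busy` are hypothetical.

```python
import math


def is_prime(n):
    """Trial-division primality check, adequate for small task counts."""
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))


def waves(tasks, hosts, slots_per_host):
    """Number of rounds needed to run `tasks` on the available slots."""
    return math.ceil(tasks / (hosts * slots_per_host))


def peak_busy(tasks, hosts, slots_per_host):
    """Largest number of slots that can be occupied at the same time."""
    return min(tasks, hosts * slots_per_host)


hosts = 24
print(is_prime(173), is_prime(23))  # True True
print(waves(173, hosts, 2))         # 4 rounds of maps with 2 map slots per host
print(waves(173, hosts, 3))         # 3 rounds of maps with 3 map slots per host
print(peak_busy(23, hosts, 2))      # 23 of the 48 reduce slots
```

With 23 reduce tasks, at most 23 reduce slots are ever occupied, so part of the grid has no reduce work whatever the per-host limit is. Whether that accounts for the idle nodes reported in this thread is not something the arithmetic alone can establish.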