hadoop-common-user mailing list archives

From Amogh Vasekar <am...@yahoo-inc.com>
Subject RE: utilizing all cores on single-node hadoop
Date Tue, 18 Aug 2009 05:35:35 GMT
While setting mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum,
please consider the memory usage your application might have, since all tasks will be competing
for the same memory and contention might reduce overall performance.
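For example, a minimal sketch in hadoop-site.xml, assuming the 0.19-era
property names (mapred.child.java.opts sets each task JVM's heap; the
default is -Xmx200m):

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
    <!-- per-task JVM heap; total usage ~= (map slots + reduce slots) * heap -->
  </property>

With, say, 6 map slots and 4 reduce slots that is up to 10 task JVMs at
once, so keep slots * heap comfortably below physical RAM.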

Thanks,
Amogh
-----Original Message-----
From: Harish Mallipeddi [mailto:harish.mallipeddi@gmail.com] 
Sent: Tuesday, August 18, 2009 10:37 AM
To: common-user@hadoop.apache.org
Subject: Re: utilizing all cores on single-node hadoop

Hi Vasilis,

Here's some info that I know:

mapred.map.tasks - this is a job-specific setting. It is just a hint to the
InputFormat as to how many InputSplits (and hence MapTasks) you want for
your job. The default InputFormat classes usually cap each split at the
HDFS block size (64 MB by default). So if your input data is smaller than 64
MB, you will get just 1 split and hence only 1 MapTask.

mapred.reduce.tasks - this is also a job-specific setting.
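
If you would rather set both per job from code than from the config files,
the JobConf setters do that (a sketch; WordCount here is just a stand-in
for your own driver class):

  import org.apache.hadoop.mapred.JobConf;

  JobConf conf = new JobConf(WordCount.class);
  conf.setNumMapTasks(16);     // only a hint - the InputFormat decides the real split count
  conf.setNumReduceTasks(16);  // honored as given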

mapred.tasktracker.map.tasks.maximum
mapred.tasktracker.reduce.tasks.maximum

The above two are tasktracker-specific config options and determine how many
"simultaneous" MapTasks and ReduceTasks run on each TT. Ideally, on an 8-core
box you would want to set map.tasks.maximum to something like 6 and
reduce.tasks.maximum to 4 to utilize all 8 cores fully (there's
a little bit of over-subscription to account for tasks idling while doing
I/O).
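
Concretely, that looks something like this in hadoop-site.xml on the node
running the TT (both properties default to 2; restart the TaskTracker to
pick up the change):

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>6</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>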

In the web admin console, how many map-tasks and reduce-tasks are reported
to have been launched for your job?

Cheers,
Harish

On Tue, Aug 18, 2009 at 5:47 AM, Vasilis Liaskovitis <vliaskov@gmail.com> wrote:

> Hi,
>
> I am a beginner trying to setup a few simple hadoop tests on a single
> node before moving on to a cluster. I am just using the simple
> wordcount example for now. My question is what's the best way to
> guarantee utilization of all cores on a single node? So assuming a
> single node with 16 cores, what are the suggested values for:
>
> mapred.map.tasks
> mapred.reduce.tasks
>
> mapred.tasktracker.map.tasks.maximum
> mapred.tasktracker.reduce.tasks.maximum
>

> I found an old similar thread
> http://www.mail-archive.com/hadoop-user@lucene.apache.org/msg00152.html
> and I have followed similar settings for my 16-core system (e.g.
> map.tasks=reduce.tasks=90 and map.tasks.maximum=100); however, I always
> see only 3-4 cores utilized according to top.
>
> - The description for mapred.map.tasks says "Ignored when
> mapred.job.tracker is "local"", and in my case
> mapred.job.tracker=hdfs://localhost:54311.
> Is it possible that the map.tasks and reduce.tasks I am setting are
> being ignored? How can I verify this? Is there a way to enforce my
> values even in a localhost scenario like this?
>
> - Are there other config options/values that I need to set besides the
> 4 I mentioned above?
>
> - Also is it possible that for short tasks, I won't see full
> utilization of all cores anyway? Something along those lines is
> mentioned in an issue from a year ago:
> http://issues.apache.org/jira/browse/HADOOP-3136
> "If the individual tasks are very short i.e. run for less than the
> heartbeat interval the TaskTracker serially runs one task at a time"
>
> I am using hadoop-0.19.2
>
> thanks for any guidance,
>
> - Vasilis
>



-- 
Harish Mallipeddi
http://blog.poundbang.in
