hadoop-common-user mailing list archives

From Elia Mazzawi <elia.mazz...@casalemedia.com>
Subject Re: dual core configuration
Date Wed, 08 Oct 2008 17:03:16 GMT
False alarm, guys. Thanks for the replies.
I do have 2 set as the task maximum, and it is utilizing 2 cores
according to top.
I must have caught it between tasks or during the reduce phase, since I
had only 1 reducer per node running at the time.

hadoop-default.xml:
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
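
For anyone finding this thread later: as Taeho notes below, hadoop-default.xml is read first and hadoop-site.xml overrides it, so the per-node override would look something like the sketch below. The property names are the real 0.18-era ones from this thread; the value of 2 (one slot per core on a dual-core slave) is just this thread's example, not a general recommendation.

```xml
<!-- hadoop-site.xml on each slave node: values here override hadoop-default.xml -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value> <!-- one map slot per core on a dual-core box -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
```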

output from top:

top - 12:54:50 up 48 days, 16:19,  1 user,  load average: 2.60, 1.55, 0.66
Tasks:  80 total,   3 running,  77 sleeping,   0 stopped,   0 zombie
Cpu0  : 98.1%us,  1.6%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu1  : 95.8%us,  2.9%sy,  0.0%ni,  0.0%id,  1.3%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1035160k total,  1019608k used,    15552k free,     1808k buffers
Swap:  2031608k total,      372k used,  2031236k free,   293612k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2469 root      25   0  410m 161m  10m R 44.5 15.9   0:40.40 java
 2446 root      25   0  411m 161m  11m R 43.2 16.0   0:45.88 java
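
To answer Taeho's question below about tooling: the per-core Cpu0/Cpu1 lines above come from pressing `1` inside top to toggle the per-CPU display. A non-interactive alternative (a sketch assuming a Linux procps `ps`) is the PSR column, which shows the core each process last ran on:

```shell
# PSR = processor each process last ran on; sort by CPU usage so the
# busy task JVMs appear at the top of the list.
ps -eo pid,psr,pcpu,comm --sort=-pcpu | head -n 5
```

Two java rows with different PSR values would confirm the two task JVMs are on separate cores.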


Alex Loddengaard wrote:
> Elia, perhaps you can try changing "mapred.tasktracker.map.tasks.maximum"
> and "mapred.tasktracker.reduce.tasks.maximum" to "4" in hadoop-site.xml in
> hopes of getting better utilization.  It's strange to me that having these
> both set to 2 only utilizes a single core, because I would imagine that any
> modern OS scheduler would do a good job of core utilization.
>
> Just a thought.
>
> Alex
>
> On Wed, Oct 8, 2008 at 12:52 AM, Taeho Kang <tkang1@gmail.com> wrote:
>
>> First of all, "mapred.tasktracker.map.tasks.maximum" and
>> "mapred.tasktracker.reduce.tasks.maximum" are both set to 2 in the
>> hadoop-default.xml file; this file is read before hadoop-site.xml, so
>> any properties that aren't set in hadoop-site.xml will follow the
>> values set in hadoop-default.xml.
>> As for the question of why only one core is utilized...
>> I think it really depends on the process scheduling of the underlying OS.
>> It's not like two tasks (two JVM subprocesses spawned by the tasktracker)
>> will always run on independent cores, as there are other processes which
>> need one or more cores to run.
>>
>> By the way, what tools did you use to find out which tasks (or processes)
>> use which cores?
>>
>> /Taeho
>>
>>
>> On Wed, Oct 8, 2008 at 1:01 PM, Alex Loddengaard
>> <alexloddengaard@gmail.com>wrote:
>>
>>> Taeho, I was going to suggest this change as well, but it's documented
>>> that "mapred.tasktracker.map.tasks.maximum" defaults to 2.  Can you
>>> explain why Elia is only having one core utilized when this config
>>> option is set to 2?
>>> Here is the documentation I'm referring to:
>>> <http://hadoop.apache.org/core/docs/r0.18.1/cluster_setup.html>
>>>
>>> Alex
>>>
>>> On Tue, Oct 7, 2008 at 8:27 PM, Taeho Kang <tkang1@gmail.com> wrote:
>>>
>>>> You can have your node (tasktracker) running more than 1 task
>>>> simultaneously.
>>>> You may set the "mapred.tasktracker.map.tasks.maximum" and
>>>> "mapred.tasktracker.reduce.tasks.maximum" properties found in the
>>>> hadoop-site.xml file. You should change hadoop-site.xml on all your
>>>> slave nodes depending on how many cores each slave has. For example,
>>>> you don't really want to have 8 tasks running at once on a 2-core
>>>> machine.
>>>>
>>>> /Taeho
>>>>
>>>> On Wed, Oct 8, 2008 at 5:53 AM, Elia Mazzawi
>>>> <elia.mazzawi@casalemedia.com>wrote:
>>>>
>>>>> hello,
>>>>>
>>>>> I have some dual core nodes, and I've noticed hadoop is only running
>>>>> 1 instance, and so is only using 1 of the CPUs on each node.
>>>>> Is there a configuration to tell it to run more than once?
>>>>> Or do I need to turn each machine into 2 nodes?
>>>>>
>>>>> Thanks.

