hadoop-mapreduce-user mailing list archives

From WangRamon <ramon_w...@hotmail.com>
Subject Why most of the free reduce slots are NOT used for my Hadoop Jobs? Thanks.
Date Sat, 10 Mar 2012 10:39:28 GMT




Hi All,

I'm using Hadoop-0.20-append. The cluster contains 3 nodes, and each node has 14 map and 14 reduce slots. Here is the configuration:

    <property>
        <name>mapred.tasktracker.map.tasks.maximum</name>
        <value>14</value>
    </property>
    <property>
        <name>mapred.tasktracker.reduce.tasks.maximum</name>
        <value>14</value>
    </property>
    <property>
        <name>mapred.reduce.tasks</name>
        <value>73</value>
    </property>
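As a quick sanity check on what this configuration implies, here is a standalone sketch of the arithmetic (the node count, slots per node, reduces per job, and job count are taken from the description in this mail; nothing here touches a real cluster):

```java
// Back-of-the-envelope check of the reduce capacity described above.
public class ReduceSlotMath {
    public static void main(String[] args) {
        int nodes = 3;
        int reduceSlotsPerNode = 14;  // mapred.tasktracker.reduce.tasks.maximum
        int reducesPerJob = 73;       // mapred.reduce.tasks
        int jobs = 5;

        int totalReduceSlots = nodes * reduceSlotsPerNode; // 42
        int totalReduceTasks = jobs * reducesPerJob;       // 365

        // If every slot were kept busy, the tasks would run in
        // ceil(365 / 42) = 9 waves.
        int waves = (totalReduceTasks + totalReduceSlots - 1) / totalReduceSlots;

        System.out.println("total reduce slots  = " + totalReduceSlots); // 42
        System.out.println("total reduce tasks  = " + totalReduceTasks); // 365
        System.out.println("waves at full usage = " + waves);            // 9
    }
}
```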
When I submit 5 jobs simultaneously (the input data for each job is small for this test, about 2-5 MB), I assume the jobs will use as many slots as possible. Each job did create 73 reduce tasks as configured above, so there are 5 * 73 = 365 reduce tasks in total. However, most of them are in the pending state and only about 12 of them are running, which is far below the 42 reduce slots available on the 3-node cluster. What is interesting is that it is always about 12 of them running; I tried a few times. I thought it might be a scheduler issue, so I switched to the Fair Scheduler and created 3 pools; the configuration is as below:

<?xml version="1.0"?>
<allocations>
 <pool name="pool-a">
  <minMaps>14</minMaps>
  <minReduces>14</minReduces>
  <weight>1.0</weight>
 </pool>
 <pool name="pool-b">
  <minMaps>14</minMaps>
  <minReduces>14</minReduces>
  <weight>1.0</weight>
 </pool>
 <pool name="pool-c">
  <minMaps>14</minMaps>
  <minReduces>14</minReduces>
  <weight>1.0</weight>
 </pool>
 
</allocations>

Then I submitted the 5 jobs simultaneously to these pools (chosen randomly) again. I can see the jobs were assigned to different pools, but the problem is the same: only about 12 reduce tasks across the different pools are running. Here is the output I copied from the Fair Scheduler monitoring GUI:

Pool    Running Jobs  Min Maps  Min Reduces  Running Maps  Running Reduces
pool-a  2             14        14           0             9
pool-b  0             14        14           0             0
pool-c  2             14        14           0             3

pool-a and pool-c have a total of 12 reduce tasks running, but there should be at least about 11 more reduce slots available in my cluster. So can anyone please give me some suggestions on why NOT all my REDUCE SLOTS are being used? Thanks in advance.

Cheers,
Ramon
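P.S. In case it matters: the jobs are routed to these pools through the job property named by mapred.fairscheduler.poolnameproperty (which defaults to user.name). My setup is along these lines; the property name pool.name below is just the one I chose, not a built-in:

```xml
<!-- mapred-site.xml: tell the Fair Scheduler which job property names the pool. -->
<property>
  <name>mapred.fairscheduler.poolnameproperty</name>
  <value>pool.name</value>
</property>
```

Each submission then picks its pool with a generic option such as `-D pool.name=pool-b` (the job uses ToolRunner, so -D properties are picked up).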