hadoop-mapreduce-user mailing list archives

From WangRamon <ramon_w...@hotmail.com>
Subject RE: Why most of the free reduce slots are NOT used for my Hadoop Jobs? Thanks.
Date Sat, 10 Mar 2012 12:05:04 GMT

Joey, here is the information:

Cluster Summary (Heap Size is 481.88 MB/1.74 GB)

Maps                  : 0
Reduces               : 6
Total Submissions     : 11
Nodes                 : 3
Map Task Capacity     : 42
Reduce Task Capacity  : 42
Avg. Tasks/Node       : 28.00
Blacklisted Nodes     : 0

Cheers
Ramon

 Subject: Re: Why most of the free reduce slots are NOT used for my Hadoop Jobs? Thanks.
From: joey@cloudera.com
Date: Sat, 10 Mar 2012 07:00:26 -0500
To: mapreduce-user@hadoop.apache.org



What does the jobtracker web page say is the total reduce capacity?
-Joey



On Mar 10, 2012, at 5:39, WangRamon <ramon_wang@hotmail.com> wrote:








Hi All
 
I'm using Hadoop-0.20-append. The cluster contains 3 nodes, and each node has 14 map slots
and 14 reduce slots. Here is the configuration:
 
 
    <property>
        <name>mapred.tasktracker.map.tasks.maximum</name>
        <value>14</value>
    </property>
    <property>
        <name>mapred.tasktracker.reduce.tasks.maximum</name>
        <value>14</value>
    </property>
    <property>
        <name>mapred.reduce.tasks</name>
        <value>73</value>
    </property>

 
When I submit 5 jobs simultaneously (the input data for each job is small for this test,
about 2~5M in size), I expect the jobs to use as many slots as possible. Each job did create
73 reduce tasks as configured above, so there are 5 * 73 reduce tasks in total. However, most
of them stay in the pending state and only about 12 are running, which is far fewer than the
42 reduce slots available in the 3-node cluster.
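
The figures in this paragraph can be checked with a quick back-of-the-envelope calculation.
This is only a sketch in Python: the variable names are made up for illustration, and the
count of 12 running reduce tasks is the approximate value observed, not an exact measurement.

```python
# Figures taken from the message: 3 nodes, 14 reduce slots per node,
# 5 jobs with mapred.reduce.tasks = 73, and roughly 12 reduce tasks running.
nodes = 3
reduce_slots_per_node = 14   # mapred.tasktracker.reduce.tasks.maximum
jobs = 5
reduce_tasks_per_job = 73    # mapred.reduce.tasks
running_reduces = 12         # approximate, as observed

reduce_capacity = nodes * reduce_slots_per_node      # slots in the whole cluster
total_reduce_tasks = jobs * reduce_tasks_per_job     # tasks wanting a slot
idle_slots = reduce_capacity - running_reduces       # slots left unused

print(f"reduce capacity:    {reduce_capacity}")
print(f"total reduce tasks: {total_reduce_tasks}")
print(f"idle reduce slots:  {idle_slots}")
```

With far more pending reduce tasks than slots, a lack of work would not explain the idle slots.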
 
What's interesting is that there are always about 12 of them running; I tried a few times.
 
So I thought it might be because of the scheduler. I changed it to the Fair Scheduler and
created 3 pools; the configuration is as below:
 
<?xml version="1.0"?>
<allocations>
 <pool name="pool-a">
  <minMaps>14</minMaps>
  <minReduces>14</minReduces>
  <weight>1.0</weight>
 </pool>
 <pool name="pool-b">
  <minMaps>14</minMaps>
  <minReduces>14</minReduces>
  <weight>1.0</weight>
 </pool>
 <pool name="pool-c">
  <minMaps>14</minMaps>
  <minReduces>14</minReduces>
  <weight>1.0</weight>
 </pool>
 
</allocations> 
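
As a sanity check on the allocation file, the minimum shares can be summed and compared with
the cluster's reduce capacity of 42. The following is a minimal sketch using Python's standard
xml.etree.ElementTree module; the helper name min_shares is hypothetical and is not part of
Hadoop or the Fair Scheduler.

```python
import xml.etree.ElementTree as ET

# The allocation file from the message, embedded as a string for the example.
ALLOCATIONS = """<?xml version="1.0"?>
<allocations>
 <pool name="pool-a">
  <minMaps>14</minMaps>
  <minReduces>14</minReduces>
  <weight>1.0</weight>
 </pool>
 <pool name="pool-b">
  <minMaps>14</minMaps>
  <minReduces>14</minReduces>
  <weight>1.0</weight>
 </pool>
 <pool name="pool-c">
  <minMaps>14</minMaps>
  <minReduces>14</minReduces>
  <weight>1.0</weight>
 </pool>
</allocations>"""


def min_shares(xml_text):
    """Hypothetical helper: map each pool name to (minMaps, minReduces)."""
    root = ET.fromstring(xml_text)
    shares = {}
    for pool in root.findall("pool"):
        min_maps = int(pool.findtext("minMaps", "0"))
        min_reduces = int(pool.findtext("minReduces", "0"))
        shares[pool.get("name")] = (min_maps, min_reduces)
    return shares


shares = min_shares(ALLOCATIONS)
total_min_reduces = sum(reduces for _, reduces in shares.values())
print(shares)
print("sum of minReduces:", total_min_reduces)
```

The three minimum shares add up to exactly the 42 reduce slots configured for the cluster.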
 
Then I submitted the 5 jobs simultaneously to these pools at random again. I can see the jobs
were assigned to different pools, but the problem is the same: only about 12 reduce tasks
across the pools are running. Here is the output I copied from the Fair Scheduler
monitor GUI:
 
Pool    Running Jobs  Min Maps  Min Reduces  Running Maps  Running Reduces
pool-a  2             14        14           0             9
pool-b  0             14        14           0             0
pool-c  2             14        14           0             3
 
pool-a and pool-c have a total of 12 reduce tasks running, but I do have about 11 reduce slots
at least available in my cluster.
 
So can anyone please give me some suggestions on why NOT all of my REDUCE SLOTS are being
used? Thanks in advance.

 
Cheers 
Ramon
 		 	   		  
 		 	   		  