hadoop-mapreduce-dev mailing list archives

From "Nathan Roberts (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-4499) Looking for speculative tasks is very expensive in 1.x
Date Tue, 31 Jul 2012 21:51:34 GMT
Nathan Roberts created MAPREDUCE-4499:
-----------------------------------------

             Summary: Looking for speculative tasks is very expensive in 1.x
                 Key: MAPREDUCE-4499
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4499
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: mrv1, performance
    Affects Versions: 1.0.3
            Reporter: Nathan Roberts


When many jobs and tasks are active in a cluster, deciding whether or not to launch
a speculative task becomes very expensive.

I could be missing something, but it certainly looks like on every heartbeat we can end up
scanning tens of thousands of tasks looking for one that might need to be speculatively
executed. In most cases nothing gets chosen, so we completely trash our data cache without
even finding a task to schedule, only to do it all over again on the next heartbeat.

On busy jobtrackers, the following backtrace is very common:

"IPC Server handler 32 on 50300" daemon prio=10 tid=0x00002ab36c74f800 nid=0xb50 runnable [0x0000000045adb000]
   java.lang.Thread.State: RUNNABLE
        at java.util.TreeMap.valEquals(TreeMap.java:1182)
        at java.util.TreeMap.containsValue(TreeMap.java:227)
        at java.util.TreeMap$Values.contains(TreeMap.java:940)
        at org.apache.hadoop.mapred.TaskInProgress.hasRunOnMachine(TaskInProgress.java:1072)
        at org.apache.hadoop.mapred.JobInProgress.findSpeculativeTask(JobInProgress.java:2193)
        - locked <0x00002aaefde82338> (a org.apache.hadoop.mapred.JobInProgress)
        at org.apache.hadoop.mapred.JobInProgress.findNewMapTask(JobInProgress.java:2417)
        - locked <0x00002aaefde82338> (a org.apache.hadoop.mapred.JobInProgress)
        at org.apache.hadoop.mapred.JobInProgress.obtainNewNonLocalMapTask(JobInProgress.java:1432)
        - locked <0x00002aaefde82338> (a org.apache.hadoop.mapred.JobInProgress)
        at org.apache.hadoop.mapred.CapacityTaskScheduler$MapSchedulingMgr.obtainNewTask(CapacityTaskScheduler.java:525)
        at org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.getTaskFromQueue(CapacityTaskScheduler.java:322)
        at org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.assignTasks(CapacityTaskScheduler.java:419)
        at org.apache.hadoop.mapred.CapacityTaskScheduler$TaskSchedulingMgr.access$500(CapacityTaskScheduler.java:150)
        at org.apache.hadoop.mapred.CapacityTaskScheduler.addMapTasks(CapacityTaskScheduler.java:1075)
        at org.apache.hadoop.mapred.CapacityTaskScheduler.assignTasks(CapacityTaskScheduler.java:1044)
        - locked <0x00002aab6e27a4c8> (a org.apache.hadoop.mapred.CapacityTaskScheduler)
        at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:3398)
        - locked <0x00002aab6e191278> (a org.apache.hadoop.mapred.JobTracker)
...
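
The top frames of the backtrace show hasRunOnMachine ending up in
TreeMap.containsValue, which walks every entry of the map, so each check costs
time linear in the number of entries. The following is only a rough sketch of
one way such a lookup could be made constant-time, and it is not the actual
Hadoop code or a proposed patch. The class name RanOnTrackerIndex, its fields,
and its methods are all hypothetical. It keeps a per-tracker count next to the
attempt map so the membership test becomes a hash lookup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names exist in Hadoop.
class RanOnTrackerIndex {
    // attempt id -> tracker name, similar in shape to a map of active attempts
    private final TreeMap<String, String> activeTasks = new TreeMap<String, String>();
    // tracker name -> number of active attempts on that tracker (the added index)
    private final Map<String, Integer> attemptsPerTracker = new HashMap<String, Integer>();

    public void addAttempt(String attemptId, String tracker) {
        String previous = activeTasks.put(attemptId, tracker);
        if (previous != null) {
            decrement(previous); // attempt moved: drop the old tracker's count
        }
        Integer n = attemptsPerTracker.get(tracker);
        attemptsPerTracker.put(tracker, n == null ? 1 : n + 1);
    }

    public void removeAttempt(String attemptId) {
        String tracker = activeTasks.remove(attemptId);
        if (tracker != null) {
            decrement(tracker);
        }
    }

    private void decrement(String tracker) {
        Integer n = attemptsPerTracker.get(tracker);
        if (n == null) {
            return;
        }
        if (n == 1) {
            attemptsPerTracker.remove(tracker); // last attempt on this tracker
        } else {
            attemptsPerTracker.put(tracker, n - 1);
        }
    }

    // Linear scan over all values, the pattern visible in the backtrace.
    public boolean hasRunOnScan(String tracker) {
        return activeTasks.values().contains(tracker);
    }

    // Hash lookup against the index; expected constant time.
    public boolean hasRunOnIndexed(String tracker) {
        return attemptsPerTracker.containsKey(tracker);
    }

    public static void main(String[] args) {
        RanOnTrackerIndex idx = new RanOnTrackerIndex();
        idx.addAttempt("attempt_1", "tracker_a");
        idx.addAttempt("attempt_2", "tracker_b");
        // Both checks should agree; only their cost differs.
        System.out.println(idx.hasRunOnScan("tracker_a") == idx.hasRunOnIndexed("tracker_a"));
    }
}
```

The trade-off in this sketch is extra bookkeeping on every add and remove in
exchange for a cheaper check. It does not address the outer scan over all
tasks on each heartbeat, which the description identifies as the larger cost.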

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
