hadoop-mapreduce-issues mailing list archives

From "Kang Xiao (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-2340) optimize JobInProgress.initTasks()
Date Mon, 21 Feb 2011 05:04:39 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12997278#comment-12997278 ]

Kang Xiao commented on MAPREDUCE-2340:
--------------------------------------

For large jobs, job initialization seems to be very slow. The cause is that JobInProgress.initTasks()
calls createCache() to build the localization cache list. For each split location, createCache()
calls jobtracker.resolveAndAddToTopology(host) to get its topology node object. However, the
jobtracker already keeps a hostname => topology node map cache that can serve these
node-by-hostname lookups much faster.
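
The idea can be illustrated with a minimal, single-threaded sketch. This is not the JobTracker
code or the actual patch: the class TopologyCacheSketch, its Node type, the rackResolver function,
and the getNode()/resolveAndAdd() methods are hypothetical stand-ins. Only the name
hostnameToNodeMap and the resolve-then-add behaviour are taken from the description above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Sketch of a cache-first hostname lookup. All names are hypothetical
 * stand-ins; this is not the JobTracker implementation.
 */
public class TopologyCacheSketch {

    /** Hypothetical stand-in for a topology node object. */
    public static final class Node {
        public final String host;
        public final String rack;

        Node(String host, String rack) {
            this.host = host;
            this.rack = rack;
        }
    }

    // Hostname => node cache, named after the map mentioned in the issue.
    private final Map<String, Node> hostnameToNodeMap = new HashMap<>();

    // Assumed stand-in for the expensive host-to-rack resolution step.
    private final Function<String, String> rackResolver;

    // Counts how often the slow path runs, for illustration only.
    private int slowResolutions = 0;

    public TopologyCacheSketch(Function<String, String> rackResolver) {
        this.rackResolver = rackResolver;
    }

    /** Slow path: always resolves the host, then records the node in the map. */
    public Node resolveAndAdd(String host) {
        slowResolutions++;
        Node node = new Node(host, rackResolver.apply(host));
        hostnameToNodeMap.put(host, node);
        return node;
    }

    /** Fast path: consult the map first and resolve only on a miss. */
    public Node getNode(String host) {
        Node cached = hostnameToNodeMap.get(host);
        return cached != null ? cached : resolveAndAdd(host);
    }

    public int slowResolutions() {
        return slowResolutions;
    }

    public static void main(String[] args) {
        TopologyCacheSketch topology =
            new TopologyCacheSketch(host -> "/rack-" + (host.hashCode() & 3));

        // Split locations repeat heavily in a large job: many splits, few hosts.
        String[] splitLocations = {"host1", "host2", "host1", "host1", "host2"};
        for (String host : splitLocations) {
            topology.getNode(host);
        }
        System.out.println("lookups=" + splitLocations.length
            + " slowResolutions=" + topology.slowResolutions());
    }
}
```

In this sketch the five lookups hit only two distinct hosts, so the slow path runs twice and the
remaining lookups are plain map reads. A real jobtracker is multi-threaded, so an actual change
would also need to consider synchronization, which the sketch leaves out.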

> optimize JobInProgress.initTasks()
> ----------------------------------
>
>                 Key: MAPREDUCE-2340
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2340
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 0.20.1, 0.21.0
>            Reporter: Kang Xiao
>
> JobTracker's hostnameToNodeMap cache can speed up JobInProgress.initTasks() and JobInProgress.createCache()
> significantly. A test with 1 job of 100000 maps on a 2400-node cluster shows speedups of nearly
> 10x and 50x for initTasks() and createCache(), respectively.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
