hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6026) Improve the performance efficiency of task initialization at the JobTracker
Date Fri, 12 Jun 2009 08:04:07 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718753#action_12718753 ]

dhruba borthakur commented on HADOOP-6026:
------------------------------------------

One drawback to the above proposal is that the mapping of a hostname to its rack location
would be permanent for the lifetime of a JobTracker. To accommodate a more rapidly changing
network topology, we can expire entries from the cache after an hour or so.
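A minimal sketch of the hourly expiry, assuming a per-host resolver function (a hypothetical stand-in for the script-based DNSToSwitchMapping) and an injectable clock so the TTL is easy to test. The class and method names are illustrative only, not Hadoop APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical host-to-rack cache whose entries expire after a fixed TTL. */
public class ExpiringRackCache {

    /** A cached rack plus the time at which it was resolved. */
    private static final class Entry {
        final String rack;
        final long resolvedAtMillis;

        Entry(String rack, long resolvedAtMillis) {
            this.rack = rack;
            this.resolvedAtMillis = resolvedAtMillis;
        }
    }

    private final Function<String, String> resolver; // assumed: wraps the expensive external script
    private final long ttlMillis;                     // e.g. one hour
    private final LongSupplier clock;                 // injectable so tests can advance time
    private final Map<String, Entry> cache = new HashMap<>();

    public ExpiringRackCache(Function<String, String> resolver, long ttlMillis, LongSupplier clock) {
        this.resolver = resolver;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached rack, re-resolving it if missing or at least ttlMillis old. */
    public synchronized String getRack(String host) {
        long now = clock.getAsLong();
        Entry e = cache.get(host);
        if (e == null || now - e.resolvedAtMillis >= ttlMillis) {
            // Missing or stale: run the resolver again and refresh the entry.
            e = new Entry(resolver.apply(host), now);
            cache.put(host, e);
        }
        return e.rack;
    }

    public static void main(String[] args) {
        long oneHour = 60L * 60 * 1000;
        long[] now = {0};
        int[] calls = {0};
        ExpiringRackCache cache = new ExpiringRackCache(
                host -> { calls[0]++; return "/rack-" + calls[0]; },
                oneHour, () -> now[0]);
        cache.getRack("host1"); // miss: resolver runs
        cache.getRack("host1"); // hit: served from the cache
        now[0] += oneHour;      // entry is now one hour old
        cache.getRack("host1"); // expired: resolver runs again
        System.out.println("resolver invocations: " + calls[0]);
    }
}
```

Expiry here is lazy: a stale entry is only re-resolved when it is next looked up, so the resolver is never called for hosts nobody asks about. The trade-off is that entries for hosts that disappear stay in memory, so a real implementation might also want a periodic sweep or a size bound.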

> Improve the performance efficiency of task initialization at the JobTracker
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-6026
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6026
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: dhruba borthakur
>            Assignee: Zheng Shao
>
> The JobTracker reads the splits for a job at job initialization time. Then, for each
> location in the split, it invokes DNSToSwitchMapping.resolve(). This, in turn, typically invokes
> an external script that resolves the hostname to a network rack location. The time spent
> invoking this external script can be reduced if hostnames and their rack locations are
> inserted into a cache. JobTracker.resolveAndAddToTopology() can look up this cache first and
> avoid invoking the external "resolve" script in most cases.
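A minimal sketch of the cache described in the issue, placed in front of the resolve step. It assumes a hypothetical `RackResolver` interface shaped like `DNSToSwitchMapping.resolve(List<String>)`, returning one rack per input host in the same order; only cache misses are forwarded, in a single batch, so repeated split locations do not re-run the external script. None of these names are Hadoop's actual implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CachingRackResolver {

    /** Hypothetical stand-in shaped like DNSToSwitchMapping.resolve(). */
    public interface RackResolver {
        /** Assumed to return one rack per input host, in the same order. */
        List<String> resolve(List<String> hostnames);
    }

    private final RackResolver delegate; // e.g. the script-based mapping
    private final Map<String, String> cache = new HashMap<>();

    public CachingRackResolver(RackResolver delegate) {
        this.delegate = delegate;
    }

    /** Returns one rack per input host, calling the delegate only for uncached hosts. */
    public synchronized List<String> resolve(List<String> hostnames) {
        // Collect distinct cache misses, preserving first-seen order.
        Set<String> missSet = new LinkedHashSet<>();
        for (String host : hostnames) {
            if (!cache.containsKey(host)) {
                missSet.add(host);
            }
        }
        if (!missSet.isEmpty()) {
            List<String> misses = new ArrayList<>(missSet);
            // One delegate call (one script run) for all uncached hosts.
            List<String> racks = delegate.resolve(misses);
            if (racks == null || racks.size() != misses.size()) {
                throw new IllegalStateException("resolver returned an unexpected result");
            }
            for (int i = 0; i < misses.size(); i++) {
                cache.put(misses.get(i), racks.get(i));
            }
        }
        List<String> result = new ArrayList<>(hostnames.size());
        for (String host : hostnames) {
            result.add(cache.get(host));
        }
        return result;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        RackResolver script = hosts -> {
            calls[0]++;
            List<String> racks = new ArrayList<>();
            for (String h : hosts) {
                racks.add("/rack-" + h.charAt(h.length() - 1));
            }
            return racks;
        };
        CachingRackResolver resolver = new CachingRackResolver(script);
        System.out.println(resolver.resolve(List.of("host1", "host2")));
        System.out.println(resolver.resolve(List.of("host2", "host1")));
        System.out.println("script invocations: " + calls[0]);
    }
}
```

Batching the misses keeps the number of script runs per call at most one, which matters when a job has many splits sharing the same few hosts.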

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

