hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6026) Improve the performance efficiency of task initialization at the JobTracker
Date Fri, 12 Jun 2009 08:04:07 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718753#action_12718753 ]

dhruba borthakur commented on HADOOP-6026:

One drawback of the above approach is that the mapping of a hostname to its rack location
would be permanent for the lifetime of the JobTracker. To accommodate a more rapidly changing
network topology, we can expire items from the cache every hour or so.
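
A minimal sketch of the expiry idea, not taken from any patch on this issue. The class
ExpiringRackCache, its resolver function (standing in for the external script), and the
injectable clock are all hypothetical names introduced here for illustration. Entries are
re-resolved lazily on the first lookup after their time-to-live has passed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical host-to-rack cache whose entries expire after a fixed TTL. */
final class ExpiringRackCache {
    /** A cached rack location together with the time at which it goes stale. */
    private static final class Entry {
        final String rack;
        final long expiresAtMillis;

        Entry(String rack, long expiresAtMillis) {
            this.rack = rack;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final Function<String, String> resolver; // assumption: wraps the external script
    private final long ttlMillis;
    private final LongSupplier clock;                // injectable so expiry can be tested

    ExpiringRackCache(Function<String, String> resolver, long ttlMillis, LongSupplier clock) {
        this.resolver = resolver;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached rack for the host, re-resolving if the entry is missing or stale. */
    String resolve(String host) {
        long now = clock.getAsLong();
        Entry entry = cache.get(host);
        if (entry == null || now >= entry.expiresAtMillis) {
            // Two threads may both resolve the same host here; the last write wins.
            entry = new Entry(resolver.apply(host), now + ttlMillis);
            cache.put(host, entry);
        }
        return entry.rack;
    }
}
```

With a one-hour TTL this would be constructed as
new ExpiringRackCache(script, 3_600_000L, System::currentTimeMillis). Lazy expiry keeps the
sketch free of background threads, at the cost of holding stale entries for hosts that are
never looked up again.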

> Improve the performance efficiency of task initialization at the JobTracker
> ---------------------------------------------------------------------------
>                 Key: HADOOP-6026
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6026
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: dhruba borthakur
>            Assignee: Zheng Shao
> The JobTracker reads the splits for a job at job initialization time. Then, for each
> location in the split, it invokes DNSToSwitchMapping.resolve(). This, in turn, typically invokes
> an external script that resolves the hostname to a network rack location. The time spent
> invoking this external script can be reduced if hostnames and their rack locations are
> inserted into a cache. JobTracker.resolveAndAddToTopology() can look up this cache first and
> avoid invoking the external "resolve" script in most cases.
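
The lookup-first behaviour described in the issue could be sketched roughly as follows.
RackResolver and CachingRackResolver are hypothetical names; RackResolver only mimics the
list-in, list-out shape of DNSToSwitchMapping.resolve() and is not the Hadoop interface
itself. The sketch assumes the delegate returns one non-null rack per host, in the same
order as its input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a resolver with the shape of DNSToSwitchMapping.resolve(). */
interface RackResolver {
    List<String> resolve(List<String> hosts);
}

/** Consults a cache first and forwards only the uncached hosts to the delegate. */
final class CachingRackResolver implements RackResolver {
    private final RackResolver delegate; // assumption: wraps the external script
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    CachingRackResolver(RackResolver delegate) {
        this.delegate = delegate;
    }

    @Override
    public List<String> resolve(List<String> hosts) {
        // Collect each uncached host once, so the script runs at most once per call.
        List<String> misses = new ArrayList<>();
        for (String host : hosts) {
            if (!cache.containsKey(host) && !misses.contains(host)) {
                misses.add(host);
            }
        }
        if (!misses.isEmpty()) {
            List<String> racks = delegate.resolve(misses);
            for (int i = 0; i < misses.size(); i++) {
                cache.put(misses.get(i), racks.get(i));
            }
        }
        // Answer the whole request from the cache, preserving the input order.
        List<String> result = new ArrayList<>(hosts.size());
        for (String host : hosts) {
            result.add(cache.get(host));
        }
        return result;
    }
}
```

Because splits for a large job tend to name the same hosts repeatedly, most calls after the
first few would be served entirely from the map without starting the script.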

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
