hadoop-common-dev mailing list archives

From "Zheng Shao (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-6026) Improve the performance efficiency of task initialization at the JobTracker
Date Mon, 15 Jun 2009 01:46:07 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-6026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HADOOP-6026:

    Attachment: HADOOP-6026.1.patch

I agree with Dhruba's comment, but I think there is probably no such requirement in any
real deployed environment at the moment. And if there is, a simple uniform timeout may not
be the best way to expire an item in the cache.

I will vote for simplicity of the code for now. I've put a comment there. In the future, people
can add a caching policy if such a requirement comes up.

> Improve the performance efficiency of task initialization at the JobTracker
> ---------------------------------------------------------------------------
>                 Key: HADOOP-6026
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6026
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: dhruba borthakur
>            Assignee: Zheng Shao
>         Attachments: HADOOP-6026.1.patch
> The JobTracker reads the splits for a job at job initialization time. Then, for each
location in the split, it invokes DNSToSwitchMapping.resolve(). This, in turn, typically invokes
an external script that resolves the hostname to a network rack location. The time spent
invoking this external script can be reduced if hostnames and their rack locations are
inserted into a cache. JobTracker.resolveAndAddToTopology() can look up this cache first and
avoid invoking the external "resolve" script in most cases.
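
The caching idea described above can be sketched roughly as follows. This is only an illustration and not the code in HADOOP-6026.1.patch: RackResolver is a hypothetical stand-in for an interface like DNSToSwitchMapping, CachingRackResolver is an invented name, and the sketch assumes the wrapped resolver returns exactly one non-null rack per hostname. In line with the comment above, entries are never expired.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a resolver such as DNSToSwitchMapping. */
interface RackResolver {
    /** Returns one rack location per hostname, in the same order. */
    List<String> resolve(List<String> hostnames);
}

/** Wraps a slow resolver with an in-memory cache that never expires entries. */
class CachingRackResolver implements RackResolver {
    private final RackResolver delegate;
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    CachingRackResolver(RackResolver delegate) {
        this.delegate = delegate;
    }

    @Override
    public List<String> resolve(List<String> hostnames) {
        // Collect the hostnames that are not cached yet, skipping duplicates.
        List<String> misses = new ArrayList<>();
        for (String host : hostnames) {
            if (!cache.containsKey(host) && !misses.contains(host)) {
                misses.add(host);
            }
        }
        // One call to the slow resolver (e.g. an external script) for all misses.
        if (!misses.isEmpty()) {
            // Assumption: the delegate returns one non-null rack per miss.
            List<String> racks = delegate.resolve(misses);
            for (int i = 0; i < misses.size(); i++) {
                cache.put(misses.get(i), racks.get(i));
            }
        }
        // Answer every requested hostname from the cache, in request order.
        List<String> result = new ArrayList<>(hostnames.size());
        for (String host : hostnames) {
            result.add(cache.get(host));
        }
        return result;
    }
}
```

With a wrapper like this, resolving the same split locations a second time would not reach the underlying resolver at all. A timeout or other eviction policy could later be added inside the wrapper without changing its callers.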

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
