hadoop-mapreduce-issues mailing list archives

From "zhihai xu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-6224) resolve the hosts in DNSToSwitchMapping before inter tracker server start to avoid IPC timeout in Task Tracker heartbeat
Date Sun, 25 Jan 2015 06:35:34 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-6224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated MAPREDUCE-6224:
---------------------------------
    Status: Patch Available  (was: Open)

> resolve the hosts in DNSToSwitchMapping before inter tracker server start to avoid IPC timeout in Task Tracker heartbeat
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6224
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6224
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: MAPREDUCE-6224.branch-1.000.patch
>
>
> Resolve the hosts to fill the cache in CachedDNSToSwitchMapping before the inter tracker server starts, to avoid IPC timeouts in Task Tracker heartbeats.
> We saw IPC timeouts in Task Tracker heartbeats on a large MR1 cluster that uses a topology script (run through ShellCommandExecutor) in ScriptBasedMapping to resolve the network topology of the Task Tracker hosts.
> The reason is as follows. Right after the inter tracker server starts in the Job Tracker, the Job Tracker receives a large number of heartbeats from the Task Trackers.
> The heartbeat method calls resolveAndAddToTopology to resolve the network topology of the Task Tracker host through ScriptBasedMapping, which extends CachedDNSToSwitchMapping.
> ScriptBasedMapping#resolve checks whether the host is in the cache. If it is not, it runs the topology script through ShellCommandExecutor to get the host's network location. Running the topology script is normally slow, so when many heartbeats arrive at the same time on a large MR1 cluster, these script runs can cause IPC timeouts.
> The solution is to resolve the network topology of every host in the hosts list from HostsFileReader before the Job Tracker receives any heartbeat from a Task Tracker. This fills the cache in ScriptBasedMapping, so when heartbeat calls resolveAndAddToTopology, the result comes from the cache instead of from running the topology script.
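To make the cache-miss cost described above concrete, here is a minimal Java sketch. It is a simplified, hypothetical model and not Hadoop's ScriptBasedMapping: HeartbeatTopologyLookup and resolveForHeartbeat are illustrative names, and the topology script is replaced by a plain function so that script executions can be counted. Each heartbeat from a host that is not yet cached pays for one script run; later heartbeats from the same host are cache hits.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical model of the per-heartbeat cache check; NOT Hadoop's ScriptBasedMapping.
// The "script" is a plain function so the number of expensive executions can be counted.
class HeartbeatTopologyLookup {
    private final Map<String, String> rackCache = new ConcurrentHashMap<>();
    private final Function<String, String> topologyScript; // stands in for the topology script
    private final AtomicInteger scriptRuns = new AtomicInteger();

    HeartbeatTopologyLookup(Function<String, String> topologyScript) {
        this.topologyScript = topologyScript;
    }

    // Called once per heartbeat: a cache hit returns immediately, while a miss
    // runs the script (a forked process in the real system) on the heartbeat path.
    String resolveForHeartbeat(String host) {
        return rackCache.computeIfAbsent(host, h -> {
            scriptRuns.incrementAndGet();
            return topologyScript.apply(h);
        });
    }

    int scriptRuns() {
        return scriptRuns.get();
    }

    public static void main(String[] args) {
        HeartbeatTopologyLookup lookup = new HeartbeatTopologyLookup(h -> "/default-rack");
        String[] trackers = {"tt1", "tt2", "tt3"};
        for (String tt : trackers) {
            lookup.resolveForHeartbeat(tt); // cold cache: one script run per tracker
        }
        for (String tt : trackers) {
            lookup.resolveForHeartbeat(tt); // warm cache: no further script runs
        }
        System.out.println("script runs: " + lookup.scriptRuns()); // script runs: 3
    }
}
```

With thousands of Task Trackers heartbeating right after startup, the cold-cache loop is the expensive case: every first heartbeat triggers its own script run.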
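The proposed startup order can be sketched the same way, again as a hypothetical model rather than the code in MAPREDUCE-6224.branch-1.000.patch. TopologyWarmUp, warmUp and onHeartbeat are illustrative names, the include set stands in for the hosts returned by HostsFileReader, and the script is modeled as a function that resolves a batch of hosts in one call (an assumption here, loosely mirroring how a topology script can receive several host names as arguments).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical sketch of "resolve all known hosts first, then serve heartbeats".
// Names are illustrative; this is not the JobTracker code from the patch.
class TopologyWarmUp {
    private final Map<String, String> rackCache = new HashMap<>();
    // Stands in for the topology script: maps a batch of hosts to rack paths, in order.
    private final Function<List<String>, List<String>> topologyScript;
    private int scriptRuns = 0;

    TopologyWarmUp(Function<List<String>, List<String>> topologyScript) {
        this.topologyScript = topologyScript;
    }

    // Resolve hosts, running the script once for the batch of uncached hosts only.
    synchronized List<String> resolve(List<String> hosts) {
        List<String> uncached = new ArrayList<>();
        for (String h : hosts) {
            if (!rackCache.containsKey(h) && !uncached.contains(h)) {
                uncached.add(h);
            }
        }
        if (!uncached.isEmpty()) {
            scriptRuns++;
            List<String> racks = topologyScript.apply(uncached);
            for (int i = 0; i < uncached.size(); i++) {
                rackCache.put(uncached.get(i), racks.get(i));
            }
        }
        List<String> result = new ArrayList<>(hosts.size());
        for (String h : hosts) {
            result.add(rackCache.get(h));
        }
        return result;
    }

    // Startup step: resolve the whole include list before any heartbeat is accepted.
    void warmUp(Set<String> includedHosts) {
        if (!includedHosts.isEmpty()) {
            resolve(new ArrayList<>(includedHosts));
        }
    }

    // Heartbeat path: for a pre-resolved host this is a pure cache read.
    String onHeartbeat(String host) {
        return resolve(List.of(host)).get(0);
    }

    synchronized int scriptRuns() {
        return scriptRuns;
    }

    public static void main(String[] args) {
        TopologyWarmUp jt = new TopologyWarmUp(hosts -> {
            List<String> racks = new ArrayList<>();
            for (String h : hosts) {
                racks.add("/rack-" + h);
            }
            return racks;
        });
        Set<String> includes = new TreeSet<>(List.of("tt1", "tt2", "tt3"));
        jt.warmUp(includes); // one batched script run at startup
        // ... the inter tracker server would be started only after this point ...
        for (String h : includes) {
            jt.onHeartbeat(h); // every lookup is served from the cache
        }
        System.out.println("script runs: " + jt.scriptRuns()); // script runs: 1
    }
}
```

Because the warm-up resolves every included host before heartbeats are accepted, heartbeats from those hosts only read the cache. A host missing from the include list would still trigger a script run on its first heartbeat.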



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
