hadoop-common-dev mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1874) lost task trackers -- jobs hang
Date Wed, 10 Oct 2007 12:24:51 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533717 ]

Hudson commented on HADOOP-1874:

Integrated in Hadoop-Nightly #267 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/267/])

> lost task trackers -- jobs hang
> -------------------------------
>                 Key: HADOOP-1874
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1874
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.15.0
>            Reporter: Christian Kunz
>            Assignee: Devaraj Das
>            Priority: Blocker
>             Fix For: 0.15.0
>         Attachments: 1874.new.patch, 1874.new.patch, 1874.patch, lazy-dfs-ops.1.patch,
>                      lazy-dfs-ops.2.patch, lazy-dfs-ops.4.patch, lazy-dfs-ops.patch, server-throttle-hack.patch
> This happens on a 1400-node cluster using a recent nightly build patched with HADOOP-1763
> (which fixes a previous 'lost task tracker' issue) running a c++-pipes job with 4200 maps and
> 2800 reduces. The task trackers start to get lost in high numbers at the end of job completion.
> Similar non-pipes jobs do not show the same problem, but it is unclear whether it is related
> to c++-pipes. It could also be dfs overload when reduce tasks close and validate all newly
> created dfs files. I see dfs client rpc timeout exceptions, but these alone do not explain
> the escalation in losing task trackers.
> I also noticed that the job tracker becomes rather unresponsive, with rpc timeout and
> call queue overflow exceptions. The job tracker is running with 60 handlers.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
