hadoop-common-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-1018) Single lost heartbeat leads to a "Lost task tracker"
Date Thu, 06 Sep 2007 06:42:32 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1018:
----------------------------------

    Attachment: HADOOP-1018_1_20070906.patch

Attached fix to comments/log-messages...

> Single lost heartbeat leads to a "Lost task tracker"
> ----------------------------------------------------
>
>                 Key: HADOOP-1018
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1018
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.10.0, 0.11.2, 0.12.0
>         Environment: Nutch trunk/ (Hadoop 0.10.0), Linux, JDK 1.5, a cluster of 9 machines.
>            Reporter: Andrzej Bialecki 
>            Assignee: Arun C Murthy
>             Fix For: 0.15.0
>
>         Attachments: HADOOP-1018_1_20070906.patch
>
>
> Under heavy load, a task tracker may lose the heartbeat response from the JobTracker.
> The task tracker then tries to resend its last heartbeat message, which the job tracker
> treats as a "duplicate" and ignores. Since the task tracker keeps resending the same
> heartbeat message, with the same id, over and over again, no "valid" messages reach the
> job tracker, so after a while it considers the task tracker to be lost. The task tracker
> cannot recover from this state and needs to be restarted.
> Looking at Hadoop trunk/, I believe this problem may still occur. In
> JobTracker.java.heartbeat():992, the JobTracker should not ignore duplicate messages but
> acknowledge them without processing. This would cause the task tracker to sync its last
> heartbeat id back to the last heartbeat id remembered by the job tracker.
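
The behaviour proposed in the description can be illustrated with a minimal sketch. It is not the attached patch and not Hadoop's actual JobTracker code: the class names, the String-valued status and actions, and the int response id are all assumptions made for illustration. The idea shown is only that a heartbeat carrying a stale id gets the previously sent response replayed, without its status being processed a second time.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from Hadoop's JobTracker.
final class HeartbeatSketch {

    /** Reply to a heartbeat; responseId is the id the tracker must send next time. */
    static final class Response {
        final int responseId;
        final String actions;

        Response(int responseId, String actions) {
            this.responseId = responseId;
            this.actions = actions;
        }
    }

    /** Stand-in for the job tracker side of the heartbeat exchange. */
    static final class JobTrackerSketch {
        // Last response sent to each task tracker, keyed by tracker name.
        private final Map<String, Response> lastResponse = new HashMap<>();
        // Number of heartbeats whose status was actually processed.
        int processedCount = 0;

        synchronized Response heartbeat(String trackerName, int responseId, String status) {
            Response previous = lastResponse.get(trackerName);
            if (previous != null && previous.responseId != responseId) {
                // The tracker is resending an old id, so it never saw 'previous'.
                // Acknowledge by replaying that response; do not reprocess the status.
                return previous;
            }
            // First contact or an in-sequence heartbeat: process it and advance the id.
            processedCount++;
            Response next = new Response(responseId + 1, "actions-for:" + status);
            lastResponse.put(trackerName, next);
            return next;
        }
    }
}
```

With this shape, a task tracker that lost a response and resends its old id receives the same response again, picks up the id the job tracker remembers, and its next heartbeat is accepted as in sequence.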

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

