hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-1475) local filecache disappears
Date Mon, 18 Jun 2007 05:58:26 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-1475:
----------------------------------

    Attachment: dist-cache-purge.patch

This patch clears the cache before the task tracker reinitializes itself. This prevents the
file cache from thinking it has files that it in fact doesn't have, because the backing files
have been deleted in the reinitialization.
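
The failure mode and the fix described above can be sketched in a few lines. This is only an illustration, not the code in dist-cache-purge.patch: the class `LocalFileCache` and the functions `reinitialize` and `demo` are hypothetical names, and the sketch assumes the cache is an in-memory index of paths whose backing files live in the directory that reinitialization wipes.

```python
import os
import shutil
import tempfile


class LocalFileCache:
    """Hypothetical stand-in for a tracker's local file cache (not Hadoop's API)."""

    def __init__(self, local_dir):
        self.local_dir = local_dir
        self.entries = {}  # cache key -> path of the localized copy

    def localize(self, key, data):
        # Trust the index: if the key is recorded, hand back the recorded path.
        if key in self.entries:
            return self.entries[key]
        os.makedirs(self.local_dir, exist_ok=True)
        path = os.path.join(self.local_dir, key)
        with open(path, "w") as f:
            f.write(data)
        self.entries[key] = path
        return path

    def purge(self):
        # Forget every entry so the next request re-creates the file.
        self.entries.clear()


def reinitialize(cache, purge_first):
    # Reinitialization deletes the local directory, as described in the report.
    if purge_first:
        cache.purge()
    shutil.rmtree(cache.local_dir, ignore_errors=True)


def demo(purge_first):
    """Return True if a cached file is still usable after reinitialization."""
    root = tempfile.mkdtemp()
    try:
        cache = LocalFileCache(os.path.join(root, "local"))
        cache.localize("dict.txt", "payload")
        reinitialize(cache, purge_first)
        path = cache.localize("dict.txt", "payload")
        return os.path.exists(path)
    finally:
        shutil.rmtree(root, ignore_errors=True)
```

Without the purge, `demo(False)` hands back a path whose file no longer exists; with the purge, `demo(True)` re-creates the file on the next request.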

> local filecache disappears
> --------------------------
>
>                 Key: HADOOP-1475
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1475
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.13.0
>            Reporter: Christian Kunz
>            Priority: Blocker
>             Fix For: 0.14.0
>
>         Attachments: dist-cache-purge.patch
>
>
> All our jobs on a 600 node cluster fail. The symptom is that the local filecache disappears.
> It might have to do with the fact that lost task trackers get re-initialized when they
> send a heartbeat again, and purge the local directory completely without updating the filecache.
> A side issue:
> why do we get so many lost tasktrackers which then resume the heartbeat (a kind of 'bogus'
> lost tasktracker)? We lost tasktrackers:
> 13 in the 1st hour of the job
> 18 in the 2nd hour
> 33 in the 3rd hour
> Then the job failed.
> E.g. all the tasktrackers lost in the first 2 hours of the job got logged sometime later
> with a 'Status from unknown Tracker' in the jobtracker log and got reinitialized.
> I attach some jobtracker log messages showing how the heartbeats of the lost tasktrackers
> come in late, sometimes less than 1 minute late, sometimes up to 16 minutes. What could be
> the reason? Do the heartbeats get lost?
> 2007-06-07 13:09:08,518 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_070
> 2007-06-07 13:09:48,919 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_070
> 2007-06-07 13:39:08,740 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_075
> 2007-06-07 13:41:50,810 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_075
> 2007-06-07 14:32:29,093 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_082
> 2007-06-07 14:35:34,217 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_082
> 2007-06-07 14:15:48,856 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_085
> 2007-06-07 14:20:21,337 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_085
> 2007-06-07 15:25:49,524 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_098
> 2007-06-07 15:33:56,732 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_098
> 2007-06-07 14:49:09,203 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_106
> 2007-06-07 14:54:25,538 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_106
> 2007-06-07 15:02:29,337 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_108
> 2007-06-07 15:02:57,558 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_108
> 2007-06-07 14:19:09,022 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_112
> 2007-06-07 14:19:15,273 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_112
> 2007-06-07 14:19:08,881 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_114
> 2007-06-07 14:30:03,354 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_114
> 2007-06-07 15:42:29,579 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_116
> 2007-06-07 15:43:06,422 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_116
> 2007-06-07 14:55:49,280 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_117
> 2007-06-07 14:56:38,452 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_117
> 2007-06-07 15:15:49,461 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_120
> 2007-06-07 15:31:37,028 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_120
> 2007-06-07 15:09:09,435 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_174
> 2007-06-07 15:18:31,254 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_174
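
To check how late each tracker reported back, the quoted log entries can be paired up per tracker. The sketch below is a hypothetical helper, not part of Hadoop: the function name `reconnect_delays` and the regular expression are assumptions, and it expects each log entry on a single line (wrapped entries need to be joined first).

```python
import re
from datetime import datetime

# Matches the two JobTracker messages quoted in the report.
LINE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \w+ \S+: "
    r"(?P<event>Lost tracker|Status_from_unknown_Tracker)\s*:?\s*(?P<tracker>\S+)"
)


def reconnect_delays(lines):
    """Map tracker name -> seconds between 'Lost tracker' and its next status report."""
    lost_at = {}
    delays = {}
    for line in lines:
        m = LINE.search(line)
        if m is None:
            continue  # skip lines that are not one of the two messages
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
        tracker = m.group("tracker")
        if m.group("event") == "Lost tracker":
            lost_at[tracker] = ts
        elif tracker in lost_at:
            delays[tracker] = (ts - lost_at.pop(tracker)).total_seconds()
    return delays


# Three of the entries quoted in the report, one log entry per line.
sample = [
    "> 2007-06-07 13:09:08,518 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_070",
    "> 2007-06-07 13:09:48,919 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_070",
    "> 2007-06-07 14:19:09,022 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_112",
    "> 2007-06-07 14:19:15,273 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_112",
    "> 2007-06-07 15:15:49,461 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_120",
    "> 2007-06-07 15:31:37,028 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_120",
]
delays = reconnect_delays(sample)
```

For these three trackers the gaps come out at roughly 40 seconds, 6 seconds, and just under 16 minutes, in line with the range given in the report.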

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

