hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-1475) local filecache disappears
Date Wed, 20 Jun 2007 19:25:26 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-1475:
---------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this.  Thanks, Owen!

http://svn.apache.org/viewvc?view=rev&rev=549200

> local filecache disappears
> --------------------------
>
>                 Key: HADOOP-1475
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1475
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.13.0
>            Reporter: Christian Kunz
>            Assignee: Owen O'Malley
>            Priority: Blocker
>             Fix For: 0.14.0
>
>         Attachments: dist-cache-purge.patch
>
>
> All our jobs on a 600-node cluster fail. The symptom is that the local filecache disappears.
> It might have to do with the fact that lost tasktrackers get re-initialized when they
> send a heartbeat again, and purge the local directory completely without updating the filecache.
> Side issue:
> why do we get so many lost tasktrackers which then resume the heartbeat (a kind of 'bogus'
> lost tasktracker)? We lost tasktrackers:
> 13 in the 1st hour of the job
> 18 in the 2nd hour
> 33 in the 3rd hour
> Then the job failed.
> For example, all the tasktrackers lost in the first 2 hours of the job were logged some time later
> with a 'Status from unknown Tracker' message in the jobtracker log and got reinitialized.
> I attach some jobtracker log messages showing how the heartbeats of the lost tasktrackers
> come in late, sometimes less than 1 minute late, sometimes up to 16 minutes. What could be
> the reason? Do the heartbeats get lost?
> 2007-06-07 13:09:08,518 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_070
> 2007-06-07 13:09:48,919 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_070
> 2007-06-07 13:39:08,740 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_075
> 2007-06-07 13:41:50,810 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_075
> 2007-06-07 14:32:29,093 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_082
> 2007-06-07 14:35:34,217 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_082
> 2007-06-07 14:15:48,856 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_085
> 2007-06-07 14:20:21,337 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_085
> 2007-06-07 15:25:49,524 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_098
> 2007-06-07 15:33:56,732 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_098
> 2007-06-07 14:49:09,203 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_106
> 2007-06-07 14:54:25,538 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_106
> 2007-06-07 15:02:29,337 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_108
> 2007-06-07 15:02:57,558 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_108
> 2007-06-07 14:19:09,022 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_112
> 2007-06-07 14:19:15,273 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_112
> 2007-06-07 14:19:08,881 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_114
> 2007-06-07 14:30:03,354 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_114
> 2007-06-07 15:42:29,579 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_116
> 2007-06-07 15:43:06,422 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_116
> 2007-06-07 14:55:49,280 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_117
> 2007-06-07 14:56:38,452 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_117
> 2007-06-07 15:15:49,461 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_120
> 2007-06-07 15:31:37,028 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_120
> 2007-06-07 15:09:09,435 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_174
> 2007-06-07 15:18:31,254 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_174
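
To put numbers on the "less than 1 minute, sometimes up to 16 minutes" observation, the delay between each "Lost tracker" line and the matching "Status_from_unknown_Tracker" line can be computed from the log excerpt quoted above. This is only an illustrative Python sketch, not part of Hadoop or of the committed patch: it assumes each event sits on one line exactly as quoted in this message (timestamps with a comma before the milliseconds), and the names LOG, LINE_RE, parse_ts and heartbeat_gaps are hypothetical helpers.

```python
import re
from datetime import datetime

# The jobtracker log excerpt from the report, one event per line.
LOG = """\
2007-06-07 13:09:08,518 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_070
2007-06-07 13:09:48,919 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_070
2007-06-07 13:39:08,740 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_075
2007-06-07 13:41:50,810 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_075
2007-06-07 14:32:29,093 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_082
2007-06-07 14:35:34,217 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_082
2007-06-07 14:15:48,856 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_085
2007-06-07 14:20:21,337 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_085
2007-06-07 15:25:49,524 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_098
2007-06-07 15:33:56,732 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_098
2007-06-07 14:49:09,203 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_106
2007-06-07 14:54:25,538 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_106
2007-06-07 15:02:29,337 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_108
2007-06-07 15:02:57,558 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_108
2007-06-07 14:19:09,022 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_112
2007-06-07 14:19:15,273 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_112
2007-06-07 14:19:08,881 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_114
2007-06-07 14:30:03,354 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_114
2007-06-07 15:42:29,579 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_116
2007-06-07 15:43:06,422 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_116
2007-06-07 14:55:49,280 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_117
2007-06-07 14:56:38,452 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_117
2007-06-07 15:15:49,461 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_120
2007-06-07 15:31:37,028 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_120
2007-06-07 15:09:09,435 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker tracker_174
2007-06-07 15:18:31,254 WARN org.apache.hadoop.mapred.JobTracker: Status_from_unknown_Tracker : tracker_174
"""

# Hypothetical pattern matching only the two event types quoted in the report.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \w+ \S+: "
    r"(?P<event>Lost tracker|Status_from_unknown_Tracker :) (?P<tracker>tracker_\d+)$"
)


def parse_ts(text):
    # Comma before the milliseconds; %f right-pads "518" to 518000 microseconds.
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


def heartbeat_gaps(log_text):
    """Map each tracker to the seconds between 'Lost tracker' and its next status report."""
    lost_at = {}
    gaps = {}
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # ignore lines that are not one of the two events
        ts = parse_ts(m.group("ts"))
        tracker = m.group("tracker")
        if m.group("event") == "Lost tracker":
            lost_at[tracker] = ts
        elif tracker in lost_at:
            # The tracker reported status again after being declared lost.
            gaps[tracker] = (ts - lost_at.pop(tracker)).total_seconds()
    return gaps


if __name__ == "__main__":
    gaps = heartbeat_gaps(LOG)
    for tracker, secs in sorted(gaps.items(), key=lambda kv: kv[1]):
        print(f"{tracker}: {secs:7.1f} s ({secs / 60:.1f} min)")
```

On this excerpt the gaps range from about 6 seconds (tracker_112) to about 948 seconds, just under 16 minutes (tracker_120), consistent with the description in the report.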

-- 
This message is automatically generated by JIRA.

