hadoop-common-dev mailing list archives

From stack <st...@archive.org>
Subject Re: Hung job
Date Wed, 15 Mar 2006 00:57:20 GMT
Whoops.

We need a way to reset the LogFormatter loggedSevere flag when doing a
STALE_STATE soft restart of the TaskTracker. Otherwise the TaskTracker
comes back up, checks loggedSevere, finds it is (still) set, and so
restarts again... ad infinitum. Suggested patch attached.
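A minimal sketch of the sticky-flag problem, using plain java.util.logging. SevereTrackingFormatter, hasLoggedSevere(), resetLoggedSevere() and countRestarts() are hypothetical names standing in for the real LogFormatter and TaskTracker code, which is not shown here. The point is only that the flag has to be cleared as part of the soft restart. If it is not cleared, every health check after the restart sees it set again.

```java
import java.util.logging.Formatter;
import java.util.logging.Level;
import java.util.logging.LogRecord;

public class SevereFlagSketch {

  // Hypothetical stand-in for Hadoop's LogFormatter: remembers whether any
  // SEVERE record has been formatted since the flag was last cleared.
  static class SevereTrackingFormatter extends Formatter {
    private static volatile boolean loggedSevere = false;

    @Override
    public String format(LogRecord record) {
      if (record.getLevel().intValue() >= Level.SEVERE.intValue()) {
        loggedSevere = true;
      }
      return record.getLevel() + ": " + record.getMessage() + "\n";
    }

    static boolean hasLoggedSevere() {
      return loggedSevere;
    }

    // Hypothetical reset hook: the piece that is missing without this patch.
    static void resetLoggedSevere() {
      loggedSevere = false;
    }
  }

  // Toy model of the TaskTracker health check: each pass that sees the flag
  // set triggers a STALE_STATE soft restart. Returns the number of restarts,
  // capped at maxPasses so the "ad infinitum" case terminates.
  static int countRestarts(boolean resetOnRestart, int maxPasses) {
    int restarts = 0;
    for (int pass = 0; pass < maxPasses; pass++) {
      if (!SevereTrackingFormatter.hasLoggedSevere()) {
        break; // flag clear: the tracker keeps running normally
      }
      restarts++;
      if (resetOnRestart) {
        SevereTrackingFormatter.resetLoggedSevere();
      }
    }
    return restarts;
  }

  public static void main(String[] args) {
    Formatter fmt = new SevereTrackingFormatter();
    fmt.format(new LogRecord(Level.SEVERE, "Can't read map output"));
    System.out.println(countRestarts(false, 10)); // 10: flag never clears
    System.out.println(countRestarts(true, 10));  // 1: cleared on first restart
  }
}
```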

Thanks,
St.Ack


stack wrote:
> Doug Cutting wrote:
>> stack wrote:
>>> ...
>>>
>>> Somehow the reduce needs to give up and the jobtracker needs to rerun 
>>> the map just as it would if the tasktracker had died completely.
>>
>> Perhaps what should happen is that the TaskTracker should exit when it 
>> encounters errors reading map output...
>>
>> I've attached a patch.  The TaskTracker will restart, but with a new 
>> id, so all of its tasks will be considered lost.  This will 
>> unfortunately lose other map tasks done by this tasktracker, but at 
>> least things will keep going.
>>
>> Does this look right to you?
>>
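Doug's reasoning about the new id can be modelled with a toy map from tracker ids to finished tasks. This is an assumption-laden sketch, not JobTracker code: tasksByTracker, newTrackerId() and lostTasks() are hypothetical, and the real id scheme may well differ. It shows why a restart under a fresh id loses every task recorded under the old id, including map outputs that were perfectly good.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.atomic.AtomicInteger;

public class TrackerIdSketch {
  // Hypothetical JobTracker bookkeeping: finished map tasks, keyed by the id
  // of the tracker that holds their output.
  static final Map<String, Set<String>> tasksByTracker = new HashMap<>();
  private static final AtomicInteger generation = new AtomicInteger();

  // Hypothetical id scheme: every (re)start gets a fresh suffix, so a
  // restarted tracker never reports under its old id.
  static String newTrackerId(String host) {
    return "tracker_" + host + "_" + generation.incrementAndGet();
  }

  // Every task recorded under an id that is no longer reporting is treated
  // as lost and must be rerun, whether or not its output was actually bad.
  static Set<String> lostTasks(Set<String> liveTrackerIds) {
    Set<String> lost = new TreeSet<>();
    for (Map.Entry<String, Set<String>> e : tasksByTracker.entrySet()) {
      if (!liveTrackerIds.contains(e.getKey())) {
        lost.addAll(e.getValue());
      }
    }
    return lost;
  }

  public static void main(String[] args) {
    String before = newTrackerId("node7");
    tasksByTracker.put(before, new HashSet<>(Arrays.asList("map_0001", "map_0002")));
    String after = newTrackerId("node7"); // soft restart after a SEVERE error
    System.out.println(before + " -> " + after);                 // tracker_node7_1 -> tracker_node7_2
    System.out.println(lostTasks(Collections.singleton(after))); // [map_0001, map_0002]
  }
}
```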
> 
> Yes. That sounds like the right thing to do. Minor comments below. 
> Meantime, let me try it.
> Thanks,
> St.Ack
> 
> 
>> Doug
>>
>>
> ...
> 
>>  
>>          return 0;
>> Index: src/java/org/apache/hadoop/mapred/MapOutputFile.java
>> ===================================================================
>> --- src/java/org/apache/hadoop/mapred/MapOutputFile.java    (revision 385629)
>> +++ src/java/org/apache/hadoop/mapred/MapOutputFile.java    (working copy)
>> @@ -17,6 +17,7 @@
>>  package org.apache.hadoop.mapred;
>>  
>>  import java.io.IOException;
>> +import java.util.logging.Level;
>>  
>>  import java.io.*;
>>  import org.apache.hadoop.io.*;
>> @@ -108,12 +109,26 @@
>>      // write the length-prefixed file content to the wire
>>      File file = getOutputFile(mapTaskId, partition);
>>      out.writeLong(file.length());
>> -    FSDataInputStream in = FileSystem.getNamed("local", this.jobConf).open(file);
>> +
>> +    FSDataInputStream in = null;
>>      try {
>> +      in = FileSystem.getNamed("local", this.jobConf).open(file);
>> +    } catch (IOException e) {
>> +      // log a SEVERE exception in order to cause TaskTracker to exit
>> +      TaskTracker.LOG.log(Level.SEVERE, "Can't open map output:" + file, e);
>> +
> 
> Should there be a 'throw e;' after TaskTracker.LOG.log above?
> 
> 
> 
>> +    }
>> +    try {
>>        byte[] buffer = new byte[8192];
>> -      int l;
>> -      while ((l = in.read(buffer)) != -1) {
>> +      int l = 0;
>> +      while (l != -1) {
>>          out.write(buffer, 0, l);
>> +        try {
>> +          l = in.read(buffer);
>> +        } catch (IOException e) {
>> +          // log a SEVERE exception in order to cause TaskTracker to exit
>> +          TaskTracker.LOG.log(Level.SEVERE,"Can't read map output:" + file, e);
> 
> 
> And same here.
> 
> 
>> +        }
>>        }
>>      } finally {
>>        in.close();
> 
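On the two 'throw e;' questions: without the rethrow, a failed open leaves in null, so the following in.read() (and the in.close() in the finally block) would throw a NullPointerException. A failed read leaves l at its previous value, so the loop would write the same buffer again and retry, possibly forever. Below is a minimal sketch of log-then-rethrow. It assumes plain java.io in place of FileSystem.getNamed("local", ...) and a java.util.logging Logger in place of TaskTracker.LOG; writeLengthPrefixed() is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.logging.Level;
import java.util.logging.Logger;

public class MapOutputCopySketch {
  // Stand-in for TaskTracker.LOG; in the patch, logging at SEVERE is what
  // makes the TaskTracker exit.
  static final Logger LOG = Logger.getLogger("TaskTracker");

  // Writes the length-prefixed contents of file to out. Open and read
  // failures are logged at SEVERE and then rethrown, so the caller aborts
  // the transfer instead of looping on a failed read.
  static void writeLengthPrefixed(File file, DataOutputStream out) throws IOException {
    out.writeLong(file.length());
    InputStream in;
    try {
      in = new FileInputStream(file);
    } catch (IOException e) {
      LOG.log(Level.SEVERE, "Can't open map output: " + file, e);
      throw e;
    }
    try {
      byte[] buffer = new byte[8192];
      while (true) {
        int l;
        try {
          l = in.read(buffer);
        } catch (IOException e) {
          LOG.log(Level.SEVERE, "Can't read map output: " + file, e);
          throw e;
        }
        if (l == -1) {
          break; // end of file
        }
        out.write(buffer, 0, l);
      }
    } finally {
      in.close();
    }
  }

  public static void main(String[] args) throws IOException {
    File tmp = File.createTempFile("map-output", ".out");
    tmp.deleteOnExit();
    Files.write(tmp.toPath(), "hello".getBytes(StandardCharsets.UTF_8));
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    writeLengthPrefixed(tmp, new DataOutputStream(bytes));
    System.out.println(bytes.size()); // 13: 8-byte length + 5 data bytes

    File missing = new File(tmp.getPath() + ".missing");
    try {
      writeLengthPrefixed(missing, new DataOutputStream(new ByteArrayOutputStream()));
    } catch (IOException expected) {
      System.out.println("open failure propagated");
    }
  }
}
```

Checking l == -1 before writing also drops the zero-length write that the patched loop performs on its first pass.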

