hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-133) the TaskTracker.Child.ping thread calls exit
Date Fri, 14 Apr 2006 04:46:00 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-133?page=comments#action_12374459 ] 

Owen O'Malley commented on HADOOP-133:

The results are interesting. Based on a 45-minute run (on 195 Linux nodes) of my random writer,
I got:

226 instances of exit == 143
 11 instances of exit == -113
  2 instances of exit == 65   // exception thrown on the ping
  0 instances of exit == 66   // task tracker not recognizing the task

So there is a lot of dying going on that I don't understand. Tomorrow, I'll make a patch for
my code that changes ping from "void ping(String)" to "boolean ping(String)", where false
means that the task is unknown and exceptions get a second chance. Do you happen to recognize
either 143 or -113? Doing a quick search in Eclipse, I didn't see them.
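
A minimal sketch of the proposed ping change, for illustration only. The names PingTarget, PingOutcome, PingPolicy, and pingWithRetry are hypothetical and are not taken from the Hadoop source; the only parts that come from the comment above are the "boolean ping(String)" signature, the meaning of false (task unknown), and the idea that an exception gets a second chance rather than an immediate exit.

```java
import java.io.IOException;

/** Hypothetical stand-in for the tracker-side call; not Hadoop's actual interface. */
interface PingTarget {
    /** Returns false when the tracker does not recognize the task. */
    boolean ping(String taskId) throws IOException;
}

/** Possible results of one ping round in this sketch. */
enum PingOutcome { STILL_KNOWN, TASK_UNKNOWN, TRACKER_UNREACHABLE }

final class PingPolicy {
    private PingPolicy() {}

    /**
     * Pings the tracker, retrying when the call throws.
     * With maxAttempts == 2, a single exception gets a second chance.
     */
    static PingOutcome pingWithRetry(PingTarget tracker, String taskId, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // A normal return is authoritative: true = known, false = unknown.
                return tracker.ping(taskId) ? PingOutcome.STILL_KNOWN
                                            : PingOutcome.TASK_UNKNOWN;
            } catch (IOException e) {
                // Possibly a transient failure: fall through and try again.
            }
        }
        // Every attempt threw, so report the tracker as unreachable.
        return PingOutcome.TRACKER_UNREACHABLE;
    }
}
```

Returning an outcome instead of exiting inside the ping thread leaves the decision about how to shut down to the caller, and keeps the two failure cases (exit codes 65 and 66 in the counts above) distinguishable.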

> the TaskTracker.Child.ping thread calls exit
> --------------------------------------------
>          Key: HADOOP-133
>          URL: http://issues.apache.org/jira/browse/HADOOP-133
>      Project: Hadoop
>         Type: Bug
>   Components: mapred
>     Versions: 0.1.1
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley

> The TaskTracker.Child.startPinging thread calls exit if the TaskTracker doesn't respond.
Calling exit in a multi-threaded program is really problematic. In particular, it prevents
cleanup/finally clauses from running. We need to move to a model where it uses Thread.interrupt(),
which means we need to check the interrupt flag in the map loop and reduce loop and
stop masking the InterruptedExceptions.
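
A rough sketch of the interrupt-based model described above, not the actual TaskTracker code. The class InterruptibleTask, its fields, and the helper interruptAndWait are hypothetical, and the one-millisecond sleep merely stands in for processing a record. It shows a work loop that checks the interrupt flag, does not swallow InterruptedException, and still reaches its finally clause when a watcher thread calls Thread.interrupt() instead of exit.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical task body standing in for a map or reduce loop. */
final class InterruptibleTask implements Runnable {
    final AtomicBoolean cleanedUp = new AtomicBoolean(false);
    final AtomicLong recordsProcessed = new AtomicLong();

    @Override
    public void run() {
        try {
            // Check the interrupt flag on every iteration of the work loop.
            while (!Thread.currentThread().isInterrupted()) {
                processOneRecord();
            }
        } catch (InterruptedException e) {
            // Do not mask the interrupt: restore the flag for any outer code.
            Thread.currentThread().interrupt();
        } finally {
            // Runs on interruption, unlike cleanup that is skipped by exit.
            cleanedUp.set(true);
        }
    }

    /** Placeholder for real work; sleep is a blocking call that honors interrupts. */
    private void processOneRecord() throws InterruptedException {
        recordsProcessed.incrementAndGet();
        Thread.sleep(1);
    }

    /** Plays the role of the ping thread: interrupts the worker rather than exiting. */
    static boolean interruptAndWait(InterruptibleTask task, long runMillis) {
        Thread worker = new Thread(task, "task-worker");
        worker.start();
        try {
            Thread.sleep(runMillis);
            worker.interrupt();
            worker.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive() && task.cleanedUp.get();
    }
}
```

For example, InterruptibleTask.interruptAndWait(new InterruptibleTask(), 50) should return true once the worker has stopped and its cleanup has run.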
