hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-133) the TaskTracker.Child.ping thread calls exit
Date Thu, 13 Apr 2006 21:41:01 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-133?page=comments#action_12374421 ] 

Doug Cutting commented on HADOOP-133:

We can't always rely on cleanup/finally stuff to run.  JVMs can exit unexpectedly.  We hope
it doesn't happen often, but we must be able to handle that situation.  If we need to, e.g.,
clean up temp files, we do that on startup.
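
As a rough illustration of cleaning up on startup rather than on exit, here is a minimal Java sketch. The class name StartupCleanup, the method purgeStaleTempFiles, and the idea of a single scratch directory per process are assumptions made for the example, not names from the Hadoop source.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: remove whatever a previous, possibly crashed, JVM left behind.
final class StartupCleanup {

    // Deletes everything under scratchDir (but not scratchDir itself) and
    // returns the number of entries removed. Creates the directory if missing.
    static int purgeStaleTempFiles(Path scratchDir) {
        try {
            Files.createDirectories(scratchDir);
            List<Path> stale;
            try (Stream<Path> walk = Files.walk(scratchDir)) {
                stale = walk.filter(p -> !p.equals(scratchDir))
                            .sorted(Comparator.reverseOrder()) // children before parents
                            .collect(Collectors.toList());
            }
            for (Path p : stale) {
                Files.delete(p);
            }
            return stale.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling this once before any task starts means a crash that skipped every finally clause costs nothing more than some disk space until the next start.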

The reason this was added was to handle the case where the tasktracker has exited and the
child is somehow hung.  We must not leave stray, hung JVMs around.  Thread.interrupt() is
not reliable enough.  When a thread is hung, it will not receive an interrupt.  I've seen
this frequently when fetching, where socket read() requests hang indefinitely, despite the
socket having a short read timeout.

So I'd be happy to have this first try to exit more gracefully, but, after a time, it should
still call exit().  The child processes do not have a pid file.  Once their parent has died,
nothing tracks them, so they must reliably exit fairly quickly when their parent dies.
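
The "try to exit gracefully, then call exit() anyway" behaviour could be sketched roughly as follows. ParentWatchdog, shutDown, and the forceExit callback are hypothetical names introduced for the example; the actual ping thread in TaskTracker.Child is not shown in this message.

```java
// Hypothetical watchdog step, to be run once the parent is known to be gone.
final class ParentWatchdog {

    // Interrupts the worker, waits up to graceMillis for it to finish, and
    // runs forceExit if it is still alive afterwards. Returns true if the
    // worker stopped on its own within the grace period.
    static boolean shutDown(Thread worker, long graceMillis, Runnable forceExit) {
        if (graceMillis <= 0) {
            // join(0) would wait forever, which defeats the purpose.
            throw new IllegalArgumentException("graceMillis must be positive");
        }
        worker.interrupt(); // polite request; a hung thread may never see it
        try {
            worker.join(graceMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the flag, then fall through
        }
        if (worker.isAlive()) {
            forceExit.run(); // last resort
            return false;
        }
        return true;
    }
}
```

In a real child process forceExit would presumably be something like `() -> System.exit(1)`. `Runtime.getRuntime().halt(1)` is the harsher option, since it skips shutdown hooks that could themselves hang.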

> the TaskTracker.Child.ping thread calls exit
> --------------------------------------------
>          Key: HADOOP-133
>          URL: http://issues.apache.org/jira/browse/HADOOP-133
>      Project: Hadoop
>         Type: Bug

>   Components: mapred
>     Versions: 0.1.1
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley

> The TaskTracker.Child.startPinging thread calls exit if the TaskTracker doesn't respond.
> Calling exit in a multi-threaded program is really problematic. In particular, it prevents
> cleanup/finally clauses from running. We need to move to a model where it uses Thread.interrupt(),
> which means we need to check the interrupt flag in the map loop and reduce loop and
> stop masking InterruptedExceptions.
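
The interrupt-based model proposed in the description could look roughly like this in a task loop. This is only a sketch: InterruptibleLoop, run, and process are hypothetical stand-ins for the map and reduce loops, which are not shown in this message.

```java
import java.util.Iterator;

// Hypothetical stand-in for a map or reduce loop that cooperates with interrupts.
final class InterruptibleLoop {

    // Processes records until the input is exhausted or the thread is
    // interrupted. Returns the number of records processed.
    static int run(Iterator<String> records) throws InterruptedException {
        int processed = 0;
        while (records.hasNext()) {
            // Thread.interrupted() checks the flag and clears it; the
            // exception then carries the signal up to the caller instead
            // of being swallowed.
            if (Thread.interrupted()) {
                throw new InterruptedException(
                    "task interrupted after " + processed + " records");
            }
            process(records.next());
            processed++;
        }
        return processed;
    }

    // Placeholder for the per-record work.
    static void process(String record) {
        // no-op in this sketch
    }
}
```

This only helps while the thread is actually running the loop; a thread stuck inside a blocking call that ignores interrupts never reaches the check.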

This message is automatically generated by JIRA.