hadoop-mapreduce-issues mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-430) Task stuck in cleanup with OutOfMemoryErrors
Date Tue, 25 Aug 2009 06:13:59 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747242#action_12747242 ]

Arun C Murthy commented on MAPREDUCE-430:
-----------------------------------------

I've been doing some thinking about the 'right' approach to handling exceptions and errors
in the map/reduce tasks, and I bounced some of these ideas off Chris too:

# Every code path in the tasks should propagate the exception/error upwards after doing any
necessary clean-up in its own components and sub-components.
# We should distinguish between user errors (OOM, IOException etc.) and systemic errors (FSError,
ChecksumError etc.) and define just two methods on the TaskUmbilicalProtocol: userError and
systemError (see the first sketch after this list). In the future these should be used to
_blacklist_ nodes only on 'systemError', never on 'userError'.
# Child.java:main should be the only place we call the methods on TaskUmbilicalProtocol to
inform the parent TaskTracker about errors. It should unwrap the caught exception and report
the underlying cause via the appropriate method.
# All threads (shuffle copier threads, merger threads, sort/spill threads etc.) should catch
Throwable and save the exception for the 'main' thread to examine. The 'main' thread should
examine these at all appropriate places and abort correctly (a sketch of this pattern follows
the list).
# We should _never_ *rethrow* exceptions from the 'main' thread - rather we should 'wrap'
them in appropriate exceptions and throw them with the right *initCause*, so that we don't
lose the original stack traces.
# We should strive to use the same 'exception' types for the 'wrapper exceptions' whenever
the exception is part of the signature, e.g. IOException for map/reduce in the old api, and
IOException and InterruptedException for map/reduce in the new api (it is highly unfortunate
that the RPC layer wraps InterruptedException in an IOException today! :( ). This is very
important since the application writer might be relying on getting the 'right' exception for
their specific error-handling needs. Thus we should wrap IOException/InterruptedException in
an IOException, and other Exceptions/Errors in a RuntimeException (see the last sketch below).
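
To make point 2 concrete, here is a rough sketch of the two umbilical methods. The method
names come from the proposal above; the parameters (task attempt id, stringified trace) are
just my assumption of what we'd need:

{code:java}
import java.io.IOException;
import org.apache.hadoop.mapred.TaskAttemptID;

// Sketch only: method names from point 2; the signatures are assumptions.
public interface TaskUmbilicalProtocol {

  /** Report a user-level failure (OOM, IOException from user code etc.).
   *  Must never count towards blacklisting the node. */
  void userError(TaskAttemptID taskId, String trace) throws IOException;

  /** Report a systemic failure (FSError, ChecksumError etc.).
   *  May be used to blacklist the node. */
  void systemError(TaskAttemptID taskId, String trace) throws IOException;
}
{code}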
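
For point 4, the per-thread pattern could look something like the following. The class and
method names here are hypothetical, purely to illustrate saving the Throwable for the 'main'
thread and wrapping it with initCause (point 5) when it is finally thrown:

{code:java}
import java.io.IOException;

// Hypothetical copier thread, illustrating points 4 and 5.
class CopierThread extends Thread {
  private volatile Throwable failure;   // examined by the 'main' thread

  public void run() {
    try {
      doCopy();                         // hypothetical unit of work
    } catch (Throwable t) {
      failure = t;                      // save it; never let it escape the thread
    }
  }

  /** The 'main' thread calls this at appropriate places and aborts on failure. */
  void checkFailure() throws IOException {
    Throwable t = failure;
    if (t != null) {
      // Wrap rather than rethrow (point 5) so the original stack trace
      // survives as the cause.
      IOException wrapped = new IOException("copier failed: " + t);
      wrapped.initCause(t);
      throw wrapped;
    }
  }

  private void doCopy() throws IOException { /* actual copy work */ }
}
{code}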
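
And for the wrapping rule in point 6, something like this (the helper and its name are made
up for illustration):

{code:java}
import java.io.IOException;

// Hypothetical helper: exceptions that are part of the map/reduce signature
// keep their type (IOException); everything else becomes a RuntimeException.
// In both cases initCause preserves the original stack trace.
final class ExceptionWrapper {
  static void wrapAndThrow(Throwable t) throws IOException {
    if (t instanceof IOException || t instanceof InterruptedException) {
      IOException wrapped = new IOException(t.toString());
      wrapped.initCause(t);
      throw wrapped;
    }
    RuntimeException wrapped = new RuntimeException(t.toString());
    wrapped.initCause(t);
    throw wrapped;
  }
}
{code}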

Thoughts?

> Task stuck in cleanup with OutOfMemoryErrors
> --------------------------------------------
>
>                 Key: MAPREDUCE-430
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-430
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amar Kamat
>             Fix For: 0.20.1
>
>         Attachments: MAPREDUCE-430-v1.11.patch, MAPREDUCE-430-v1.12-branch-0.20.patch,
> MAPREDUCE-430-v1.12.patch, MAPREDUCE-430-v1.6-branch-0.20.patch, MAPREDUCE-430-v1.6.patch,
> MAPREDUCE-430-v1.7.patch, MAPREDUCE-430-v1.8.patch
>
>
> Observed a task with an OutOfMemoryError, stuck in cleanup.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

