hadoop-mapreduce-issues mailing list archives

From "Clint Heath (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4464) Reduce tasks failing with NullPointerException in ConcurrentHashMap.get()
Date Thu, 19 Jul 2012 20:35:35 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418649#comment-13418649 ]

Clint Heath commented on MAPREDUCE-4464:


  I'm fine with that as long as it doesn't interrupt the normal handling of a failed task.
In our case, every reduce task failed, and therefore so did the entire job. But I can see a
situation where only one TaskTracker (TT) machine had a bad hostname, so only a subset of
reduce tasks would fail and the overall job might still complete. I just want to make sure
we are informative in the logs and that the tasks are allowed to be retried if applicable,
etc. I haven't thought through the logic far enough yet to know the ramifications of
throwing an IOException (IOE) right there. Harsh and I chatted about the same idea earlier,
though. I'll vet that out...

> Reduce tasks failing with NullPointerException in ConcurrentHashMap.get()
> -------------------------------------------------------------------------
>                 Key: MAPREDUCE-4464
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4464
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: task
>    Affects Versions: 1.0.0
>            Reporter: Clint Heath
>            Priority: Minor
>         Attachments: MAPREDUCE-4464.patch
>   Original Estimate: 1h
>  Remaining Estimate: 1h
> If DNS does not resolve hostnames properly, reduce tasks can fail with a very misleading
> exception: a NullPointerException in ConcurrentHashMap.get(). As per my peer Ahmed's diagnosis:
> In ReduceTask, it seems that event.getTaskTrackerHttp() returns a malformed URI, and so
> host from:
> {code}
> String host = u.getHost();
> {code}
> evaluates to null, and the NullPointerException is then thrown in the ConcurrentHashMap.
> I have written a patch that checks for a null hostname when getHost is called in the
> getMapCompletionEvents method and prints an intelligible warning message, rather than
> letting the null pass through until it fails later in a confusing and misleading way.
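
The failure mode described above can be reproduced outside Hadoop. The following is only a minimal sketch, not the attached patch: the class name NullHostCheck, the helpers hasHost and hostOf, the example URLs, and the error message are all hypothetical. It assumes the bad hostname contains an underscore, which java.net.URI does not accept in a server-based host, so getHost() returns null; a ConcurrentHashMap lookup with that null key would then throw the NullPointerException. Throwing an IOException here corresponds to the idea discussed in the comment, while the patch as described logs a warning instead.

```java
import java.io.IOException;
import java.net.URI;
import java.util.concurrent.ConcurrentHashMap;

public class NullHostCheck {

    // Hypothetical helper: true if the tracker URL yields a usable hostname.
    static boolean hasHost(String trackerHttp) {
        return URI.create(trackerHttp).getHost() != null;
    }

    // Hypothetical helper: fail early with a readable message instead of
    // letting a null key reach ConcurrentHashMap.get().
    static String hostOf(String trackerHttp) throws IOException {
        String host = URI.create(trackerHttp).getHost();
        if (host == null) {
            throw new IOException(
                "Invalid hostname in tracker location: '" + trackerHttp + "'");
        }
        return host;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the per-host map of map-output locations (assumed shape).
        ConcurrentHashMap<String, String> locations = new ConcurrentHashMap<>();
        locations.put("tt1.example.com", "map output location");

        // A well-formed hostname resolves to a non-null map key.
        System.out.println(locations.get(hostOf("http://tt1.example.com:50060")));

        // An underscore makes getHost() return null; without the check,
        // locations.get(null) would throw a NullPointerException.
        try {
            hostOf("http://bad_host:50060");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Whether to throw or only log depends on how task retries should behave, which is the open question in the comment above.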

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

