hadoop-mapreduce-issues mailing list archives

From "Clint Heath (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4464) Reduce tasks failing with NullPointerException in ConcurrentHashMap.get()
Date Thu, 19 Jul 2012 20:25:35 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418643#comment-13418643 ]

Clint Heath commented on MAPREDUCE-4464:
----------------------------------------

Sorry, I should have supplied the exception that we encountered when this issue happened.
As it turned out, the host names in the cluster all contained a character that is illegal in
DNS host names (the underscore "_"), so the getHost() call returned null and we saw the following.

The mappers get to about 80% complete, at which point the reducers all begin to throw the
following exceptions and die almost immediately. Eventually the whole job dies:

{noformat}
2012-06-26 15:56:02,326 FATAL org.apache.hadoop.mapred.Task: attempt_201206251823_0004_r_000036_1 GetMapEventsThread Ignoring exception : java.lang.NullPointerException
    at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2835)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2756)

2012-06-26 15:56:02,356 FATAL org.apache.hadoop.mapred.Task: attempt_201206251823_0004_r_000036_1 GetMapEventsThread Ignoring exception : org.apache.hadoop.ipc.RemoteException: java.io.IOException: JvmValidate Failed. Ignoring request from task: attempt_201206251823_0004_r_000036_1, with JvmId: jvm_201206251823_0004_r_-396118293
    at org.apache.hadoop.mapred.TaskTracker.validateJVM(TaskTracker.java:3468)
    at org.apache.hadoop.mapred.TaskTracker.getMapCompletionEvents(TaskTracker.java:3731)
    at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)
    at org.apache.hadoop.ipc.Client.call(Client.java:1107)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
    at $Proxy0.getMapCompletionEvents(Unknown Source)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2798)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2756)

2012-06-26 15:56:02,361 FATAL org.apache.hadoop.mapred.Task: Failed to contact the tasktracker
org.apache.hadoop.ipc.RemoteException: java.io.IOException: JvmValidate Failed. Ignoring request from task: attempt_201206251823_0004_r_000036_1, with JvmId: jvm_201206251823_0004_r_-396118293
    at org.apache.hadoop.mapred.TaskTracker.validateJVM(TaskTracker.java:3468)
    at org.apache.hadoop.mapred.TaskTracker.fatalError(TaskTracker.java:3714)
    at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)
    at org.apache.hadoop.ipc.Client.call(Client.java:1107)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
    at $Proxy0.fatalError(Unknown Source)
    at org.apache.hadoop.mapred.Task.reportFatalError(Task.java:294)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2781)
{noformat}
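
For anyone who wants to see the first NullPointerException in isolation, here is a minimal sketch that uses only the JDK, no Hadoop classes. It assumes the tracker URL is parsed with java.net.URI, as the issue description suggests. The class name, the method names and both host names are made up for illustration and are not taken from ReduceTask.

```java
import java.net.URI;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-alone reproduction; not code from Hadoop.
public class NullHostRepro {

    /** Returns the host component as java.net.URI parses it; may be null. */
    static String hostOf(String url) {
        return URI.create(url).getHost();
    }

    /** Returns true if looking up the given key throws a NullPointerException. */
    static boolean lookupThrowsNpe(Map<String, List<String>> map, String key) {
        try {
            map.get(key);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Stand-in for a per-host map of map output locations (assumed shape).
        Map<String, List<String>> locationsByHost = new ConcurrentHashMap<>();

        // Made-up host names: one legal, one containing an underscore.
        String good = hostOf("http://tracker01.example.com:50060");
        String bad = hostOf("http://tracker_01:50060");

        System.out.println("legal host name      -> " + good);
        System.out.println("underscore host name -> " + bad);
        System.out.println("lookup throws NPE    -> " + lookupThrowsNpe(locationsByHost, bad));
    }
}
```

The URI with the underscore still parses without an exception, because java.net.URI falls back to treating the authority as registry-based. That is why nothing complains until the null host is used as a ConcurrentHashMap key.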
                
> Reduce tasks failing with NullPointerException in ConcurrentHashMap.get()
> -------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4464
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4464
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: task
>    Affects Versions: 1.0.0
>            Reporter: Clint Heath
>            Priority: Minor
>         Attachments: MAPREDUCE-4464.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> If DNS does not resolve hostnames properly, reduce tasks can fail with a very misleading
> exception.
> As per my peer Ahmed's diagnosis:
> In ReduceTask, it seems that event.getTaskTrackerHttp() returns a malformed URI, and
> so host from:
> {code}
> String host = u.getHost();
> {code}
> is evaluated to null, and the NullPointerException is then thrown in ConcurrentHashMap.get().
> I have written a patch that checks for a null hostname when getHost() is called
> in the getMapCompletionEvents method and prints an intelligible warning message, rather than
> letting the null pass through until it surfaces later as a confusing and misleading exception.
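
The kind of null check described above might look roughly like the sketch below. This is not the attached MAPREDUCE-4464.patch and it does not use Hadoop's classes. The class name, the method names, the map layout and the wording of the warning are all assumptions made for illustration. The only real API relied on is java.net.URI.getHost(), which can return null for a host name the JDK does not accept as a server-based authority.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a null-host guard; not the actual patch.
public class HostGuardSketch {

    // Assumed shape: map output ids grouped by tracker host.
    private final Map<String, List<String>> outputsByHost = new ConcurrentHashMap<>();
    // Collected warnings; a real task would write these to its log instead.
    private final List<String> warnings = new ArrayList<>();

    /**
     * Records a map output under the host of the given tracker URL.
     * Returns false and records a warning if the host cannot be determined,
     * instead of passing a null key to the ConcurrentHashMap.
     */
    boolean addLocation(String trackerHttp, String mapTaskId) {
        String host = URI.create(trackerHttp).getHost();
        if (host == null) {
            warnings.add("Could not determine host from tracker URL '" + trackerHttp
                    + "'. Check the host name for characters that are illegal in DNS names.");
            return false;
        }
        outputsByHost.computeIfAbsent(host, h -> new ArrayList<>()).add(mapTaskId);
        return true;
    }

    List<String> warnings() {
        return warnings;
    }

    Map<String, List<String>> outputsByHost() {
        return outputsByHost;
    }

    public static void main(String[] args) {
        HostGuardSketch sketch = new HostGuardSketch();
        // Made-up tracker URLs and task ids.
        sketch.addLocation("http://tracker01.example.com:50060", "m_000001");
        sketch.addLocation("http://tracker_01:50060", "m_000002");
        sketch.warnings().forEach(System.out::println);
    }
}
```

The point of the guard is only to name the real cause (an unusable host name) at the place where it is first detectable, so that the operator is not left chasing a NullPointerException inside ConcurrentHashMap.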

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
