hadoop-mapreduce-issues mailing list archives

From "Amareshwari Sriramadasu (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1397) NullPointerException observed during task failures
Date Fri, 22 Jan 2010 11:17:21 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803668#action_12803668 ]

Amareshwari Sriramadasu commented on MAPREDUCE-1397:
----------------------------------------------------

After looking at the TaskTracker logs, we found that the problem is as follows:
One of the task attempts failed to launch its JVM. The finally block of JvmRunner.runChild()
calls kill(), which calls terminateTask(), which also fails. It then sleeps for the configured
duration (5 seconds by default) and calls killTask(). Only after that does it remove the jvmId
mapping from the jvmIdToRunner map.
Meanwhile, a killTaskAction for the same attempt arrives from the TaskTracker. This call removes
the jvmId mapping from jvmToRunningTask. It then sees that JvmRunner.kill() has already been
called, so it goes ahead and releases the slot.
Because there are now free slots, the TaskTracker tries to launch a task and finds the JvmManager
in an inconsistent state, since the JVM has not yet been removed from the jvmIdToRunner map. When
it tries to look up the details through getDetails(), it gets a NullPointerException because
jvmToRunningTask no longer has an entry for that JVM.
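
The interleaving above can be replayed on a single thread. The following is a minimal sketch, not the Hadoop source: the class JvmStateRaceSketch and its methods register(), launchScanThrowsNpe() and replayInterleaving() are hypothetical, and only the map names and getDetails() are taken from the description. It assumes getDetails() dereferences the task entry without a null check.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the two maps; hypothetical, not the Hadoop source.
class JvmStateRaceSketch {
    // jvmId -> running task attempt (cleared by the kill-task path)
    final Map<String, String> jvmToRunningTask = new HashMap<>();
    // jvmId -> runner (cleared only once the slow kill() has finished)
    final Map<String, Object> jvmIdToRunner = new HashMap<>();

    void register(String jvmId, String attemptId) {
        jvmToRunningTask.put(jvmId, attemptId);
        jvmIdToRunner.put(jvmId, new Object());
    }

    // Stand-in for getDetails(): assumes every known runner still has a task.
    String getDetails(String jvmId) {
        String attempt = jvmToRunningTask.get(jvmId);
        return jvmId + ":" + attempt.trim(); // NPE when attempt is null
    }

    // Stand-in for the scan a new launch performs over the known runners.
    boolean launchScanThrowsNpe() {
        try {
            for (String jvmId : jvmIdToRunner.keySet()) {
                getDetails(jvmId);
            }
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    // Replays the described interleaving step by step on one thread.
    static boolean replayInterleaving() {
        JvmStateRaceSketch state = new JvmStateRaceSketch();
        state.register("jvm_1", "attempt_1");
        // 1. kill() has started and is sleeping; jvmIdToRunner is untouched.
        // 2. The kill-task path removes the task mapping and frees the slot.
        state.jvmToRunningTask.remove("jvm_1");
        // 3. A new launch scans the runners before kill() has finished.
        return state.launchScanThrowsNpe();
    }

    public static void main(String[] args) {
        System.out.println("NPE reproduced: " + replayInterleaving());
    }
}
```

The sketch only shows why a runner entry without a matching task entry is enough to trigger the exception; it does not model the real threads or the sleep.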

I think JvmRunner.kill() should not call back into JvmManager to remove the jvmId mapping
from the jvmIdToRunner map. The removal should be done by the callers of kill(), i.e. killJvm(),
stop() and reapJvm(). JvmRunner.runChild() already does this through UpdateOnJvmExit(), the next
method it calls after kill().
Thoughts?
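
One possible shape of this proposal, as a rough sketch under stated assumptions rather than an actual patch: JvmManagerSketch, RunnerSketch, launch(), mapsAgree() and knows() are hypothetical names, and the method bodies are invented for illustration. Here kill() never touches the manager's maps, and the caller (killJvm() in this sketch) removes both entries in one synchronized step before invoking the possibly slow kill().

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposal; bodies are invented for illustration.
class JvmManagerSketch {
    static class RunnerSketch {
        volatile boolean killed;

        // Only terminates the JVM; never calls back into the manager's maps.
        void kill() {
            killed = true;
        }
    }

    private final Map<String, String> jvmToRunningTask = new HashMap<>();
    private final Map<String, RunnerSketch> jvmIdToRunner = new HashMap<>();

    synchronized RunnerSketch launch(String jvmId, String attemptId) {
        RunnerSketch runner = new RunnerSketch();
        jvmToRunningTask.put(jvmId, attemptId);
        jvmIdToRunner.put(jvmId, runner);
        return runner;
    }

    // The caller of kill() owns the bookkeeping: both entries go in one locked step.
    void killJvm(String jvmId) {
        RunnerSketch runner;
        synchronized (this) {
            runner = jvmIdToRunner.remove(jvmId);
            jvmToRunningTask.remove(jvmId);
        }
        if (runner != null) {
            runner.kill(); // possibly slow; runs after the maps already agree
        }
    }

    // True when both maps track exactly the same set of jvmIds.
    synchronized boolean mapsAgree() {
        return jvmIdToRunner.keySet().equals(jvmToRunningTask.keySet());
    }

    synchronized boolean knows(String jvmId) {
        return jvmIdToRunner.containsKey(jvmId) || jvmToRunningTask.containsKey(jvmId);
    }
}
```

Calling kill() outside the lock is a design choice of this sketch: a launch that runs while kill() is still sleeping sees neither entry, so it never observes a runner without a task.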



> NullPointerException observed during task failures
> --------------------------------------------------
>
>                 Key: MAPREDUCE-1397
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1397
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.20.1
>            Reporter: Ramya R
>            Assignee: Amareshwari Sriramadasu
>            Priority: Minor
>             Fix For: 0.20.2
>
>
> In an environment where many jobs are killed simultaneously, NPEs are observed in the
> TT/JT logs when a task fails. The situation is aggravated when the taskcontroller.cfg is not
> configured properly. Below is the exception obtained:
> {noformat}
> INFO org.apache.hadoop.mapred.TaskInProgress: Error from <attempt_ID>:
> java.lang.Throwable: Child Error
>         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:529)
> Caused by: java.lang.NullPointerException
>         at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.getDetails(JvmManager.java:329)
>         at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.reapJvm(JvmManager.java:315)
>         at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.access$000(JvmManager.java:146)
>         at org.apache.hadoop.mapred.JvmManager.launchJvm(JvmManager.java:109)
>         at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:502)
>  {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

