hadoop-mapreduce-issues mailing list archives

From "Vinod K V (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-913) TaskRunner crashes with NPE resulting in held up slots, UNINITIALIZED tasks and hung TaskTracker
Date Wed, 26 Aug 2009 11:32:01 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747891#action_12747891
] 

Vinod K V commented on MAPREDUCE-913:
-------------------------------------


The root cause is a job whose DistributedCache files were modified on HDFS while the job
was still running and tasks were still being assigned. (NOTE: The line numbers DO NOT
correspond to trunk, but the trace should give an idea.)

{code}
2009-08-25 19:53:48,831 FATAL org.apache.hadoop.filecache.DistributedCache: File: hdfs://<HDFS_HOST>:<port>/user/a/b/c/distributed_data/distributed_file#distributed_file has changed on HDFS since job started
2009-08-25 19:53:48,832 WARN org.apache.hadoop.mapred.TaskRunner: attempt_200908191538_10587_r_000000_1Child Error
java.io.IOException: File: hdfs://<HDFS_HOST>:<port>/user/a/b/c/distributed_data/distributed_file#distributed_file has changed on HDFS since job started
        at org.apache.hadoop.filecache.DistributedCache.ifExistsAndFresh(DistributedCache.java:485)
        at org.apache.hadoop.filecache.DistributedCache.localizeCache(DistributedCache.java:356)
        at org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:205)
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:173)
{code}
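
To illustrate the kind of check that produces this error, here is a minimal sketch. It assumes the freshness test amounts to comparing the modification time recorded at job submission with the file's current modification time. The class name, method name, and placeholder URI are hypothetical and are not taken from DistributedCache.java.

```java
import java.io.IOException;

/** Hypothetical sketch; not the Hadoop DistributedCache implementation. */
public class CacheFreshnessSketch {

    /**
     * Throws if the file's current modification time differs from the one
     * recorded when the job was submitted (assumed behaviour).
     */
    static void checkFresh(String uri, long recordedMtime, long currentMtime)
            throws IOException {
        if (currentMtime != recordedMtime) {
            throw new IOException(
                "File: " + uri + " has changed on HDFS since job started");
        }
    }

    public static void main(String[] args) {
        String uri = "hdfs://host:port/path/distributed_file"; // placeholder URI
        try {
            checkFresh(uri, 1000L, 1000L); // unchanged: passes silently
            checkFresh(uri, 1000L, 2000L); // modified mid-job: throws
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this assumption, any task assigned after the file was rewritten fails localization, even though earlier tasks of the same job localized the older copy without error.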

Shortly after this, the TaskRunner thread for this task crashes, leaving the following in
the TaskTracker's .out file:
{code}
Exception in thread "Thread-89595" java.lang.NullPointerException
        at org.apache.hadoop.fs.FileUtil.makeShellPath(FileUtil.java:412)
        at org.apache.hadoop.fs.FileUtil.makeShellPath(FileUtil.java:396)
        at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.taskFinished(TaskTracker.java:2166)
        at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.reportTaskFinished(TaskTracker.java:2091)
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:496)
{code}

The following also appears in the TaskTracker's log file:
{code}
2009-08-25 19:53:51,838 ERROR org.apache.hadoop.mapred.TaskLog: getTaskLogFileDetail threw
an exception java.io.FileNotFoundException: /hadoop/logs//mapred/userlogs
attempt_200908191538_10587_r_000000_1/log.index (No such file or directory)
{code}

Once this happens with a job, that particular slot on the TaskTracker is no longer usable,
because the code paths taken never release the slot successfully. All further tasks
assigned to this slot then hang in an UNINITIALIZED state.
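
The slot leak described above can be sketched as follows. This is an illustration only: it assumes the NPE comes from cleanup code touching a path that was never set because localization failed, which the traces suggest but do not prove. All names here (`SlotReleaseSketch`, `cleanup`, `finishUnsafe`, `finishSafe`, `simulate`) are hypothetical and do not correspond to TaskTracker internals.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch; names do not correspond to TaskTracker internals. */
public class SlotReleaseSketch {
    static final AtomicInteger freeSlots = new AtomicInteger(1);

    /** Stand-in for cleanup that dereferences a directory that was never set. */
    static void cleanup(String workDir) {
        workDir.length(); // throws NullPointerException when workDir is null
    }

    /** Unsafe: an exception from cleanup() skips the slot release. */
    static void finishUnsafe(String workDir) {
        cleanup(workDir);
        freeSlots.incrementAndGet();
    }

    /** Safer: the slot is released even when cleanup() throws. */
    static void finishSafe(String workDir) {
        try {
            cleanup(workDir);
        } finally {
            freeSlots.incrementAndGet();
        }
    }

    /** Occupies the only slot, finishes with a null work dir, returns free slots. */
    static int simulate(boolean safe) {
        freeSlots.set(1);
        freeSlots.decrementAndGet(); // the task takes the only slot
        try {
            if (safe) {
                finishSafe(null);
            } else {
                finishUnsafe(null);
            }
        } catch (NullPointerException e) {
            // In the report, the TaskRunner thread dies at this point.
        }
        return freeSlots.get();
    }

    public static void main(String[] args) {
        System.out.println("unsafe free slots: " + simulate(false));
        System.out.println("safe free slots: " + simulate(true));
    }
}
```

In the unsafe variant the counter stays at zero after the exception, which mirrors a slot that is never handed back. Releasing in a `finally` block is one possible remedy; checking for the missing path before cleanup is another.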

> TaskRunner crashes with NPE resulting in held up slots, UNINITIALIZED tasks and hung TaskTracker
> ------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-913
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-913
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>            Reporter: Vinod K V
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

