hadoop-mapreduce-issues mailing list archives

From "Mac Fang (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3362) Job always stay at 'Pending' status and cannot finish several days
Date Tue, 13 Dec 2011 08:25:31 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13168220#comment-13168220
] 

Mac Fang commented on MAPREDUCE-3362:
-------------------------------------

I think the two scenarios are different.

In this issue, all of the job's Map/Reduce tasks have finished, but the job still stays
pending. The root cause is the ConcurrentModificationException: when that exception is
thrown, the finished-task counter ends up wrong.
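
To illustrate the exception itself, here is a minimal sketch in plain Java. It is not Hadoop
code: the class WritersListDemo, the method removeWhileIterating and the string entries are
hypothetical stand-ins for the history 'writers' list. It also runs in a single thread, while
the report describes two threads, but the underlying cause is the same: an ArrayList is
structurally modified while it is being iterated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for the history 'writers' list; not Hadoop code.
class WritersListDemo {

    // Iterates the list and removes a "bad" entry in the middle of the loop.
    // Returns true if a ConcurrentModificationException was raised.
    static boolean removeWhileIterating(List<String> writers) {
        try {
            for (String w : writers) {
                if (w.startsWith("bad")) {
                    writers.remove(w); // structural change during iteration
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> plain =
            new ArrayList<>(Arrays.asList("bad-writer", "ok-1", "ok-2"));
        System.out.println("ArrayList threw CME: " + removeWhileIterating(plain));

        // CopyOnWriteArrayList iterates over a snapshot, so removal is safe here.
        List<String> cow =
            new CopyOnWriteArrayList<>(Arrays.asList("bad-writer", "ok-1", "ok-2"));
        System.out.println("CopyOnWriteArrayList threw CME: " + removeWhileIterating(cow));
    }
}
```

CopyOnWriteArrayList appears only as one possible way to avoid the exception. Nothing in this
thread says it is the fix that was applied to JobHistory.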
                
> Job always stay at 'Pending' status and cannot finish several days
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3362
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3362
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver, jobtracker
>    Affects Versions: 0.20.2
>            Reporter: Denny Ye
>            Priority: Critical
>              Labels: jobtracker
>
> Our jobs stay in 'pending' status for several days. We checked the jobtracker log and
> found that one task attempt failed because it could not store job history to HDFS.
> The problem starts when another job removes the folder to which this job is writing its
> history log. In that case, a 'ConcurrentModificationException' is thrown at
> JobHistory#log(ArrayList<PrintWriter> writers, RecordTypes recordType, Keys[] keys,
> String[] values, JobID id). One thread checks for output errors and removes the output
> whose history folder on HDFS has been removed, while another thread gets a
> 'ConcurrentModificationException' as the current 'writers' list becomes empty.
> Unfortunately, nothing catches this exception, and it propagates through to the
> TaskTracker, skipping the code that adds to 'finishedMapTask'. Then another task attempt
> runs successfully from 'failedMap', but the total 'finishedMapTask' count does not cover
> all finished map tasks. JobCleanupTask cannot start, and the job stays in 'pending' status.
> The root cause:
> The first task attempt fails with the exception; the task is added to 'failedMap' and the
> 'finishedMap' counter is decreased. The next task attempt runs successfully and increases
> 'finishedMap' by one. Because of the failure, the total 'finishedMap' is less than the
> actual number of finished maps, so the cleanup task cannot run.
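
The counter drift described in the report can be modelled with a small sketch. This is one
reading of the description, not the JobTracker implementation: the class JobCounterSketch and
its methods completeAttempt, failAttempt and canLaunchCleanup are hypothetical, and the
cleanup condition (finished count equals total maps) is an assumption made for illustration.

```java
import java.util.ConcurrentModificationException;

// Hypothetical model of the accounting described in the report; not Hadoop code.
class JobCounterSketch {
    final int totalMaps;
    int finishedMaps = 0;

    JobCounterSketch(int totalMaps) {
        this.totalMaps = totalMaps;
    }

    // A completing attempt. If history logging throws, the increment is skipped.
    void completeAttempt(boolean historyLogFails) {
        if (historyLogFails) {
            throw new ConcurrentModificationException("history log failed");
        }
        finishedMaps++;
    }

    // Failure handling decrements the counter, even though it was never incremented.
    void failAttempt() {
        finishedMaps--;
    }

    // Assumed condition: cleanup starts only when every map is counted as finished.
    boolean canLaunchCleanup() {
        return finishedMaps == totalMaps;
    }

    public static void main(String[] args) {
        JobCounterSketch job = new JobCounterSketch(2);
        job.completeAttempt(false);        // map A finishes: finishedMaps = 1
        try {
            job.completeAttempt(true);     // map B, attempt 1: increment skipped
        } catch (ConcurrentModificationException e) {
            job.failAttempt();             // decrement anyway: finishedMaps = 0
        }
        job.completeAttempt(false);        // map B, attempt 2: finishedMaps = 1
        System.out.println(job.finishedMaps + " of " + job.totalMaps
            + " counted, cleanup allowed: " + job.canLaunchCleanup());
    }
}
```

In this model both maps have actually finished, yet the counter reads 1 of 2, so the cleanup
condition never becomes true. That matches the 'pending' symptom described in the report.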

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
