hadoop-mapreduce-issues mailing list archives

From "Amar Kamat (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3926) No information of unfinished map task in Job History, if all attempts of another map task fail.
Date Mon, 27 Feb 2012 15:36:48 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217228#comment-13217228 ]

Amar Kamat commented on MAPREDUCE-3926:
---------------------------------------

Mitesh,
I guess adding this to 0.20.205 might involve a lot of changes. Also, the JT has no information
about the running tasks, i.e., they could in fact be RUNNING, KILLED, FAILED, PENDING, etc.

Note that this can happen for SUCCESSFUL jobs too. The job can still complete/finish while
the speculative tasks are running. In such cases, there is no information about the speculative
tasks logged in the job history.

This can surely be fixed in trunk.
                
> No information of unfinished map task in Job History, if all attempts of another map task fail.
> -----------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3926
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3926
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0
>            Reporter: Mitesh Singh Jat
>            Priority: Minor
>
> No information of unfinished map task in Job History, if all attempts of another map task fail.
> For example, 
> 1. The first map task's first attempt, m_000000_0, was making progress.
> 2. The second map task failed 4 times before the first map task's attempt completed.
> 3. Hence, a job cleanup task was launched and completed before the first map task's attempt completed.
> 4. After the job cleanup task, runningMapCache is cleared (set to null):
> {noformat}
> completedTask() -> jobComplete() -> garbageCollect() -> this.runningMapCache = null;
>            |-----> retireMap() -> if (runningMapCache == null) "Running cache for maps missing!! Job details are missing."
> {noformat}
> 5. Hence, the error "Running cache for maps missing!! Job details are missing." is raised
> (from retireMap(), which is called after jobComplete()), and no further information is
> added to Job History. Therefore, the first map task's information is
> missing from the Job History page.
> I have created a sample streaming MR job, to reproduce this issue.
> {code:title=mapper.sh}
> #!/bin/bash
> read line
> if [[ "$line" == "sleep" ]]
> then
>     for i in 1 2 3
>     do
>         echo "Sleeping" >&2
>         sleep 5
>     done
>     exit 0
> else
>     echo "Exiting" >&2
>     exit -1
> fi
> {code}
> Input file: in1.txt is for the long-running map task (here, the first map task)
> {code:title=/user/mitesh/input/in1.txt}
> sleep
> {code}
> Input file: in2.txt is for the failing map task (here, the second map task)
> {code:title=/user/mitesh/input/in2.txt}
> exit
> {code}
> Running the sample streaming MR job.
> {noformat}
> $ hadoop fs -rmr -skipTrash xyz
> $ hadoop jar $HADOOP_INSTALL/hadoop-streaming.jar -Dmapred.map.max.attempts=2 -Dmapred.min.split.size=7 -Dmapred.map.tasks=2 -mapper "mapper.sh" -file mapper.sh -reducer NONE -input /user/mitesh/input/in1.txt -input /user/mitesh/input/in2.txt -output xyz
> {noformat}
> Job History web UI
> {noformat}
> Hadoop Job job_201201310454_542302 on History Viewer
> User: mitesh
> JobName: streamjob7439640883203077520.jar
> JobConf: hdfs://nn:port/user/mitesh/.staging/job_201201310454_542302/job.xml
> Job-ACLs:
>     mapreduce.job.acl-view-job: No users are allowed
>     mapreduce.job.acl-modify-job: No users are allowed
> Submitted At: 27-Feb-2012 12:56:02
> Launched At: 27-Feb-2012 12:56:11 (8sec)
> Finished At: 27-Feb-2012 12:56:31 (20sec)
> Status: FAILED
> Failure Info: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201201310454_542302_m_000001
> Analyse This Job
> Kind	Total Tasks(successful+failed+killed)	Successful tasks	Failed tasks	Killed tasks	Start Time	Finish Time
> Setup 	1 	1 	0 	0 	27-Feb-2012 12:56:12 	27-Feb-2012 12:56:16 (4sec)
> Map 	2 	0 	2 	0 	27-Feb-2012 12:56:16 	27-Feb-2012 12:56:26 (10sec)
> Reduce 	0 	0 	0 	0 		
> Cleanup 	1 	1 	0 	0 	27-Feb-2012 12:56:26 	27-Feb-2012 12:56:31 (4sec)
> {noformat}
> The summary above shows only 2 failed tasks (both belonging to the second map task).
> The task tracker of the first map task can be found only from the JT logs.
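
Before submitting the quoted streaming job, the mapper logic from the report can be checked locally to confirm that one input produces a long-running, successful mapper and the other an immediately failing one. The following is only a rough sketch under stated assumptions: `run_mapper` and `SLEEP_SECS` are hypothetical names introduced here (the original mapper.sh sleeps 5 seconds per iteration), and `exit 1` stands in for the original `exit -1`. Nothing here touches Hadoop or reproduces the Job History problem itself.

```shell
#!/bin/bash
# Local dry run of the mapper logic from the quoted report, outside Hadoop.
# Assumption: SLEEP_SECS is a hypothetical knob so the check finishes quickly;
# the original mapper.sh uses "sleep 5".
SLEEP_SECS="${SLEEP_SECS:-0}"

# Hypothetical helper that mirrors mapper.sh; it reads one line from stdin.
run_mapper() {
    # Subshell, so that "exit" ends only the simulated mapper.
    (
        read -r line
        if [[ "$line" == "sleep" ]]
        then
            for i in 1 2 3
            do
                echo "Sleeping" >&2
                sleep "$SLEEP_SECS"
            done
            exit 0
        else
            echo "Exiting" >&2
            exit 1   # the original uses "exit -1"; any non-zero status will do here
        fi
    )
}

# Content of in1.txt: the long-running map task.
echo "sleep" | run_mapper 2>/dev/null
sleep_status=$?

# Content of in2.txt: the failing map task.
echo "exit" | run_mapper 2>/dev/null
exit_status=$?

echo "in1.txt (sleep) -> exit status $sleep_status"
echo "in2.txt (exit)  -> exit status $exit_status"
```

If the second status is non-zero and the first is zero, the two inputs behave as the report intends: one attempt keeps running while the other fails right away.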


        
