hadoop-mapreduce-issues mailing list archives

From "Mitesh Singh Jat (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-3926) No information of unfinished map task in Job History, if all attempts of another map task fail.
Date Tue, 28 Feb 2012 05:46:49 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mitesh Singh Jat updated MAPREDUCE-3926:
----------------------------------------

    Description: 
No information about an unfinished map task appears in Job History if all attempts of another map task fail.

For example:
1. The first attempt of the first map task, m_000000_0, was still making progress.

2. The second map task failed 4 times before that attempt completed.

3. The job therefore failed, and a job cleanup task was launched and completed before the first map task's attempt finished.

4. After the job cleanup task completed, runningMapCache was cleared:
{noformat}
completedTask() -> jobComplete() -> garbageCollect() -> this.runningMapCache = null;
           |-----> retireMap() -> if (runningMapCache == null) "Running cache for maps missing!! Job details are missing."
{noformat}

5. As a result, retireMap(), which is called after jobComplete(), hit the "Running cache for maps missing!! Job details are missing." error, and nothing further was added to Job History. The first map task's information is therefore missing from the Job History page.
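The ordering in steps 4 and 5 can be modelled with a hypothetical, heavily simplified Java sketch. Only the method names completedTask(), jobComplete(), garbageCollect() and retireMap(), the runningMapCache field and the error message come from this report; the class RetireMapSketch, its jobHistory and log lists, and all signatures are invented for illustration and do not reflect the real JobInProgress code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the call order described in steps 4-5; NOT the real JobInProgress.
public class RetireMapSketch {
    // Stands in for runningMapCache.
    private Set<String> runningMapCache = new HashSet<>();
    // Hypothetical stand-ins for what reaches Job History and the JobTracker log.
    final List<String> jobHistory = new ArrayList<>();
    final List<String> log = new ArrayList<>();

    void launchMap(String attempt) {
        runningMapCache.add(attempt);
    }

    // Per the report's diagram: jobComplete() runs first, retireMap() only afterwards.
    void completedTask(String unfinishedAttempt) {
        jobComplete();
        retireMap(unfinishedAttempt);
    }

    void jobComplete() {
        garbageCollect();
    }

    void garbageCollect() {
        runningMapCache = null; // mirrors "this.runningMapCache = null;"
    }

    void retireMap(String attempt) {
        if (runningMapCache == null) {
            log.add("Running cache for maps missing!! Job details are missing.");
            return; // early return: nothing about this attempt is recorded
        }
        runningMapCache.remove(attempt);
        jobHistory.add(attempt);
    }

    public static void main(String[] args) {
        RetireMapSketch job = new RetireMapSketch();
        job.launchMap("m_000000_0");       // first map task's attempt, still running
        job.completedTask("m_000000_0");   // cleanup finishes before that attempt does
        System.out.println("history: " + job.jobHistory);
        System.out.println("log: " + job.log);
    }
}
```

Running main() prints an empty history list and a single log entry containing the error message, which is the symptom described above: the attempt is never written to history because the cache it depends on was already discarded.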


I have created a sample streaming MR job to reproduce this issue.

{code:title=mapper.sh}
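#!/bin/bash
# Read the first input line (the content of in1.txt or in2.txt).
read -r line
if [[ "$line" == "sleep" ]]
then
    # Long-running map task: sleeps 3 x 5 s before succeeding.
    for i in 1 2 3
    do
        echo "Sleeping" >&2
        sleep 5
    done
    exit 0
else
    # Failing map task: a non-zero exit status fails the attempt.
    echo "Exiting" >&2
    exit 1
fi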
{code}

Input file in1.txt feeds the long-running map task (here, the first map task):
{code:title=/user/mitesh/input/in1.txt}
sleep
{code}

Input file in2.txt feeds the failing map task (here, the second map task):
{code:title=/user/mitesh/input/in2.txt}
exit
{code}


Run the sample streaming MR job:
{noformat}
$ hadoop fs -rmr -skipTrash xyz
$ hadoop jar $HADOOP_INSTALL/hadoop-streaming.jar -Dmapred.map.max.attempts=2 -Dmapred.min.split.size=7 \
    -Dmapred.map.tasks=2 -mapper "mapper.sh" -file mapper.sh -reducer NONE \
    -input /user/mitesh/input/in1.txt -input /user/mitesh/input/in2.txt -output xyz
{noformat}

The resulting Job History web UI page:
{noformat}
Hadoop Job job_201201310454_542302 on History Viewer
User: mitesh
JobName: streamjob7439640883203077520.jar
JobConf: hdfs://nn:port/user/mitesh/.staging/job_201201310454_542302/job.xml
Job-ACLs:
    mapreduce.job.acl-view-job: No users are allowed
    mapreduce.job.acl-modify-job: No users are allowed
Submitted At: 27-Feb-2012 12:56:02
Launched At: 27-Feb-2012 12:56:11 (8sec)
Finished At: 27-Feb-2012 12:56:31 (20sec)
Status: FAILED
Failure Info: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201201310454_542302_m_000001
Analyse This Job
Kind     Total Tasks(successful+failed+killed)  Successful tasks  Failed tasks  Killed tasks  Start Time            Finish Time
Setup    1                                      1                 0             0             27-Feb-2012 12:56:12  27-Feb-2012 12:56:16 (4sec)
Map      2                                      0                 2             0             27-Feb-2012 12:56:16  27-Feb-2012 12:56:26 (10sec)
Reduce   0                                      0                 0             0
Cleanup  1                                      1                 0             0             27-Feb-2012 12:56:26  27-Feb-2012 12:56:31 (4sec)
{noformat}

The page above shows only 2 failed tasks, both belonging to the second map task.
The TaskTracker that ran the first map task can be found only in the JobTracker logs.
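A quick consistency check, using only numbers from this report, shows why the first attempt was still running when the job failed: mapper.sh sleeps 3 x 5 s on the "sleep" input, while the Map row above spans 12:56:16 to 12:56:26. This is a sketch under two assumptions: launch overhead is ignored (so 15 s is a lower bound), and the first attempt started no earlier than the Map row's start time. ReproTiming and its helper methods are hypothetical names.

```java
import java.time.Duration;
import java.time.LocalTime;

// Hypothetical back-of-the-envelope check using the report's own numbers.
public class ReproTiming {
    // Minimum run time of the "sleep" branch of mapper.sh (sleep time only).
    static long mapperSleepSeconds(int iterations, int secondsPerSleep) {
        return (long) iterations * secondsPerSleep;
    }

    // Length of the Map row as shown on the Job History page.
    static long mapPhaseSeconds(LocalTime start, LocalTime finish) {
        return Duration.between(start, finish).getSeconds();
    }

    public static void main(String[] args) {
        long sleep = mapperSleepSeconds(3, 5);
        long phase = mapPhaseSeconds(LocalTime.of(12, 56, 16), LocalTime.of(12, 56, 26));
        System.out.println("first attempt needs at least " + sleep + " s");
        System.out.println("map phase recorded as " + phase + " s");
        // If sleep > phase, the first attempt cannot have finished within the recorded map phase.
        System.out.println("first attempt still running when map phase ended: " + (sleep > phase));
    }
}
```

With 15 s of sleep against a 10 s map phase, the first attempt could not have completed before the job was declared failed and cleanup began.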

  was:
Identical to the description above, except that the streaming command started with "hadoop fs -jar" instead of "hadoop jar".

    
> No information of unfinished map task in Job History, if all attempts of another map task fail.
> -----------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3926
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3926
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0
>            Reporter: Mitesh Singh Jat
>            Priority: Minor

--
This message is automatically generated by JIRA.
