hadoop-mapreduce-issues mailing list archives

From "Prabhu Joseph (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-6981) Map Progress is misleading for Distcp job
Date Wed, 11 Oct 2017 14:03:00 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prabhu Joseph updated MAPREDUCE-6981:
-------------------------------------
    Description: 
The progress displayed by the client when running a Distcp job is misleading. Map progress
reaches 100% before the map tasks finish. The issue can be reproduced by running Distcp with
multiple huge files.

JobImpl returns progress 1.0 when either the task finishes or the task progress is 1.0. The
MapTask of Distcp gets its progress from SequenceFileRecordReader, which appears to update the
progress after reading the list of files and does not account for the time taken to copy the
files to the destination.
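
The mismatch described above can be illustrated with a small standalone model. This is only a sketch under the assumption stated in this report (progress follows how much of the file listing has been read, not how many bytes have been copied); the class and method names are hypothetical and are not Hadoop or Distcp APIs.

```java
// Illustrative model only; these names are hypothetical and are not Hadoop or Distcp APIs.
public class DistcpProgressSketch {

    // Record-based progress: fraction of copy-listing entries the reader has consumed.
    static float recordBasedProgress(int entriesRead, int totalEntries) {
        if (totalEntries <= 0) {
            return 1.0f;
        }
        return Math.min(1.0f, (float) entriesRead / totalEntries);
    }

    // Byte-weighted progress: fraction of the payload actually written to the destination.
    static float byteBasedProgress(long bytesCopied, long totalBytes) {
        if (totalBytes <= 0) {
            return 1.0f;
        }
        return Math.min(1.0f, (float) ((double) bytesCopied / totalBytes));
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        long[] fileSizes = {10 * gib, 10 * gib, 10 * gib}; // three huge files in one map task
        long totalBytes = 0;
        for (long size : fileSizes) {
            totalBytes += size;
        }

        // The reader has handed out the last listing entry,
        // but the copy of the third file has only just started.
        int entriesRead = fileSizes.length;
        long bytesCopied = fileSizes[0] + fileSizes[1];

        System.out.println("record-based: " + recordBasedProgress(entriesRead, fileSizes.length));
        System.out.println("byte-based:   " + byteBasedProgress(bytesCopied, totalBytes));
    }
}
```

In this model the record-based figure is already at its maximum while a third of the data is still to be copied, which matches the symptom reported here.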

{code}
17/10/11 13:33:29 INFO mapreduce.Job:  map 100% reduce 0%
17/10/11 13:34:47 INFO mapreduce.Job: Job job_1506610341926_0016 completed successfully
{code}

Map progress of 100% is displayed at 17/10/11 13:33:29, whereas the last map task finishes
at 2017-10-11 13:34:45:

{code}
2017-10-11 13:34:45,159 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl:
task_1506610341926_0016_m_000002 Task Transitioned from RUNNING to SUCCEEDED
{code}
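
The gap between the two timestamps quoted above is about 76 seconds. A minimal sketch of that calculation follows, assuming both logs use the same clock and time zone; the class name and the two format patterns are assumptions inferred from the log lines in this report.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper; the patterns are inferred from the log lines quoted in this report.
public class ProgressGap {

    // Client log style, e.g. "17/10/11 13:33:29"
    private static final DateTimeFormatter CLIENT_FORMAT =
            DateTimeFormatter.ofPattern("uu/MM/dd HH:mm:ss");

    // Application log style, e.g. "2017-10-11 13:34:45,159"
    private static final DateTimeFormatter APP_FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss,SSS");

    // Whole seconds between "map 100%" on the client and the last task transition.
    static long gapSeconds(String clientTimestamp, String appTimestamp) {
        LocalDateTime reported = LocalDateTime.parse(clientTimestamp, CLIENT_FORMAT);
        LocalDateTime finished = LocalDateTime.parse(appTimestamp, APP_FORMAT);
        return Duration.between(reported, finished).getSeconds();
    }

    public static void main(String[] args) {
        System.out.println(gapSeconds("17/10/11 13:33:29", "2017-10-11 13:34:45,159") + " s");
    }
}
```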

Attaching the client and application logs.

  was:
The Progress displayed by client when running Distcp job is misleading. The Map Progress reaches
100% earlier than the map tasks finishes. The issue reproduced by just running Distcp with
multiple huge files. 

JobImpl returns progress 1.0 when either task finishes or task progress is 1.0. The MapTask
of Distcp gets the progress from SequenceFileRecordReader which looks like updates the progress
after reading the list of files and which does not account the time taken to copy the files
into Destination.

{code}
17/10/11 13:33:29 INFO mapreduce.Job:  map 100% reduce 0%
17/10/11 13:34:47 INFO mapreduce.Job: Job job_1506610341926_0016 completed successfully
{code}

The MapTask Progress is displayed at 17/10/11 13:33:29 whereas the last map task finishes
at 2017-10-11 13:34:45

{code}
2017-10-11 13:34:45,159 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl:
task_1506610341926_0016_m_000002 Task Transitioned from RUNNING to SUCCEEDED
{code}

Attaching the client and application logs.


> Map Progress is misleading for Distcp job
> -----------------------------------------
>
>                 Key: MAPREDUCE-6981
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6981
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: distcp
>    Affects Versions: 2.7.3
>            Reporter: Prabhu Joseph
>            Priority: Minor
>         Attachments: clientlog, yarnlog



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-help@hadoop.apache.org

