hadoop-mapreduce-issues mailing list archives

From "Haibo Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6984) MR AM to clean up temporary files from previous attempt
Date Tue, 16 Jan 2018 18:40:00 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16327558#comment-16327558 ]

Haibo Chen commented on MAPREDUCE-6984:
---------------------------------------

Thanks [~grepas] for the updated patch! A few more comments:
 # In cleanUpPreviousAttemptOutput(ApplicationAttemptId appAttemptId), we essentially clean
up the whole job rather than an individual job attempt. If there was more than one previous
attempt, only the first job cleanup will succeed. How about we rename it to CleanupPreviousJobOutput
and call it only once in cleanUpPreviousAttemptOutput()?
 # We should probably pass FAILED to the abortJob() call.
 # For the unit test, we can do something similar to what TestRecovery does to simulate multiple
job attempts, which I think is more readable.
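
To illustrate the first comment, here is a minimal sketch of a job-level cleanup that is called once. It is not the patch itself: it uses java.nio on a local directory as a stand-in for Hadoop's FileSystem API, the class name PreviousJobOutputCleaner is hypothetical, and it assumes that job-level cleanup removes the whole {outputDir}/_temporary directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical stand-in: local filesystem instead of HDFS, not the actual MR AM code.
final class PreviousJobOutputCleaner {

    /**
     * Deletes {outputDir}/_temporary recursively, i.e. the scratch space of all
     * previous attempts at once. Returns true if something was deleted and
     * false if there was nothing left to delete.
     */
    static boolean cleanUpPreviousJobOutput(Path outputDir) throws IOException {
        Path jobTemp = outputDir.resolve("_temporary");
        if (!Files.exists(jobTemp)) {
            return false;
        }
        try (Stream<Path> walk = Files.walk(jobTemp)) {
            // Reverse order so that children are deleted before their parents.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("mr-output");
        Files.createDirectories(out.resolve("_temporary").resolve("1"));
        Files.createDirectories(out.resolve("_temporary").resolve("2"));
        System.out.println(cleanUpPreviousJobOutput(out)); // first call removes both attempt dirs
        System.out.println(cleanUpPreviousJobOutput(out)); // second call finds nothing to delete
    }
}
```

In this sketch a second call has nothing left to do, which is why looping over every previous attempt adds nothing once the job-level directory is gone.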

> MR AM to clean up temporary files from previous attempt
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-6984
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6984
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: applicationmaster
>    Affects Versions: 3.0.0-beta1
>            Reporter: Gergo Repas
>            Assignee: Gergo Repas
>            Priority: Major
>         Attachments: MAPREDUCE-6984.000.patch, MAPREDUCE-6984.001.patch, MAPREDUCE-6984.003.patch,
>                      MAPREDUCE-6984.004.patch, MAPREDUCE-6984.005.patch, MAPREDUCE-6984.006.patch
>
>
> When the MR AM restarts, the {outputDir}/_temporary/{appAttemptNumber}
> directory remains on HDFS, even though this directory is not used during the next attempt
> if the restart has been done without recovery. So if recovery is not used for the AM restart,
> then the deletion of this directory can be done earlier (at the start of the next attempt).
> The benefit is that more free HDFS space is available for the next attempt.
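
The behavior described in the issue can be sketched roughly as follows. This is only an illustration under stated assumptions: it works on a local directory through java.nio rather than on HDFS, the class PreviousAttemptTempCleaner and its methods are hypothetical names, and it assumes attempt numbers start at 1 and that only the immediately preceding attempt's directory needs to be removed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical sketch, not the MR AM implementation.
final class PreviousAttemptTempCleaner {

    /** Builds {outputDir}/_temporary/{appAttemptNumber}. */
    static Path attemptTempDir(Path outputDir, int appAttemptNumber) {
        return outputDir.resolve("_temporary").resolve(Integer.toString(appAttemptNumber));
    }

    /**
     * Called at the start of attempt currentAttempt. If recovery is not used,
     * the previous attempt's temporary directory is deleted right away.
     * Returns true if a directory was deleted.
     */
    static boolean cleanUpPreviousAttempt(Path outputDir, int currentAttempt,
                                          boolean recoveryEnabled) throws IOException {
        if (recoveryEnabled || currentAttempt <= 1) {
            // With recovery the old output may still be needed; attempt 1 has no predecessor.
            return false;
        }
        Path previous = attemptTempDir(outputDir, currentAttempt - 1);
        if (!Files.isDirectory(previous)) {
            return false;
        }
        try (Stream<Path> walk = Files.walk(previous)) {
            // Reverse order so that files are deleted before the directories holding them.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        return true;
    }
}
```

For example, cleanUpPreviousAttempt(outputDir, 2, false) would remove {outputDir}/_temporary/1 in this sketch, while the same call with recoveryEnabled set to true leaves it in place.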




