hadoop-mapreduce-issues mailing list archives

From "Akira AJISAKA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6607) .staging dir is not cleaned up if mapreduce.task.files.preserve.failedtask or mapreduce.task.files.preserve.filepattern are set
Date Wed, 10 Feb 2016 15:19:18 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140967#comment-15140967 ]

Akira AJISAKA commented on MAPREDUCE-6607:
------------------------------------------

Hi [~maysamyabandeh] and [~lewuathe], I think it is reasonable that the .staging dir is not
cleaned up if either of the two parameters is set, because some tasks may have failed
even if the MapReduce job succeeded.

bq. The former was supposed to keep only .staging of failed tasks
AFAIK, the files in .staging can be used by all tasks, so I think it is difficult to
determine which files in .staging belong to the failed tasks.

By the way, regex matching is currently not performed even if "mapreduce.task.files.preserve.filepattern"
is set. We need to fix that.
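
As a rough sketch only (this is not the attached patch and not Hadoop code), a per-task check that does apply the pattern might look like the following. The class name, the helper shouldKeepTaskFiles, and the task names are hypothetical and made up for illustration; the only library call used is java.util.regex.Pattern.matches.

```java
import java.util.regex.Pattern;

public class KeepPatternSketch {

    // Hypothetical helper, not a Hadoop API: decides whether one task's files
    // should be preserved, given the two settings discussed in this issue.
    public static boolean shouldKeepTaskFiles(String keepPattern,
                                              boolean keepFailedTaskFiles,
                                              String taskName,
                                              boolean taskFailed) {
        // Setting 1: keep the files of tasks that failed.
        if (keepFailedTaskFiles && taskFailed) {
            return true;
        }
        // Setting 2: keep the files of tasks whose name matches the regex.
        // Pattern.matches requires the whole task name to match.
        return keepPattern != null && Pattern.matches(keepPattern, taskName);
    }

    public static void main(String[] args) {
        // Example pattern and task names, assumed for illustration only.
        String pattern = ".*_m_000123_0";
        String matching = "attempt_200707121733_0003_m_000123_0";
        String other = "attempt_200707121733_0003_m_000124_0";

        System.out.println(shouldKeepTaskFiles(pattern, false, matching, false));
        System.out.println(shouldKeepTaskFiles(pattern, false, other, false));
        System.out.println(shouldKeepTaskFiles(null, true, other, true));
    }
}
```

Unlike the current null check, this distinguishes a task that matches the pattern from one that does not. It says nothing about which files in .staging belong to which task, which remains the hard part.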

> .staging dir is not cleaned up if mapreduce.task.files.preserve.failedtask or mapreduce.task.files.preserve.filepattern are set
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6607
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6607
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster
>    Affects Versions: 2.7.1
>            Reporter: Maysam Yabandeh
>            Assignee: Kai Sasaki
>            Priority: Minor
>         Attachments: MAPREDUCE-6607.01.patch
>
>
> If either of the following configs is set, then the .staging dir is not cleaned up:
> * mapreduce.task.files.preserve.failedtask 
> * mapreduce.task.files.preserve.filepattern
> The former was supposed to keep only the .staging files of failed tasks, and the latter was supposed
> to apply only if the task name matches the specified regular expression.
> {code}
>   protected boolean keepJobFiles(JobConf conf) {
>     return (conf.getKeepTaskFilesPattern() != null || conf
>         .getKeepFailedTaskFiles());
>   }
> {code}
> {code}
>   public void cleanupStagingDir() throws IOException {
>     /* make sure we clean the staging files */
>     String jobTempDir = null;
>     FileSystem fs = getFileSystem(getConfig());
>     try {
>       if (!keepJobFiles(new JobConf(getConfig()))) {
>         jobTempDir = getConfig().get(MRJobConfig.MAPREDUCE_JOB_DIR);
>         if (jobTempDir == null) {
>           LOG.warn("Job Staging directory is null");
>           return;
>         }
>         Path jobTempDirPath = new Path(jobTempDir);
>         LOG.info("Deleting staging directory " + FileSystem.getDefaultUri(getConfig()) +
>             " " + jobTempDir);
>         fs.delete(jobTempDirPath, true);
>       }
>     } catch(IOException io) {
>       LOG.error("Failed to cleanup staging dir " + jobTempDir, io);
>     }
>   }
> {code}
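
To make the effect of the two quoted methods concrete, here is a minimal standalone sketch of the same decision, with plain arguments standing in for JobConf. The class and method names are hypothetical, and it assumes a non-null staging directory and no IOException, so it models only the keepJobFiles branch.

```java
public class StagingCleanupSketch {

    // Same condition as the quoted keepJobFiles(), with the two settings
    // passed in directly instead of being read from a JobConf.
    public static boolean keepJobFiles(String keepTaskFilesPattern,
                                       boolean keepFailedTaskFiles) {
        return keepTaskFilesPattern != null || keepFailedTaskFiles;
    }

    // Models the branch in cleanupStagingDir(): the staging dir is deleted
    // only when keepJobFiles() is false. Job and task outcomes are never
    // consulted, and the pattern is only tested for null.
    public static boolean stagingDirDeleted(String keepTaskFilesPattern,
                                            boolean keepFailedTaskFiles) {
        return !keepJobFiles(keepTaskFilesPattern, keepFailedTaskFiles);
    }

    public static void main(String[] args) {
        // Neither setting: staging dir is deleted.
        System.out.println(stagingDirDeleted(null, false));
        // Either setting alone: staging dir is kept for the whole job.
        System.out.println(stagingDirDeleted(null, true));
        System.out.println(stagingDirDeleted("no_task_matches_this", false));
    }
}
```

The last call shows the reported behavior: a pattern that matches no task still prevents cleanup, because only its presence is checked.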



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
