hadoop-common-dev mailing list archives

From "Amareshwari Sriramadasu (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (HADOOP-3598) Map-Reduce framework needlessly creates temporary _${taskid} directories for Maps
Date Fri, 20 Jun 2008 09:37:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12606683#action_12606683 ]

amareshwari edited comment on HADOOP-3598 at 6/20/08 2:36 AM:
--------------------------------------------------------------------------

One comment:
Creation of _temporary/_<taskid> should not throw an exception if mkdirs fails but the
directory already exists, because the directory could already have been created by a side-file
or by an earlier call to getTaskOutputPath().
You can add an fs.exists(taskTmpDir) check before throwing the exception.
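
A minimal sketch of the suggested check, not the code from the patch. It assumes java.io.File as a local stand-in for the Hadoop FileSystem API, so File.isDirectory() plays the role of fs.exists(taskTmpDir); the class name TaskTmpDirSketch, the helper ensureTaskTmpDir and the task id in main are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class TaskTmpDirSketch {

    /**
     * Hypothetical helper: creates outputDir/_temporary/_taskId if needed.
     * Throws only when mkdirs fails AND the directory is still missing.
     */
    static File ensureTaskTmpDir(File outputDir, String taskId) throws IOException {
        File taskTmpDir = new File(new File(outputDir, "_temporary"), "_" + taskId);
        // File.mkdirs() returns false on a real failure, but also when the
        // directory already exists (e.g. created earlier for a side-file).
        // Only the first case is an error, so check for existence before throwing.
        if (!taskTmpDir.mkdirs() && !taskTmpDir.isDirectory()) {
            throw new IOException("Mkdirs failed to create " + taskTmpDir);
        }
        return taskTmpDir;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a local temp directory stands in for the job output dir.
        File outputDir = Files.createTempDirectory("job-output").toFile();
        File first = ensureTaskTmpDir(outputDir, "task_0001_m_000000_0");
        // A second call must succeed even though the directory now exists.
        File second = ensureTaskTmpDir(outputDir, "task_0001_m_000000_0");
        System.out.println(first.equals(second) && second.isDirectory());
    }
}
```

With this ordering, repeated calls for the same task are harmless, while a genuine failure to create the directory still surfaces as an IOException.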

Other than that, with the documentation fixed for getTaskOutputPath, the patch looks good.

      was (Author: amareshwari):
    One comment:
Creation of _temporary/_<taskid> should throw an exception if mkdirs fails and also if
the directory doesn't exist, because the directory could have been created by a side-file
or by an earlier call to getTaskOutputPath().

Other than that, with the documentation fixed for getTaskOutputPath, the patch looks good.
  
> Map-Reduce framework needlessly creates temporary _${taskid} directories for Maps
> ---------------------------------------------------------------------------------
>
>                 Key: HADOOP-3598
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3598
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.18.0
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>            Priority: Blocker
>             Fix For: 0.18.0
>
>         Attachments: HADOOP-3598_0_20080619.patch, HADOOP-3598_0_20080619.patch, HADOOP-3598_0_20080619.patch
>
>
> The staging directory for task-outputs (i.e. ${mapred.out.dir}/_temporary/_${taskid})
should only be created when Maps produce output on HDFS, which usually isn't the case. This
plays very badly with HDFS quotas and may lead to thousands of temp names in the FS namespace,
thereby exhausting the quotas. In any case, it isn't good to needlessly create these directories.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

