hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3041) Within a task, the value of JobConf.getOutputPath() method is modified
Date Tue, 18 Mar 2008 17:33:25 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579938#action_12579938 ]

Devaraj Das commented on HADOOP-3041:

bq. But, using the rule of least surprise, wouldn't it make more sense to have a getTaskOutputPath()
that returns the path to the part file for the current task and leave getOutputPath()
with the user-entered value?

Possibly. One concern here is that apps may already have been written against
the getOutputPath API (creating side files within the directory it returns). Also, if the user
really intends to create a side file in the output directory of the job, it is slightly
unintuitive, IMO, to have the user invoke getTaskOutputPath. But yes, I agree that getOutputPath
returning the task's output path is unintuitive as well. I wish this were clearer; I am unhappy about it too.
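The split proposed in the quoted question can be sketched as follows. This is a hypothetical class, not Hadoop's JobConf: the name getTaskOutputPath comes from the proposal above, and the "_temporary" sub-directory layout is assumed from the example paths in the issue description. The point is only that the user-entered directory is never rewritten, while the per-task location is derived on demand.

```java
import java.util.Objects;

/**
 * Hypothetical sketch (not Hadoop's JobConf): keeps the user-entered output
 * directory unchanged and derives the per-task temporary path on demand.
 */
public final class OutputPathsSketch {
    // Temporary sub-directory name, assumed from the example paths in the issue.
    private static final String TEMP_DIR = "_temporary";

    // Value entered by the client; never modified after construction.
    private final String outputPath;

    public OutputPathsSketch(String outputPath) {
        this.outputPath = Objects.requireNonNull(outputPath);
    }

    /** Returns the job output directory exactly as the user set it. */
    public String getOutputPath() {
        return outputPath;
    }

    /** Hypothetical: path of the part file a task writes before it is committed. */
    public String getTaskOutputPath(String partName) {
        return outputPath + "/" + TEMP_DIR + "/" + partName;
    }

    public static void main(String[] args) {
        OutputPathsSketch conf = new OutputPathsSketch("/user/foo/myoutput");
        System.out.println(conf.getOutputPath());                   // user value, untouched
        System.out.println(conf.getTaskOutputPath("part_00000"));   // derived task path
    }
}
```

With this shape, code that computes sibling directories from getOutputPath() keeps working, at the cost Devaraj mentions: a task that wants to write a side file must know to call the second method.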

bq. Also, the javadoc should not say 'Get the Path to the output directory for the map-reduce
job' in its one-line description, then.

> Within a task, the value of JobConf.getOutputPath() method is modified
> ----------------------------------------------------------------------
>                 Key: HADOOP-3041
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3041
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.16.1
>         Environment: all
>            Reporter: Alejandro Abdelnur
>            Priority: Blocker
>             Fix For: 0.16.2
> Until 0.16.0 the value of the getOutputPath() method, if queried within a task, pointed to the part file assigned to the task.
> For example: /user/foo/myoutput/part_00000
> In 0.16.1, it now returns an internal Hadoop path for the task's temporary output location.
> For the above example: /user/foo/myoutput/_temporary/part_00000
> This change breaks applications that use getOutputPath() to compute other directories.
> IMO, this has always been broken: Hadoop should not change the values of properties injected by the client; instead, it should use private properties or internal helper methods.
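The reporter's suggestion of using private properties can be illustrated with a minimal sketch built on java.util.Properties. Assumptions: the user-facing key name mirrors Hadoop's classic "mapred.output.dir", while the private key "internal.task.temp.output.dir" and the prepareTask helper are invented for this illustration and are not part of Hadoop.

```java
import java.util.Properties;

/**
 * Hypothetical sketch of the reporter's suggestion: the framework records the
 * task's temporary location under its own private key instead of overwriting
 * the value the client injected.
 */
public final class PrivateKeySketch {
    // User-facing key; name mirrors Hadoop's classic "mapred.output.dir".
    public static final String OUTPUT_DIR = "mapred.output.dir";
    // Hypothetical framework-private key, invented for this illustration.
    public static final String TASK_TEMP_DIR = "internal.task.temp.output.dir";

    /** Hypothetical framework-side setup: adds the private key, leaves OUTPUT_DIR as-is. */
    public static void prepareTask(Properties conf, String partName) {
        String out = conf.getProperty(OUTPUT_DIR);
        conf.setProperty(TASK_TEMP_DIR, out + "/_temporary/" + partName);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(OUTPUT_DIR, "/user/foo/myoutput");
        prepareTask(conf, "part_00000");
        // An application computing other directories still reads its own value.
        System.out.println(conf.getProperty(OUTPUT_DIR));
        // Framework code reads the private key for the temporary location.
        System.out.println(conf.getProperty(TASK_TEMP_DIR));
    }
}
```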

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
