hadoop-common-dev mailing list archives

From "Alejandro Abdelnur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3041) Within a task, the value of JobConf.getOutputPath() method is modified
Date Tue, 01 Apr 2008 04:58:24 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584003#action_12584003 ]

Alejandro Abdelnur commented on HADOOP-3041:
--------------------------------------------

Our applications, which are running on a previous Hadoop version, fail when migrated to
0.16.0+ because we assumed the returned value was the path of the output part, without
any temporary directory in it. So we are broken as well.

As the Hadoop API gets refined, changes like this will break things. For example, FileSystem
listPaths() now returns null instead of an empty array when the dir does not exist.
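
Callers can guard against this kind of change by normalizing a null listing themselves.
The sketch below is only an illustration: it uses java.io.File.listFiles(), which also
returns null for a path that is not an existing directory, as a stand-in for the Hadoop
FileSystem call, and the helper name listOrEmpty is hypothetical.

```java
import java.io.File;

public class SafeListing {
    // Hypothetical helper: turns a null listing into an empty array so that
    // callers can iterate without a null check.
    public static File[] listOrEmpty(File dir) {
        // listFiles() returns null when dir does not exist or is not a directory.
        File[] entries = dir.listFiles();
        return entries != null ? entries : new File[0];
    }

    public static void main(String[] args) {
        // The directory name is made up and assumed not to exist.
        File missing = new File("no-such-directory-for-this-sketch");
        for (File f : listOrEmpty(missing)) {
            System.out.println(f.getName());
        }
        System.out.println("entries: " + listOrEmpty(missing).length);
    }
}
```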

It is kind of painful, but I would not deprecate methods that have the right name just
because they were returning incorrect data.

IMO the right thing to do is to have getOutputPath() return the configured value and
getWorkingOutputPath() return the temporary dir.
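
A minimal sketch of that split, not the actual JobConf code: the class OutputPathSketch
is hypothetical, and the "_temporary" layout is assumed from the example paths in the
issue description.

```java
public class OutputPathSketch {
    // Value supplied by the client; never rewritten by the framework.
    private final String configuredOutputPath;

    public OutputPathSketch(String configuredOutputPath) {
        this.configuredOutputPath = configuredOutputPath;
    }

    // Always returns what the client configured.
    public String getOutputPath() {
        return configuredOutputPath;
    }

    // Returns the framework-internal temporary location, derived from the
    // configured path rather than stored over it. The "_temporary" name is
    // an assumption based on the example in the issue.
    public String getWorkingOutputPath() {
        return configuredOutputPath + "/_temporary";
    }

    public static void main(String[] args) {
        OutputPathSketch conf = new OutputPathSketch("/user/foo/myoutput");
        System.out.println(conf.getOutputPath());
        System.out.println(conf.getWorkingOutputPath());
    }
}
```

With this shape, code that computes other directories from getOutputPath() keeps working,
and only code that needs the task's scratch space calls getWorkingOutputPath().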


> Within a task, the value of JobConf.getOutputPath() method is modified
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-3041
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3041
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.16.1
>         Environment: all
>            Reporter: Alejandro Abdelnur
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.17.0
>
>         Attachments: patch-3041-0.16.2.txt, patch-3041.txt, patch-3041.txt, patch-3041.txt, patch-3041.txt
>
>
> Until 0.16.0 the value of the getOutputPath() method, if queried within a task, pointed
> to the part file assigned to the task.
> For example: /user/foo/myoutput/part_00000
> In 0.16.1 it now returns an internal Hadoop temporary location for the task output.
> For the above example: /user/foo/myoutput/_temporary/part_00000
> This change breaks applications that use the getOutputPath() to compute other directories.
> IMO, this has always been broken: Hadoop should not change the values of properties
> injected by the client; instead it should use private properties or internal helper methods.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

