hadoop-common-dev mailing list archives

From "Richard Kasperski (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-673) the task execution environment should have a current working directory that is task specific
Date Mon, 20 Nov 2006 16:31:06 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-673?page=comments#action_12451359 ] 
            
Richard Kasperski commented on HADOOP-673:
------------------------------------------

Owen,
  There are two primary reasons.
1. I may not have access to the source of every program I want to run, or I may have the
source but not want to change it. Rooting an application's data in the current directory
allows many more programs to run unchanged.
2. I think of hadoop as an infrastructure for running other programs, not as something to
write programs to. This means that I do not want to write hadoop-isms into my programs.

Running streaming apps on hadoop should be as transparent as possible.

Sharing files is an efficiency consideration, not a requirement for proper operation.
Is there an operational requirement for sharing the files, such as needing to mmap the
same file? Is that likely to be how most programs behave?

It's not that I'm absolutely against the .. structure, just that it does not seem to me to
be the most useful structure. I would contend that if you want the .. placement, you also
need to provide, somehow, for placement in the current working directory.

The above comments only apply to streaming. 

A more general comment, perhaps philosophical, perhaps religious.

The purpose of hadoop is to allow work to be divided and then executed with the same belief
in correctness that I would have if I ran it as a single job on a single machine. Sharing data
in a way that allows inadvertent or unintended manipulation of 'SHARED' data violates this. It
seems to me that sharing should be used when efficiency is needed and the user can take an
appropriate explicit action to enable it.




> the task execution environment should have a current working directory that is task specific
> --------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-673
>                 URL: http://issues.apache.org/jira/browse/HADOOP-673
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.7.2
>            Reporter: Owen O'Malley
>         Assigned To: Mahadev konar
>             Fix For: 0.9.0
>
>
> The tasks should be run in a work directory that is specific to a single task. In particular,
> I'd suggest using the <local>/jobcache/<jobid>/<taskid> as the current working
> directory.
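
For illustration only, here is a minimal Python sketch of the idea in the issue description: build a per-task directory under <local>/jobcache/<jobid>/<taskid> and launch an unmodified program with that directory as its current working directory. The helper names (task_work_dir, run_task) and the job and task ids are hypothetical, and this is not how Hadoop itself implements task launching.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def task_work_dir(local_root: Path, job_id: str, task_id: str) -> Path:
    """Build <local>/jobcache/<jobid>/<taskid> and create it if missing."""
    work_dir = local_root / "jobcache" / job_id / task_id
    work_dir.mkdir(parents=True, exist_ok=True)
    return work_dir


def run_task(command: list[str], work_dir: Path) -> str:
    """Run an unmodified program with work_dir as its current directory."""
    result = subprocess.run(
        command, cwd=work_dir, capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # A temporary directory stands in for <local>; the ids are made up.
    with tempfile.TemporaryDirectory() as local:
        work_dir = task_work_dir(Path(local), "job_0001", "task_0001_m_000000_0")
        # The child writes to a relative path, so the file lands in the
        # task-specific directory without the child knowing about hadoop.
        child = "open('out.txt', 'w').write('done')"
        run_task([sys.executable, "-c", child], work_dir)
        print((work_dir / "out.txt").read_text())
```

Because the child only ever uses relative paths, two tasks given different work directories cannot overwrite each other's files, which is the isolation property argued for in the comment above.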
