hadoop-common-dev mailing list archives

From "Richard Kasperski (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-673) the task execution environment should have a current working directory that is task specific
Date Sat, 18 Nov 2006 00:25:39 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-673?page=comments#action_12450936 ] 
Richard Kasperski commented on HADOOP-673:

I think that it is very important that there be a way for an application to run in a sandbox
<local>/jobcache/<jobid>/<taskid> that contains the contents of the jar
file and is the current working directory. How one accomplishes this is an implementation
detail. If the only way to do it is to unjar the archive more than once, then I guess that
would have to be the solution. This avoids the potential for a lot of grief: no shared files
are ever modified because they aren't actually shared. It causes more unpacking of jars,
but I don't really see that as a problem. How the jars are copied to a node, and how they
are subsequently reused, is important.
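
A rough sketch of the per-task sandbox described above, in Python for brevity. This is not
Hadoop code: the function name unpack_task_sandbox and its parameters are hypothetical, and
it assumes the job jar can be read as an ordinary zip archive. Each task gets its own
unpacked copy, so nothing is shared between tasks.

```python
import os
import zipfile


def unpack_task_sandbox(local_dir, job_id, task_id, jar_path):
    """Unpack jar_path into <local_dir>/jobcache/<job_id>/<task_id>.

    Returns the sandbox path, which the caller would use as the task's
    current working directory. Names here are illustrative only.
    """
    sandbox = os.path.join(local_dir, "jobcache", job_id, task_id)
    os.makedirs(sandbox, exist_ok=True)
    # A jar is a zip archive, so the standard zipfile module can unpack it.
    with zipfile.ZipFile(jar_path) as jar:
        jar.extractall(sandbox)
    return sandbox
```

The task itself could then be started with something like subprocess.run(cmd, cwd=sandbox),
so that relative paths resolve inside the task's own copy.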

That is the sandbox. Then there is the shared sandbox which is also two different instance
to memory map a file and pay a single cost. This is best handled by either soft links or hard links
under Unix/Linux. 

Why do I think these are important models? Most of the programs that I write and that I use
run out of the current directory and expect that all of their configuration files/resources
can be read from there. For programs that have more sophisticated models of deployment, the
models above are still ok. For the simpler programs the proposed external repository doesn't

OTOH I can always run a script that will hard link the files from the directory where the
jar is unpacked to my current working directory. I just don't believe that this should be
forced on the users. Having the users do system-ish things is potentially dangerous.
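
For illustration, the kind of user-side script mentioned above might look roughly like this.
It is only a sketch: link_tree is a hypothetical name, and it assumes both directories are on
the same filesystem, since hard links cannot cross filesystems.

```python
import os


def link_tree(unpacked_dir, task_dir):
    """Mirror unpacked_dir under task_dir, hard-linking every regular file.

    Directories are created fresh; files are linked, not copied, so the
    data is stored once. Illustrative only, not part of Hadoop.
    """
    for root, _dirs, files in os.walk(unpacked_dir):
        rel = os.path.relpath(root, unpacked_dir)
        target_root = os.path.normpath(os.path.join(task_dir, rel))
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            os.link(os.path.join(root, name), os.path.join(target_root, name))
```

Note that a task writing in place through one of these links changes the shared file for
every other task as well, which is part of why leaving this to users is risky.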

Even more restrictive than the above sandbox would be one in which the application is run
chroot'd. That way there is no way it could muck with anything system-like on the nodes. This
is an important consideration when one lets arbitrary programs run. 

> the task execution environment should have a current working directory that is task specific
> --------------------------------------------------------------------------------------------
>                 Key: HADOOP-673
>                 URL: http://issues.apache.org/jira/browse/HADOOP-673
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.7.2
>            Reporter: Owen O'Malley
>         Assigned To: Mahadev konar
>             Fix For: 0.9.0
> The tasks should be run in a work directory that is specific to a single task. In particular,
I'd suggest using the <local>/jobcache/<jobid>/<taskid> as the current working

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira