hadoop-common-dev mailing list archives

From Arkady Borkovsky <ark...@yahoo-inc.com>
Subject Re: [jira] Commented: (HADOOP-673) the task execution environment should have a current working directory that is task specific
Date Sat, 18 Nov 2006 01:02:40 GMT
+1  on Richard's comments
+1  on Dick's (MR should symlink the files that belong to the task into  
the task running directory)
-10 on ../work
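
To make the symlink suggestion concrete, here is a minimal sketch, not the actual MapReduce code. It assumes a Unix-like file system, Java 11+, and that the unpacked job files live in one shared directory; the class name TaskSandbox, the method prepare, and the directory and task names in main are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: builds a task-specific working directory and
// symlinks the shared job files into it instead of copying them.
public class TaskSandbox {

    /**
     * Creates jobDir/taskId and symlinks every entry found directly under
     * sharedDir into it. Returns the new task directory.
     */
    public static Path prepare(Path jobDir, String taskId, Path sharedDir) throws IOException {
        Path taskDir = Files.createDirectories(jobDir.resolve(taskId));
        List<Path> targets;
        try (Stream<Path> entries = Files.list(sharedDir)) {
            targets = entries.collect(Collectors.toList());
        }
        for (Path target : targets) {
            Path link = taskDir.resolve(target.getFileName());
            // Skip names that already exist so the call can be repeated safely.
            if (Files.notExists(link, LinkOption.NOFOLLOW_LINKS)) {
                Files.createSymbolicLink(link, target.toAbsolutePath());
            }
        }
        return taskDir;
    }

    public static void main(String[] args) throws IOException {
        // Assumed layout: <local>/jobcache/<jobid>/<taskid>, as in the issue text.
        Path local = Files.createTempDirectory("local");
        Path jobDir = Files.createDirectories(local.resolve("jobcache").resolve("job_0001"));
        Path shared = Files.createDirectories(jobDir.resolve("jar"));
        Files.writeString(shared.resolve("app.conf"), "key=value\n");

        Path taskDir = prepare(jobDir, "task_0001_m_000000_0", shared);
        // The task reads its config relative to its own directory.
        System.out.print(Files.readString(taskDir.resolve("app.conf")));
    }
}
```

A child task could then be started with ProcessBuilder.directory(taskDir.toFile()) so that taskDir is its current working directory. Because only links are created, tasks share one copy of each file on disk.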

On Nov 17, 2006, at 4:25 PM, Richard Kasperski (JIRA) wrote:

>     [ http://issues.apache.org/jira/browse/HADOOP-673?page=comments#action_12450936 ]
>
> Richard Kasperski commented on HADOOP-673:
> ------------------------------------------
>
> I think that it is very important that there be a way for an  
> application to run in a sandbox <local>/jobcache/<jobid>/<taskid>
> that contains the contents of the jar file and is the current working
> directory. How one accomplishes this is an implementation detail. If  
> the only way to do it is to unjar the archive more than once then I  
> guess that would have to be the solution. This saves the potential for  
> a lot of grief. No shared files are ever modified because they aren't  
> actually shared. This causes more unpacking of jars but I don't really  
> see that as a problem. How the jars are copied to a node and the
> subsequent reuse of the jar is important.
>
> That is the sandbox. Then there is the shared sandbox, which also allows
> two different instances to memory map a file and pay a single cost.
> This is best handled by either softlinks or hardlinks under  
> unix/linux.
>
> Why do I think these are important models? Most of the programs that I  
> write and that I use run out of the current directory and expect all  
> of their configuration files/resources can be read from there. For
> programs that have more sophisticated models of deployment the models  
> above are still ok. For the simpler programs the proposed external  
> repository doesn't work.
>
> OTOH I can always run a script that will hard link the files from the  
> directory where the jar is unpacked to my current working directory. I just don't
> believe that this should be forced on the users. Having the users do  
> system'ish things is potentially dangerous.
>
> Even more restrictive than the above sandbox would be one in which the
> application is run chroot'd. That way there is no way it could muck  
> with anything system like on the nodes. This is an important  
> consideration when one lets arbitrary programs be run.
>
>> the task execution environment should have a current working  
>> directory that is task specific
>> --------------------------------------------------------------------------------------------
>>
>>                 Key: HADOOP-673
>>                 URL: http://issues.apache.org/jira/browse/HADOOP-673
>>             Project: Hadoop
>>          Issue Type: Bug
>>          Components: mapred
>>    Affects Versions: 0.7.2
>>            Reporter: Owen O'Malley
>>         Assigned To: Mahadev konar
>>             Fix For: 0.9.0
>>
>>
>> The tasks should be run in a work directory that is specific to a  
>> single task. In particular, I'd suggest using the  
>> <local>/jobcache/<jobid>/<taskid> as the current working directory.
>
> -- 
> This message is automatically generated by JIRA.
> -
> If you think it was sent incorrectly contact one of the  
> administrators:  
> http://issues.apache.org/jira/secure/Administrators.jspa
> -
> For more information on JIRA, see:  
> http://www.atlassian.com/software/jira
>
>

