hadoop-common-user mailing list archives

From Vinod Kumar Vavilapalli <vino...@hortonworks.com>
Subject Re: Distribution of native executables and data for YARN-based execution
Date Fri, 17 May 2013 17:03:11 GMT

In addition, YARN also supports three local resource types: ARCHIVE, PATTERN and FILE.

ARCHIVEs are unpacked completely; PATTERN (which should be renamed) files are unpacked to extract only the files that match the given pattern.

To answer your question: we explicitly set the executable bit on all the unpacked files, since we cannot depend on the underlying OS or the (un)packing tool to retain permissions.
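
As a rough sketch (API names as of the 2.x line, untested here; the HDFS path and resource key below are just placeholders), registering such a resource with the ContainerLaunchContext looks something like:

    import java.util.Collections;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
    import org.apache.hadoop.yarn.api.records.LocalResource;
    import org.apache.hadoop.yarn.api.records.LocalResourceType;
    import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
    import org.apache.hadoop.yarn.util.ConverterUtils;
    import org.apache.hadoop.yarn.util.Records;

    // Placeholder: a tarball previously uploaded to HDFS.
    FileSystem fs = FileSystem.get(new Configuration());
    Path pkgPath = new Path("/apps/myapp/package.tgz");
    FileStatus pkgStatus = fs.getFileStatus(pkgPath);

    LocalResource pkg = Records.newRecord(LocalResource.class);
    pkg.setResource(ConverterUtils.getYarnUrlFromPath(pkgPath));
    pkg.setType(LocalResourceType.ARCHIVE);             // unpacked into the container's working directory
    pkg.setVisibility(LocalResourceVisibility.PUBLIC);  // shared by all applications on the node
    pkg.setSize(pkgStatus.getLen());                    // must match the file on HDFS
    pkg.setTimestamp(pkgStatus.getModificationTime());  // ditto, or localization is rejected

    ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
    ctx.setLocalResources(Collections.singletonMap("mypkg", pkg));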

Thanks,
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/

On May 17, 2013, at 6:35 AM, John Lilley wrote:

> Thanks!  This sounds exactly like what I need.  PUBLIC is right.
>  
> Do you know if this works for executables as well?  Like, would there be any issue transferring the executable bit on the file?
>  
> john
>  
> From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com] 
> Sent: Friday, May 17, 2013 12:56 AM
> To: user@hadoop.apache.org
> Subject: Re: Distribution of native executables and data for YARN-based execution
>  
>  
> The "local resources" you mentioned is the exact solution for this. For each LocalResource,
you also mention a LocalResourceVisibility which takes one of the three values today - PUBLIC,
PRIVATE and APPLICATON.
>  
> PUBLIC resources are downloaded only once and shared by any application running on that node.
>  
> PRIVATE resources are downloaded only once and shared by any application run by the same user on that node.
>  
> APPLICATION resources are downloaded per application and removed after the application finishes.
>  
> Seems like you want PUBLIC or PRIVATE.
>  
> Note that for PUBLIC resources to work, the corresponding files need to be public on HDFS too.
>  
> Also, if the remote files on HDFS are updated, the local copies will be downloaded afresh on each node where your containers run.
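> 
> For instance, making those files public on HDFS before localizing them might look roughly like this (placeholder path; untested sketch, so double-check against your Hadoop version):
> 
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.fs.FileSystem;
>     import org.apache.hadoop.fs.Path;
>     import org.apache.hadoop.fs.permission.FsPermission;
> 
>     // Placeholder path; every ancestor directory must also be world-executable
>     // so the node managers can read the file for "public" localization.
>     FileSystem fs = FileSystem.get(new Configuration());
>     Path pkgPath = new Path("/apps/myapp/package.tgz");
>     fs.setPermission(pkgPath.getParent(), new FsPermission((short) 0755));
>     fs.setPermission(pkgPath, new FsPermission((short) 0644));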
>  
> HTH
>  
> Thanks,
> +Vinod Kumar Vavilapalli
> Hortonworks Inc.
> http://hortonworks.com/
>  
>  
> On May 16, 2013, at 2:21 PM, John Lilley wrote:
> 
> 
> I am attempting to distribute the execution of a C-based program onto a Hadoop cluster, without using MapReduce.  I read that YARN can be used to schedule non-MapReduce applications by programming to the ASM/RM interfaces.  As I understand it, eventually I get down to specifying each sub-task via ContainerLaunchContext.setCommands().
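> 
> (Roughly what I picture, with a made-up executable name and arguments:)
> 
>     import java.util.Collections;
>     import org.apache.hadoop.yarn.api.ApplicationConstants;
>     import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
>     import org.apache.hadoop.yarn.util.Records;
> 
>     // One container's launch command: run the native binary and redirect
>     // stdout/stderr into the container's YARN log directory.
>     ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
>     ctx.setCommands(Collections.singletonList(
>         "./myapp --input part-00001"
>             + " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout"
>             + " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr"));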
>  
> However, the program and shared libraries need to be stored on each worker’s local disk to run.  In addition there is a hefty data set that the application uses (say, 4GB) that is accessed via regular open()/read() calls by a library.  I thought a decent strategy would be to push the program+data package to a known folder in HDFS, then launch a “bootstrap” that compared the HDFS folder version to a local folder, copying any updated files as needed before launching the native application task.
>  
> Are there better approaches?  I notice that one can implicitly copy “local resources” as part of the launch, but I don’t want to copy 4GB every time, only occasionally when the application or reference data is updated.  Also, will my bootstrapper be allowed to set executable-mode bits on the programs after they are copied?
>  
> Thanks
> John
>  

