hadoop-mapreduce-user mailing list archives

From Vinod Kumar Vavilapalli <vino...@hortonworks.com>
Subject Re: Distribution of native executables and data for YARN-based execution
Date Fri, 17 May 2013 17:08:06 GMT

I have a bit of a conflict of interest, given that I have worked on Hadoop YARN all along, but...

I have worked on Torque/Condor-based resource management systems too. There are many advantages
to working on top of YARN; a few that should be specifically relevant here:
 - MR and non-MR workloads all run on the same cluster (there are a few not-so-ready MR
implementations on existing schedulers, but with lots of limitations)
 - Data locality is native in Hadoop YARN and hard to simulate in other schedulers
(we have experience trying this in the past)
 - Elastic resource management: jobs can grow and shrink elastically

Thanks,
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/

On May 17, 2013, at 7:20 AM, Tim St Clair wrote:

> Hi John - 
> 
> If you are doing extensive levels of non-MR C-style batch, you may be better served by looking at the myriad universes of existing schedulers (Torque, Condor, etc.), or by investigating the space around interop (1 cluster, many schedulers).
> 
> Either way, I recommend minimizing the dependency graph of your C application where possible if you are working in a heterogeneous environment.
> 
> Cheers,
> Tim
> 
> 
> From: "John Lilley" <john.lilley@redpoint.net>
> To: user@hadoop.apache.org
> Sent: Friday, May 17, 2013 8:35:53 AM
> Subject: RE: Distribution of native executables and data for YARN-based execution
> 
> Thanks!  This sounds exactly like what I need.  PUBLIC is right.
>  
> Do you know if this works for executables as well?  Like, would there be any issue transferring the executable bit on the file?
>  
> john
>  
> From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com] 
> Sent: Friday, May 17, 2013 12:56 AM
> To: user@hadoop.apache.org
> Subject: Re: Distribution of native executables and data for YARN-based execution
>  
>  
> The "local resources" you mentioned are the exact solution for this. For each LocalResource, you also specify a LocalResourceVisibility, which takes one of three values today: PUBLIC, PRIVATE and APPLICATION.
>  
> PUBLIC resources are downloaded only once and shared by any application running on that node.
>  
> PRIVATE resources are downloaded only once and shared by any application run by the same user on that node.
>  
> APPLICATION resources are downloaded per application and removed after the application finishes.
>  
> Seems like you want PUBLIC or PRIVATE.
>  
> Note that for PUBLIC resources to work, the corresponding files need to be publicly readable on HDFS too.
>  
> Also, if the remote files on HDFS are updated, the local copies will be downloaded afresh on each node where your containers run.
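
The sharing rules above can be illustrated with a toy model. This is only a sketch of the behaviour as described in this message, not YARN code: the class LocalizationModel, its cacheKey and request methods, and the sample user, application and path names are all hypothetical, and the model ignores the cleanup of APPLICATION resources after an application finishes.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model (NOT YARN code): counts downloads on a single node under each visibility.
class LocalizationModel {
    enum Visibility { PUBLIC, PRIVATE, APPLICATION }

    private final Set<String> cached = new HashSet<>();
    private int downloads = 0;

    // Builds the key that decides whether an existing local copy can be reused.
    static String cacheKey(Visibility v, String user, String appId, String path, long timestamp) {
        // Assumption: an updated remote file shows up as a different timestamp.
        String resource = path + "@" + timestamp;
        switch (v) {
            case PUBLIC:  return "public/" + resource;               // shared by everyone on the node
            case PRIVATE: return "user/" + user + "/" + resource;    // shared per user
            default:      return "app/" + appId + "/" + resource;    // one copy per application
        }
    }

    // Simulates a container asking for a resource; a download happens only on a cache miss.
    void request(Visibility v, String user, String appId, String path, long timestamp) {
        if (cached.add(cacheKey(v, user, appId, path, timestamp))) {
            downloads++;
        }
    }

    int downloads() {
        return downloads;
    }

    public static void main(String[] args) {
        for (Visibility v : Visibility.values()) {
            LocalizationModel node = new LocalizationModel();
            node.request(v, "alice", "app_1", "/apps/ref/data.bin", 100L);
            node.request(v, "alice", "app_2", "/apps/ref/data.bin", 100L);
            node.request(v, "bob", "app_3", "/apps/ref/data.bin", 100L);
            System.out.println(v + ": " + node.downloads() + " download(s)");
        }
    }
}
```

With two applications from one user and a third from another user, the model counts 1 download for PUBLIC, 2 for PRIVATE and 3 for APPLICATION. Requesting the same path with a newer timestamp adds one more download, which mirrors the note about updated remote files.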
>  
> HTH
>  
> Thanks,
> +Vinod Kumar Vavilapalli
> Hortonworks Inc.
> http://hortonworks.com/
>  
>  
> On May 16, 2013, at 2:21 PM, John Lilley wrote:
> 
> 
> I am attempting to distribute the execution of a C-based program onto a Hadoop cluster, without using MapReduce.  I read that YARN can be used to schedule non-MapReduce applications by programming to the ASM/RM interfaces.  As I understand it, eventually I get down to specifying each sub-task via ContainerLaunchContext.setCommands().
>  
> However, the program and shared libraries need to be stored on each worker’s local disk to run.  In addition there is a hefty data set that the application uses (say, 4GB) that is accessed via regular open()/read() calls by a library.  I thought a decent strategy would be to push the program+data package to a known folder in HDFS, then launch a “bootstrap” that compared the HDFS folder version to a local folder, copying any updated files as needed before launching the native application task.
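
The compare-and-copy step of such a bootstrap might look roughly like the sketch below. It is only an illustration under loud assumptions: plain local directories stand in for the HDFS folder (a real bootstrap would read HDFS through Hadoop's FileSystem API, which is not shown), the class BootstrapSync and its methods are hypothetical names, and "updated" is taken to mean missing, different in size, or older than the source.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Sketch of the "bootstrap" sync step; plain directories stand in for the HDFS folder.
class BootstrapSync {

    // Copies every regular file under source that is missing or stale in target.
    // Returns the relative paths that were copied.
    static List<Path> syncUpdated(Path source, Path target) throws IOException {
        List<Path> files = new ArrayList<>();
        try (Stream<Path> walk = Files.walk(source)) {
            walk.filter(Files::isRegularFile).forEach(files::add);
        }
        List<Path> copied = new ArrayList<>();
        for (Path src : files) {
            Path rel = source.relativize(src);
            Path dst = target.resolve(rel);
            if (isStale(src, dst)) {
                Files.createDirectories(dst.getParent());
                // COPY_ATTRIBUTES carries over the modification time, so the next
                // run sees the local copy as up to date.
                Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING,
                        StandardCopyOption.COPY_ATTRIBUTES);
                copied.add(rel);
            }
        }
        return copied;
    }

    // A local copy is stale when it is missing, differs in size, or is older than the source.
    static boolean isStale(Path src, Path dst) throws IOException {
        if (!Files.exists(dst)) {
            return true;
        }
        if (Files.size(src) != Files.size(dst)) {
            return true;
        }
        return Files.getLastModifiedTime(src).compareTo(Files.getLastModifiedTime(dst)) > 0;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: BootstrapSync <sourceDir> <targetDir>");
            return;
        }
        System.out.println("copied: " + syncUpdated(Paths.get(args[0]), Paths.get(args[1])));
    }
}
```

A second run over unchanged directories copies nothing, which is the property that avoids moving the 4GB data set on every launch. Comparing size and modification time is a cheap heuristic; a checksum or an explicit version file would be stricter.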
>  
> Are there better approaches?  I notice that one can implicitly copy “local resources” as part of the launch, but I don’t want to copy 4GB every time, only occasionally when the application or reference data is updated.  Also, will my bootstrapper be allowed to set executable-mode bits on the programs after they are copied?
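
On the executable-bit question, a process can normally change the mode of files it owns, so a Java bootstrapper could set the bit on its own copies as in the minimal sketch below. The class MakeExecutable and the method markExecutable are hypothetical names. Whether YARN itself preserves the executable bit when it localizes a resource is not answered in this thread, and the sketch makes no claim about that.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: setting the executable bit on a file the current process owns.
class MakeExecutable {

    // Sets the owner-executable bit; returns true if the change was applied.
    static boolean markExecutable(Path file) {
        File f = file.toFile();
        // Second argument "true" restricts the change to the file's owner.
        return f.setExecutable(true, true);
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for a copied native program.
        Path program = Files.createTempFile("native-task", ".bin");
        System.out.println("marked executable: " + markExecutable(program));
        Files.delete(program);
    }
}
```

This only works when the bootstrapper runs as the user that owns the copied files; files written by another account would need that account, or an administrator, to change their mode.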
>  
> Thanks
> John
>  
>  
> 

