hadoop-yarn-dev mailing list archives

From John Lilley <john.lil...@redpoint.net>
Subject RE: Custom ApplicationMaster development
Date Tue, 28 May 2013 22:15:02 GMT
Hitesh,

Yes, that is it exactly.  We want to implement distributed algorithms where some data should
persist in a scope beyond "task".  We could write it to HDFS, but that is a high overhead
for non-persistent data; at least that is what I am told on this forum.

Is it possible/desirable to get the "mapreduce shuffle service" to serve up data files for
me, or is it bound too tightly to MR?

john

-----Original Message-----
From: Hitesh Shah [mailto:hitesh@hortonworks.com] 
Sent: Friday, May 24, 2013 3:35 PM
To: yarn-dev@hadoop.apache.org
Subject: Re: Custom ApplicationMaster development

Hi John,

Yes - you probably could. 

I don't know of anyone who has written any other auxiliary service to date, so if you come
across anything lacking in the handling/support of aux services, please do file feature-request/bug
JIRAs.

For the application that you mentioned, I am assuming you are looking to build some form of
a data 'caching' service that can store a job's output to be used by subsequent jobs? 

-- Hitesh
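
[Editor's sketch] The lifecycle discussed in this thread (a per-node service that is told when an application starts, serves data for it, and is notified on completion so it can clean up) can be illustrated roughly as follows. This is a minimal, self-contained sketch and does not use the Hadoop API: `NodeAuxService`, `IntermediateFileService`, and every method name here are hypothetical stand-ins, and the real auxiliary-service classes and signatures vary between Hadoop versions. In a real deployment the NodeManager would also have to be configured to load the service (the `yarn.nodemanager.aux-services` property).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface, NOT the Hadoop API. It only mirrors the lifecycle
// described in the thread: app start (with service data), app completion,
// and metadata handed back to clients.
interface NodeAuxService {
    void initApp(String appId, ByteBuffer serviceData);
    void stopApp(String appId);
    ByteBuffer getMeta();
}

// Hypothetical per-node service that tracks intermediate files by application.
class IntermediateFileService implements NodeAuxService {
    private final Map<String, Set<String>> filesByApp = new ConcurrentHashMap<>();
    private final int port; // assumed port on which files would be served

    IntermediateFileService(int port) {
        this.port = port;
    }

    // Called when an application starts on this node.
    @Override
    public void initApp(String appId, ByteBuffer serviceData) {
        filesByApp.putIfAbsent(appId, ConcurrentHashMap.<String>newKeySet());
    }

    // A task registers an output file; rejected if the app is unknown.
    boolean register(String appId, String path) {
        Set<String> files = filesByApp.get(appId);
        return files != null && files.add(path);
    }

    // Number of files currently tracked for an application.
    int fileCount(String appId) {
        Set<String> files = filesByApp.get(appId);
        return files == null ? 0 : files.size();
    }

    // Called when the application completes; a real service would also
    // delete the files from local disk here.
    @Override
    public void stopApp(String appId) {
        filesByApp.remove(appId);
    }

    // Metadata clients need to reach the service (here, just the port).
    @Override
    public ByteBuffer getMeta() {
        return ByteBuffer.wrap(Integer.toString(port).getBytes(StandardCharsets.UTF_8));
    }
}
```

In this sketch, registering a file before `initApp` is rejected, and `stopApp` drops all bookkeeping for the application, which is the point at which non-persistent intermediate data would be deleted.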

On May 24, 2013, at 1:33 PM, John Lilley wrote:

> Hitesh,
> 
> Regarding your comments:
>  - the files are served by an auxiliary service ( mapreduce shuffle service ) running
>    within the NodeManager.
>  - The NM needs to be configured to tell it which aux services to start up.
> 
> Does this mean that I could in theory write an auxiliary service, perhaps modeled after
> the mapreduce shuffle service, to handle such node-level tasks as serving up files?  What
> I am trying to understand is whether my application can perform similar actions to MapReduce.
> I am not trying to replace MapReduce; however, the ability to perform equivalent operations
> would be very useful to our application.  For example, there are transitive closure algorithms
> that can be written as iterative MapReduce jobs, but which can potentially be much more efficient
> if they are able to avoid landing intermediate results on HDFS.
> 
> Thanks
> John
> 
> 
> -----Original Message-----
> From: Hitesh Shah [mailto:hitesh@hortonworks.com]
> Sent: Thursday, May 23, 2013 5:10 PM
> To: yarn-dev@hadoop.apache.org
> Subject: Re: Custom ApplicationMaster development
> 
> Hello John
> 
> To add to Chris' email:
> 
> Do take a look at http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html
>   - this is probably a bit out of date.
>   - the actual source code of distributed-shell in the source tree would be the best
>     guideline to follow after taking a brief look at the link above.
> 
> Compatibility
>  - 0.23 and 2.0 are similar to a large extent but there are differences - not sure if
>    it is possible to code for compatibility.
>  - To get apis into a relatively stable state, a lot of changes have 
> gone in since 2.0.4 was released
> 
> Task output files
>  - the files are served by an auxiliary service ( mapreduce shuffle service ) running
>    within the NodeManager.
>  - The NM needs to be configured to tell it which aux services to start up.
>  - The protocols support some level of information passing via the service data constructs.
>  - the service is notified when an application completes such that it 
> can be used to delete data if needed
> 
> -- Hitesh
> 
> 
> On May 23, 2013, at 3:45 PM, John Lilley wrote:
> 
>> I am getting started with development of a custom ApplicationMaster and I didn't
>> think that the user@ list was quite the right place for it.  Apologies if this list isn't
>> the right place either.  Some of my questions are really newbie, like:
>> 
>> * Is there an FAQ for non-MR YARN development?
>> * Is there an FAQ for configuring/building/running Hadoop from source, preferably
>>   in Eclipse?
>> * What is the recommended configuration/environment for development of a
>>   YARN app?  I would like to use Eclipse under Windows if that even makes any sense.
>> * Would you start with a Hadoop release or build from version control?
>> * Is it possible to code for compatibility between 2.0 and 0.23?
>> * Is there an ApplicationMaster example that can be used as a starting point?
>>
>> I also have some more in-depth questions:
>>
>> * When a MapReduce task creates its output files and makes them available
>>   over HTTP, is it the NodeManager that serves them up?  If my YARN task wants to do something
>>   similar, how does it tell the NodeManager?  How are the files removed later?
>> * Is it possible to install objects or services that run as peers of the
>>   NodeManager as opposed to tasks?  Are there any recommended per-node patterns as opposed to
>>   per-task patterns?
>> 
>> Thanks
>> John
>> 
> 

