incubator-s4-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Arun C Murthy <...@hortonworks.com>
Subject Re: S4 & YARN
Date Fri, 20 Jul 2012 14:46:39 GMT
Daniel,

>  * The user submits an S4 application using the S4 YARN client, specifying the number
of nodes it needs, the s4r location, etc.
> * The S4 YARN client does the following:
>  - connects to ZK and configures a cluster with the given parameters
>  - deploys the s4r using ZK
>  - starts the S4 ApplicationMaster
> * The S4 ApplicationMaster does the following:
>  - sends a request to the ResourceManager for the given number of nodes
>  - configures the job, which is just an S4 node, specifying the cluster it belongs to,
the user supplied parameters, etc.
>  - submits the job and monitors it

 I'm a S4 neophyte, but the flow looks like a great start! I'm happy to help with details
of YARN if you would like.

 One question: What does 'configure a cluster' imply? Do you see that done for every S4 application
via the S4 YARN client? Would that be a big overhead for every S4 application?

Maybe a future enhancement would be to share the same cluster across multiple S4 applications
by breaking apart a YARN S4-DEPLOY application (client+appmaster) to first set up a S4 cluster
and then another YARN S4 application to just run S4 apps in the already deployed cluster?
Thoughts? Am I hopelessly off-base? *smile*

thanks,
Arun

On Jul 18, 2012, at 9:49 AM, Daniel Gómez Ferro wrote:

> Hi all,
> 
> I've been playing a bit with YARN and I think the integration with S4 should be quite
simple. For those unfamiliar with YARN, here's a simplification of how it works (check http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/YARN.html
for an in depth view):
> 
> * There is one global ResourceManager (RM)
> * For each application the user submits, an ApplicationMaster (AM) is created
> * The AM requests resources (nodes) to the RM
> * When the AM has enough resources, it launches the job
> * Now the AM can monitor the job, the user can kill the AM, etc.
> 
> I modified the example application DistributedShell that comes in Hadoop and I was able
to start s4 nodes through YARN. This simple experiment led me to the following ideas on how
to integrate the S4 workflow in YARN:
> 
> * The user submits an S4 application using the S4 YARN client, specifying the number
of nodes it needs, the s4r location, etc.
> * The S4 YARN client does the following:
>  - connects to ZK and configures a cluster with the given parameters
>  - deploys the s4r using ZK
>  - starts the S4 ApplicationMaster
> * The S4 ApplicationMaster does the following:
>  - sends a request to the ResourceManager for the given number of nodes
>  - configures the job, which is just an S4 node, specifying the cluster it belongs to,
the user supplied parameters, etc.
>  - submits the job and monitors it
> 
> Of course there is more to the YARN integration than just starting an application, but
I think this would be a functional enough first step. Maybe having the possibility to stop
a running application would be nice as well.
> 
> Do you have any thoughts on how to improve this workflow?
> 
> Thanks!
> 
> Daniel
> 
> On Wed Jun 13 12:26:18 2012, Flavio Junqueira wrote:
>> I'm interested in this integration, but I need to wrap my head around the deployment
model of piper first.
>> 
>> -Flavio
>> 
>> On Jun 13, 2012, at 5:26 AM, Arun C Murthy wrote:
>> 
>>> Folks,
>>> 
>>> I'd like to start a discussion around getting S4 to run within Hadoop YARN in
hadoop-2.
>>> 
>>> Brief background: Hadoop YARN is a generic resource management framework which
aims to host multiple applications such as MapReduce, MPI etc. It would be very beneficial
to get S4 running within YARN since it would make it much more accessible for Hadoop users
to do real-time processing with S4 in their existing clusters with almost no operation overheads
- thus, aiding S4 adoption.
>>> 
>>> I had a brief discussion with Flavio on this topic recently and he encouraged
me to start a discussion on this list.
>>> 
>>> I see that https://issues.apache.org/jira/browse/S4-25 is already open for running
S4 within YARN, but hasn't been any activity since.
>>> 
>>> Some more details on YARN and writing applications:
>>> http://hadoop.apache.org/common/docs/r2.0.0-alpha/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html
>>> 
>>> I'm happy to help anyone interested in working on this.
>>> 
>>> thanks,
>>> Arun
>>> 
>> 
>> flavio
>> junqueira
>> senior research scientist
>> 
>> fpj@yahoo-inc.com
>> direct +34 93-183-8828
>> 
>> avinguda diagonal 177, 8th floor, barcelona, 08018, es
>> phone (408) 349 3300    fax (408) 349 3301
>> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message