incubator-s4-dev mailing list archives

From "Matthieu Morel (JIRA)" <>
Subject [jira] [Commented] (S4-25) Write S4 Application Master to deploy S4 in Yarn
Date Thu, 25 Oct 2012 19:07:13 GMT


Matthieu Morel commented on S4-25:

I uploaded a patch to branch S4-25 (here;a=shortlog;h=refs/heads/S4-25)
and added some documentation here:

The approach is to preserve the S4 deployment model (coordination through ZooKeeper, application
loading logic in the S4 nodes) and to map it onto YARN, which is used to start the S4 nodes.
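To illustrate what mapping the S4 model onto YARN implies, here is a minimal sketch, not the code in the patch. It assumes that every container receives the same launch command and that each S4 node then coordinates with its peers through ZooKeeper on its own. The main class `org.example.s4.HypotheticalNodeMain` and the `-c`/`-zk` flags are hypothetical placeholders.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class S4ContainerCommands {

    // Hypothetical entry point for an S4 node; the real S4 main class may differ.
    static final String NODE_MAIN_CLASS = "org.example.s4.HypotheticalNodeMain";

    /**
     * Builds one launch command per YARN container. Every command is identical:
     * YARN only starts the processes, and each S4 node then coordinates with its
     * peers through ZooKeeper, as in a regular (non-YARN) S4 deployment.
     */
    static List<String> buildLaunchCommands(int numContainers, String zkConnect, String clusterName) {
        if (numContainers <= 0) {
            throw new IllegalArgumentException("numContainers must be positive");
        }
        String command = String.join(" ",
                "java",
                NODE_MAIN_CLASS,
                "-c=" + clusterName,  // hypothetical flag: logical S4 cluster to join
                "-zk=" + zkConnect);  // hypothetical flag: ZooKeeper connection string
        // No per-container arguments: node-specific state is resolved via ZooKeeper.
        return new ArrayList<>(Collections.nCopies(numContainers, command));
    }

    public static void main(String[] args) {
        for (String cmd : buildLaunchCommands(2, "zkhost:2181", "cluster1")) {
            System.out.println(cmd);
        }
    }
}
```

Keeping the commands identical means the application master does not need to know how S4 partitions work. It only decides how many processes to start.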

The patch depends on hadoop-2.0.2-alpha, the latest release.

The patch adds a new subproject, s4-yarn, and provides the s4-yarn command to deploy S4 applications.
You can combine S4 parameters with YARN-specific parameters (num_containers, queue,
user, etc.).
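One way a command such as s4-yarn could separate YARN-specific options from the parameters it passes through to S4 is sketched below. This is an illustration only. It assumes a hypothetical `-name=value` option syntax and treats only `num_containers`, `queue` and `user` as YARN options. The actual s4-yarn option names and syntax may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class S4YarnArgs {

    // YARN-specific option names taken from the comment; everything else goes to S4.
    static final Set<String> YARN_OPTIONS = Set.of("num_containers", "queue", "user");

    final Map<String, String> yarnOptions = new LinkedHashMap<>();
    final List<String> s4Args = new ArrayList<>();

    /** Splits hypothetical "-name=value" arguments into YARN options and S4 pass-through args. */
    static S4YarnArgs parse(String[] args) {
        S4YarnArgs parsed = new S4YarnArgs();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (arg.startsWith("-") && eq > 1) {
                String name = arg.substring(1, eq);
                if (YARN_OPTIONS.contains(name)) {
                    parsed.yarnOptions.put(name, arg.substring(eq + 1));
                    continue;
                }
            }
            // Not a recognized YARN option: forward it unchanged to the S4 nodes.
            parsed.s4Args.add(arg);
        }
        return parsed;
    }

    public static void main(String[] args) {
        S4YarnArgs parsed = parse(new String[] {
                "-num_containers=4", "-queue=default", "-c=cluster1", "-zk=zkhost:2181"});
        System.out.println("YARN: " + parsed.yarnOptions);
        System.out.println("S4:   " + parsed.s4Args);
    }
}
```

Forwarding unrecognized arguments unchanged lets the S4 node options evolve without touching the YARN-side launcher.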

I also added a regression test that uses MiniYARNCluster and MiniDFSCluster.

Pending issues:
* It's not clear to me how to stop an application. The {{YarnClientImpl#killApplication}}
method seems to kill the application master, but not the processes launched by this application.
* I could not figure out how to add the YARN test dependencies. That may be a Gradle issue, or a consequence
of the way hadoop-2.0.2-alpha packages are distributed on Maven; I'm not sure. In the meantime, I added
them to a local lib/ directory of the S4 distribution.

Arun: because we used a released version of YARN, we used the raw API rather than YARN-103.
> Write S4 Application Master to deploy S4 in Yarn
> ------------------------------------------------
>                 Key: S4-25
>                 URL:
>             Project: Apache S4
>          Issue Type: New Feature
>            Reporter: J Mohamed Zahoor
>             Fix For: 0.6
>         Attachments: S4-ApplicationMaster.diff, S4-Client.diff, S4-Constants.diff, S4-YARN-1.patch
> Along the lines of s4PigWrapper, write an S4 application master to host s4 piper inside Hadoop
> YARN. This could be useful not only for reading data stored in Hadoop (to build or train
> a model); we could also use the resource manager to deploy S4 instances on remote
> machines and monitor them. In short, we could make use of most of the resource management,
> scheduling and other good stuff in YARN.
> - YARN is useful to deploy and launch S4 instances.
> - It still requires deploying node managers on each box, which means it will
> be useful if one is running more than one S4 process on a node.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see:
