incubator-s4-dev mailing list archives

From Matthieu Morel <mmo...@apache.org>
Subject Re: S4 & YARN
Date Thu, 19 Jul 2012 08:37:14 GMT
Karthik, could you list the other fundamental questions that we'd need 
to address and that you are thinking about?

Regarding your comment on spawning PEs at runtime, note that there is no 
need to ask for more resources to spawn more PE instances: instances are 
spread across existing S4 nodes.
Also note that dynamic load balancing would reallocate existing 
instances, and we'd ask the resource manager for more resources in that 
case, but we are not there yet.
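
To illustrate the point about spawning PE instances, here is a minimal sketch. It assumes a simple hash-based assignment of keys to a fixed set of nodes; the `Cluster` class and its methods are hypothetical names for illustration, not S4's actual partitioning code. A new key creates a new PE instance on one of the existing nodes, and the node count never changes.

```python
import zlib


class Cluster:
    """Toy model of a fixed set of nodes hosting keyed PE instances.

    Hypothetical illustration only; not part of the S4 API.
    """

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        # One dict per node: key -> PE instance state (here, an event count).
        self.nodes = [dict() for _ in range(num_nodes)]

    def node_for(self, key):
        # Stable hash, so the same key always maps to the same node.
        return zlib.crc32(key.encode("utf-8")) % self.num_nodes

    def dispatch(self, key):
        """Route an event; the PE instance is created lazily on first use."""
        index = self.node_for(key)
        node = self.nodes[index]
        node[key] = node.get(key, 0) + 1
        return index


cluster = Cluster(num_nodes=3)
for key in ["user-1", "user-2", "user-3", "user-4", "user-1"]:
    cluster.dispatch(key)

# Four distinct keys produced four PE instances on the same three nodes.
print(sum(len(node) for node in cluster.nodes), "instances on",
      cluster.num_nodes, "nodes")
```

In this sketch, only a rebalancing step that changes `num_nodes` would need to go back to the resource manager.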

Thanks,

Matthieu

On 7/18/12 7:08 PM, Karthik Kambatla wrote:
> Awesome work, Daniel. That is definitely a huge step towards the
> integration.
>
> For a complete integration, we might have to answer a few fundamental
> questions about resource allocation:
>
> Here is one of them --- In S4, we might spawn PEs at runtime to address a
> particular partition (bunch of keys). In that case, the AM asks the RM for
> additional resources, and containers are allocated only when they become
> available. We need to figure out how we will handle this period where we
> wait for containers.
>
> Thanks
> Karthik
>
> On Wed, Jul 18, 2012 at 9:49 AM, Daniel Gómez Ferro
> <danielgf@yahoo-inc.com> wrote:
>
>> Hi all,
>>
>> I've been playing a bit with YARN and I think the integration with S4
>> should be quite simple. For those unfamiliar with YARN, here's a
>> simplification of how it works (check
>> http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/YARN.html
>> for an in-depth view):
>>
>> * There is one global ResourceManager (RM)
>> * For each application the user submits, an ApplicationMaster (AM) is
>> created
>> * The AM requests resources (nodes) from the RM
>> * When the AM has enough resources, it launches the job
>> * Now the AM can monitor the job, the user can kill the AM, etc.
>>
>> I modified the example application DistributedShell that comes in Hadoop
>> and I was able to start s4 nodes through YARN. This simple experiment led
>> me to the following ideas on how to integrate the S4 workflow in YARN:
>>
>> * The user submits an S4 application using the S4 YARN client, specifying
>> the number of nodes it needs, the s4r location, etc.
>> * The S4 YARN client does the following:
>>    - connects to ZK and configures a cluster with the given parameters
>>    - deploys the s4r using ZK
>>    - starts the S4 ApplicationMaster
>> * The S4 ApplicationMaster does the following:
>>    - sends a request to the ResourceManager for the given number of nodes
>>    - configures the job, which is just an S4 node, specifying the cluster
>> it belongs to, the user supplied parameters, etc.
>>    - submits the job and monitors it
>>
>> Of course there is more to the YARN integration than just starting an
>> application, but I think this would be a functional enough first step.
>> Maybe having the possibility to stop a running application would be nice as
>> well.
>>
>> Do you have any thoughts on how to improve this workflow?
>>
>> Thanks!
>>
>> Daniel
>>
>> On Wed Jun 13 12:26:18 2012, Flavio Junqueira wrote:
>>
>>> I'm interested in this integration, but I need to wrap my head around the
>>> deployment model of piper first.
>>>
>>> -Flavio
>>>
>>>
>>> On Jun 13, 2012, at 5:26 AM, Arun C Murthy wrote:
>>>
>>>> Folks,
>>>>
>>>> I'd like to start a discussion around getting S4 to run within Hadoop
>>>> YARN in hadoop-2.
>>>>
>>>> Brief background: Hadoop YARN is a generic resource management framework
>>>> which aims to host multiple applications such as MapReduce, MPI etc. It
>>>> would be very beneficial to get S4 running within YARN since it would make
>>>> it much more accessible for Hadoop users to do real-time processing with S4
>>>> in their existing clusters with almost no operational overhead, thus
>>>> aiding S4 adoption.
>>>>
>>>> I had a brief discussion with Flavio on this topic recently and he
>>>> encouraged me to start a discussion on this list.
>>>>
>>>> I see that https://issues.apache.org/jira/browse/S4-25 is already open
>>>> for running S4 within YARN, but there hasn't been any activity since.
>>>>
>>>> Some more details on YARN and writing applications:
>>>> http://hadoop.apache.org/common/docs/r2.0.0-alpha/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html
>>>>
>>>> I'm happy to help anyone interested in working on this.
>>>>
>>>> thanks,
>>>> Arun
>>>>
>>>>
>


