mesos-user mailing list archives

From John Omernik <j...@omernik.com>
Subject Re: Does Mesos support Hadoop MR V2
Date Sun, 27 Jul 2014 16:59:39 GMT
So excuse my naivety in this space, but my ignorance has never really
stopped me from asking questions:

I see YARN (Yet Another Resource Negotiator) as very similar to Mesos, i.e.
something to manage resources on a cluster of machines. So when I hear talk
of running "YARN" on Mesos, it seems very redundant indeed, and I ask
myself: what are we actually getting out of this setup?

So, going to the MapReduce question, I see MapReduce V1 and MapReduce
V2 like this: MapReduce V2 is an application that runs on YARN, i.e. if
you run a job, it creates an application master, that application master
requests resources, and the job gets run. It differs from MapReduce V1 in
that there is no long-running JobTracker (other than the YARN ResourceManager,
but that manages resources for all applications, not just MapReduce
applications). OK, so with Mesos, why can't there be a Mesos application that
is similar to a MapReduce V2 application in YARN? Why do we need to run
YARN on Mesos? That doesn't really make sense. Basically, the difference
between M/R V1 and M/R V2 is this: to mimic M/R V1, we need TaskTrackers and
JobTrackers running as Mesos applications (which we have). So for M/R V2,
we just need the Mesos equivalent of an application master running on YARN,
requesting resources across the cluster.

Fundamentally, YARN is confusing because I think they coupled running
MapReduce jobs with the resource manager and called it "Hadoop v2". Because
the two are coupled, people look at YARN as MapReduce V2, but it's not
really. It's a way of running jobs on a cluster of machines (à la Mesos),
with an "application" that is the equivalent of MapReduce V1. The naming
seems confusing to me: it makes people who have invested in Hadoop
(MapReduce V1) very interested in YARN because it's called "Hadoop V2",
while Mesos is seen as the "other".


Just for my own sake, I summarized this in TL;DR form, so if someone wants
to correct my understanding, they can:

Mesos = Tool to manage resources

YARN = Tool to manage resources; it's also called Hadoop v2

MapReduce V1 = JobTracker/TaskTrackers; it's what we know. It can run on
Hadoop clusters and on Mesos. It's also called Hadoop v1

MapReduce V2 = Application that can run on YARN and mimics MapReduce V1
on a YARN cluster. This + YARN has been called Hadoop v2.

On Sun, Jul 27, 2014 at 4:10 AM, Maxime Brugidou <maxime.brugidou@gmail.com>
wrote:

> When I said that running YARN over Mesos did not make sense, I meant that
> running a resource manager inside a resource manager is very suboptimal. You
> will end up either doing static allocation of resources for the YARN
> framework in Mesos or having complex logic to determine how much resource
> should be given to YARN. You will also have the same burden of managing two
> different clusters instead of one, even if YARN is sort of hidden as a Mesos
> framework.
>
> However, yes, I believe it's easier to run YARN on Mesos than to run MRv2 on
> top of Mesos. The solution I was discussing was obviously "ideal", and I
> have since looked at the MRAppMaster, and it discouraged me :)
>  On Jul 27, 2014 12:41 AM, "Rick Richardson" <rick.richardson@gmail.com>
> wrote:
>
>> FWIW I also think the fastest approach here is porting YARN onto
>> Mesos.
>>
>> In a perfect world, writing an implementation layer for the YARN
>> interface on Mesos would certainly be the optimal approach, but looking at
>> the MRv2 code, it is very tightly coupled to many YARN modules.
>>
>> If someone wanted to take on the project of making a generic resource
>> scheduler interface for MRv2, that would be amazing :)
>> On Jul 26, 2014 6:19 PM, "Jie Yu" <yujie.jay@gmail.com> wrote:
>>
>>> I am interested in investigating the idea of YARN on top of Mesos. One
>>> of the benefits I can think of is that we can get rid of the static
>>> resource allocation between YARN and Mesos clusters. That way, Mesos can
>>> allocate resources that are not used by YARN to other Mesos
>>> frameworks like Aurora, Marathon, etc., to increase the resource utilization
>>> of the entire data center. Also, we could avoid running each MRv2 job as a
>>> framework, which I think might cause some maintenance complexity (e.g. for
>>> framework rate limiting, etc.). Finally, YARN currently does not have good
>>> isolation support. It only supports CPU isolation right now (using
>>> cgroups). By porting YARN on top of Mesos, we might be able to leverage the
>>> existing Mesos containerizer strategy to provide better isolation between
>>> tasks. Maxime, I am curious: why do you think it does not make sense to run
>>> YARN over Mesos? Since I am not super familiar with YARN, I might be missing
>>> something.
>>>
>>> I have been thinking of making the YARN ResourceManager a Mesos framework
>>> and making the NodeManager a Mesos executor. The NodeManager would launch
>>> containers using primitives provided by Mesos so that we have a consistent
>>> containerizer layer. I haven't fully figured out how this could be done yet
>>> (e.g., nested containers, communication between NodeManager and
>>> ResourceManager, etc.), but I would love to explore this direction. I would
>>> like to hear any feedback/suggestions you guys have about this
>>> direction.
>>>
>>> Thanks,
>>> - Jie
>>>
>>>
>>> On Fri, Jul 25, 2014 at 1:39 PM, Maxime Brugidou <
>>> maxime.brugidou@gmail.com> wrote:
>>>
>>>> We run both Mesos and YARN in prod, and it does not make sense to run
>>>> YARN over Mesos.
>>>>
>>>> However, it would be interesting to find a way to run MRv2 jobs on Mesos
>>>> with some custom layer that swaps YARN out for Mesos. Not sure how to start
>>>> though... MRv2 contains a YARN application master that would need to be
>>>> rewritten as a Mesos framework scheduler. This is probably doable. However,
>>>> with MRv2 every MapReduce job would be mapped to a new framework in Mesos.
>>>> I'm not sure how many frameworks Mesos can run and scale up to, especially
>>>> short-lived frameworks.
>>>>  On Jul 25, 2014 8:54 PM, "Tom Arnfeld" <tom@duedil.com> wrote:
>>>>
>>>>> Hey Luyi,
>>>>>
>>>>> That's correct, the Hadoop framework currently only supports Hadoop 2
>>>>> MRv1. It also doesn't have great support for the HA JobTracker available in
>>>>> newer versions of Hadoop, but I've been working on that for the past few weeks.
>>>>>
>>>>> I'm not sure how Hadoop 2 would play with Mesos, but I'm very interested
>>>>> to find out more. Am I correct in thinking MRv2 will only run on top of
>>>>> YARN?
>>>>>
>>>>> I wonder if anyone else on the mailing list is running YARN on top of
>>>>> Mesos...
>>>>>
>>>>> Tom.
>>>>>
>>>>> On Friday, 25 July 2014, Luyi Wang <wangluyi1982@gmail.com> wrote:
>>>>>
>>>>>> I checked the Mesos GitHub repo (https://github.com/mesos/hadoop). It lists
>>>>>> support for MapReduce V1.
>>>>>>
>>>>>> What about MR V2?
>>>>>>
>>>>>> Right now we are using Cloudera to manage our Hadoop clusters, which use
>>>>>> MRv2. We are planning to migrate all our services to Mesos (still in the
>>>>>> initial investigation stage). Suggestions, advice, and experiences are
>>>>>> welcome.
>>>>>>
>>>>>> Thanks a lot!
>>>>>>
>>>>>>
>>>>>> -Luyi.
