mesos-user mailing list archives

From Aaron Carey <>
Subject RE: Rack awareness support for Mesos
Date Tue, 07 Jun 2016 15:38:28 GMT
Would this perhaps make sense as a Mesos module that automatically assigns labels to the
agents, rather than something in the core itself?
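A minimal sketch of what such a label-assigning module might produce, written in Python rather than as an actual C++ Mesos module. It looks up the agent's hostname in an operator-supplied topology table (a hypothetical input) and formats the result as a value for the agent's `--attributes` flag, which takes `key:value` pairs separated by semicolons. The `rack` and `zone` keys are assumptions, not an established convention.

```python
from typing import Dict, Optional

# Hypothetical operator-supplied topology table: hostname -> fault-domain info.
TOPOLOGY: Dict[str, Dict[str, str]] = {
    "agent-01": {"zone": "z1", "rack": "r01"},
    "agent-02": {"zone": "z1", "rack": "r02"},
}


def attributes_for(hostname: str, topology: Dict[str, Dict[str, str]]) -> Optional[str]:
    """Return an --attributes value such as 'rack:r01;zone:z1', or None if unknown."""
    info = topology.get(hostname)
    if not info:
        return None
    # Sort keys so the generated string is stable across runs.
    return ";".join(f"{key}:{info[key]}" for key in sorted(info))


if __name__ == "__main__":
    print(attributes_for("agent-01", TOPOLOGY))
```

The resulting string would be passed (shell-quoted, because of the semicolon) to the agent's `--attributes` flag, so frameworks see it in every offer from that agent.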


Aaron Carey
Production Engineer - Cloud Pipeline
Industrial Light & Magic
020 3751 9150

From: Du, Fan []
Sent: 07 June 2016 16:16
To: Jörg Schad;
Subject: Re: Rack awareness support for Mesos

On 2016/6/6 23:48, Jörg Schad wrote:
> Hi,
> thanks for your idea and design doc!
> Just a few thoughts:
> a) The scheduling part would be implemented in a framework scheduler and
> not in the Mesos core, right?

I'm not sure which level of scheduling you are referring to.
If you mean the "Future" section of the proposal, that is Mesos allocation logic.
How rack information is used to implement advanced features (fault
tolerance, data locality) is up to the framework's scheduler.
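As an illustration of the framework side, here is a minimal sketch (not part of the proposal) of a scheduler spreading replicas across racks for fault tolerance. Offers are modeled as plain `(agent_id, rack_id)` tuples; real Mesos offers would carry the rack as an agent attribute, and the round-robin policy is an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def spread_across_racks(offers: List[Tuple[str, str]], replicas: int) -> List[str]:
    """Pick agents for `replicas` tasks, using as many distinct racks as possible.

    `offers` is a list of (agent_id, rack_id) pairs; returns the chosen agent ids.
    """
    by_rack: Dict[str, List[str]] = defaultdict(list)
    for agent_id, rack in offers:
        by_rack[rack].append(agent_id)

    chosen: List[str] = []
    racks = sorted(by_rack)  # deterministic order for the example
    # Round-robin over racks: a rack receives a second replica only after
    # every rack with spare agents has received one.
    while len(chosen) < replicas and any(by_rack[r] for r in racks):
        for rack in racks:
            if by_rack[rack] and len(chosen) < replicas:
                chosen.append(by_rack[rack].pop(0))
    return chosen


if __name__ == "__main__":
    offers = [("a1", "r1"), ("a2", "r1"), ("a3", "r2"), ("a4", "r3")]
    print(spread_across_racks(offers, 3))
```

With three racks and three replicas, each replica lands on a different rack, so losing one rack (for example, a top-of-rack switch failure) takes out at most one replica.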

> b) As mentioned by James, this needs to be very flexible (and not
> necessarily based on network structure),

The proposed network topology detection is modular, so it can fit Ethernet,
InfiniBand, or other network implementations. And yes, users can statically
configure /etc/mesos/rack_id to manipulate the logical network topology.
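A minimal sketch of how a statically configured rack ID might take precedence over automatic detection. The file path comes from the proposal; the `detect` callback and the `"default"` fallback value are hypothetical.

```python
from pathlib import Path
from typing import Callable, Optional

RACK_ID_FILE = Path("/etc/mesos/rack_id")  # static override named in the proposal


def resolve_rack_id(
    detect: Callable[[], Optional[str]],
    path: Path = RACK_ID_FILE,
    default: str = "default",
) -> str:
    """Prefer the operator's static rack ID, then auto-detection, then a default."""
    try:
        static = path.read_text().strip()
        if static:
            return static
    except FileNotFoundError:
        pass  # no static override configured on this agent
    detected = detect()  # hypothetical topology detector (e.g. LLDP-based)
    return detected if detected else default
```

Letting the file win means an operator can always correct a wrong detection result without touching the detector itself.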

> afaik people are using labels
> on the agents to identify different fault domains, which can then be
> interpreted by framework schedulers. Maybe it would make sense (instead
> of identifying the network structure) to come up with a common label
> naming scheme that can be understood by all the different frameworks.
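To make the idea of a shared naming scheme concrete, here is a minimal sketch of how any framework could interpret agent attributes once the keys are agreed on. The key names `zone` and `rack` are hypothetical; the parser only handles plain-text values in the `key:value;key:value` form used by the agent's `--attributes` flag, not Mesos range or set values.

```python
from typing import Dict, Tuple

# Hypothetical shared naming scheme, ordered from widest to narrowest fault domain.
FAULT_DOMAIN_KEYS = ("zone", "rack")


def parse_attributes(text: str) -> Dict[str, str]:
    """Parse 'rack:r01;zone:z1' into {'rack': 'r01', 'zone': 'z1'} (text values only)."""
    attrs: Dict[str, str] = {}
    for pair in filter(None, text.split(";")):  # skip empty segments, e.g. a trailing ';'
        key, _, value = pair.partition(":")
        attrs[key.strip()] = value.strip()
    return attrs


def fault_domain(text: str) -> Tuple[str, ...]:
    """Return the agent's fault domain as a tuple, with '?' where a key is missing."""
    attrs = parse_attributes(text)
    return tuple(attrs.get(key, "?") for key in FAULT_DOMAIN_KEYS)
```

Because the tuple is ordered from widest to narrowest domain, two agents share a failure domain at a given level exactly when their tuples agree up to that position.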

I'm not convinced that labels are still the right tool here.
On what information would the agents be labeled? IMO, a cluster operator
still needs something like LLDP to find out the network topology, and
every operator would have to do it on their own, so it's better
to abstract the logic inside Mesos to provide a common interface to

Honestly, I don't follow the argument for labels here.
The proposal is designed to do it *automatically* to reduce maintenance
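The automatic approach can be sketched as follows, assuming each agent has already learned the chassis ID of the switch on its uplink (for example from an LLDP daemon; collecting that data is outside this sketch). Treating every agent behind the same top-of-rack switch as one rack is an assumption, and the `rack-N` naming is hypothetical.

```python
from typing import Dict


def assign_racks(uplink_switch: Dict[str, str]) -> Dict[str, str]:
    """Map agent -> rack ID, giving all agents behind one switch the same rack.

    `uplink_switch` maps agent hostname -> neighbor switch chassis ID.
    """
    rack_of_switch: Dict[str, str] = {}
    racks: Dict[str, str] = {}
    # Iterate in sorted order so rack numbering is deterministic.
    for agent in sorted(uplink_switch):
        switch = uplink_switch[agent]
        if switch not in rack_of_switch:
            rack_of_switch[switch] = f"rack-{len(rack_of_switch)}"
        racks[agent] = rack_of_switch[switch]
    return racks
```

The sequential numbering is only for readability; since it shifts when agents join, using the switch chassis ID directly as the rack ID would be more stable in practice.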

> Looking forward to your thoughts on this!
> On Mon, Jun 6, 2016 at 3:27 PM, james <> wrote:
>     Hello,
>     @Stephen:: I guess Stephen is bringing up the 'security' aspect of
>     who gets access to the information, particularly cluster/cloud
>     devops, customers or interlopers....?
>     @Fan:: As a consultant, most of my customers either have or are
>     planning hybrid installations, where some code runs on a local
>     cluster and 'the cloud' is used for dynamic load requirements. I would
>     think your proposed scheme needs to be very flexible, whether it is
>     applied to a campus or Metropolitan Area Network or massively
>     distributed around the globe. What about different resource
>     types (racks of arm64, GPU-centric hardware, DSPs, FPGAs, etc.)?
>     Hardware diversity brings many benefits to the cluster/cloud
>     capabilities.
>     This also raises the question of hardware management (boot/config/online)
>     of the various hardware, such as what is built into CoreOS. Are several
>     applications going to be supported? Standards track? Or just Mesos/DC/OS
>     centric?
>     TIMING DATA:: This is the main issue I see. Once you start 'vectoring
>     in resources', you need to add timing (latency) data to encourage robust
>     and diversified use of this data. For HPC, this could be very
>     valuable for RDMA-heavy algorithms, where memory-constrained
>     workloads need to know not only about additional nearby memory
>     resources but also the approximate latency and bandwidth (based on
>     previously collected data) of using those additional resources.
>     Great idea. I do like it very much.
>     hth,
>     James
>     On 06/06/2016 05:06 AM, Stephen Gran wrote:
>         Hi,
>         This looks potentially interesting. How does it work in a
>         public cloud deployment scenario? I assume you would just have
>         to disable this feature, or not enable it?
>         Cheers,
>         On 06/06/16 10:17, Du, Fan wrote:
>             Hi, Mesos folks,
>             I’ve been thinking about rack awareness support in Mesos for
>             a while. It's of common interest to many data center
>             applications, since it enables data locality, fault tolerance
>             and better task placement. I've created MESOS-5545 to track
>             the work, and here is the initial design doc [1] for
>             supporting rack awareness in Mesos.
>             Looking forward to hearing comments from end users and other
>             developers.
>             Thanks!
>             [1]:
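James's point about timing data can be illustrated with a small sketch: given previously measured latency and bandwidth to each candidate resource, pick the one with the lowest estimated transfer time, modeled as latency + size / bandwidth. The cost model, the `Candidate` type, and the example figures are assumptions for illustration only, not measurements.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    latency_us: float      # previously measured latency, microseconds
    bandwidth_gbps: float  # previously measured bandwidth, gigabits per second


def transfer_time_us(c: Candidate, size_bytes: int) -> float:
    """Estimated time to move size_bytes: latency plus serialization time."""
    bits = size_bytes * 8
    # 1 Gbit/s = 1e3 bits per microsecond.
    return c.latency_us + bits / (c.bandwidth_gbps * 1e3)


def best_candidate(cands: List[Candidate], size_bytes: int) -> Candidate:
    """Return the candidate with the lowest estimated transfer time."""
    return min(cands, key=lambda c: transfer_time_us(c, size_bytes))
```

With such a model, a small transfer favors a low-latency neighbor in the same rack, while a large one can favor a farther resource with more bandwidth, which is exactly the trade-off that topology alone cannot express.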
