mesos-user mailing list archives

From Tim Chen <...@mesosphere.io>
Subject Re: Running services on all slaves
Date Thu, 08 Jan 2015 08:24:33 GMT
Hi Itamar,

You can set the amount of CPU and memory that the slave advertises to the
master for scheduling via the --resources slave flag. So on a 16-CPU box you
can advertise only 12 cpus and leave 4 for your services if you want.
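
For example (illustrative numbers; <master> is a placeholder for your
master's address, and mem is in MB), starting the slave with something like:

  mesos-slave --master=<master>:5050 --resources="cpus:12;mem:24576"

means the master will only ever schedule tasks against 12 cpus on that
machine, regardless of how many it actually has.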

That said, there have been discussions for a while about launching multiple
co-located tasks at once (aka pods), but it's not yet concrete what that
will look like in Mesos.

Tim



On Wed, Jan 7, 2015 at 11:30 PM, Itamar Ostricher <itamar@yowza3d.com>
wrote:

> Thanks everybody for all your insights!
>
> I totally agree with the last response from Tom.
> The per-node services definitely belong to the level that provisions the
> machine and the mesos-slave service itself (in our case, pre-configured GCE
> images).
>
> So I guess the problem I wanted to solve is more general - how can I make
> sure there are resources reserved for all of the system-level stuff that
> runs outside of the mesos context?
> To be more specific, if I have a machine with 16 CPUs, it is common that
> my framework will schedule 16 heavy number-crunching processes on it.
> This can starve anything else that's running on the machine... (like the
> logging aggregation service, and the mesos-slave service itself)
> (this probably explains the lost-task phenomena we've been observing)
> What's the best-practice solution for this situation?
>
> On Wed, Jan 7, 2015 at 2:09 AM, Tom Arnfeld <tom@duedil.com> wrote:
>
>> I completely agree with Charles, though I think I can appreciate what
>> you're trying to do here. Take the log aggregation service as an example:
>> you want it on every slave to aggregate logs, but want to avoid using yet
>> another layer of configuration management to deploy it.
>>
>> I'm of the opinion that these kinds of auxiliary services, which all work
>> together (the mesos-slave process included) to define what we mean by a
>> "slave", are the responsibility of whoever/whatever is provisioning the
>> mesos-slave process and possibly even the machine itself. In our case,
>> that's Chef. IMO once a slave registers with the mesos cluster it's
>> immediately ready to start doing work, and mesos will actually start
>> offering that slave's resources immediately.
>>
>> If you continue down this path you're also going to run into a variety of
>> interesting timing issues when these services fail, or when you want to
>> upgrade them. I'd suggest running these aux services under a more advanced
>> process monitor like M/Monit instead of through mesos (via Marathon).
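>>
>> A minimal monit stanza for that (service name and paths are made up)
>> would look something like:
>>
>>   check process log-aggregator with pidfile /var/run/log-aggregator.pid
>>     start program = "/etc/init.d/log-aggregator start"
>>     stop program = "/etc/init.d/log-aggregator stop"
>>
>> That way a dead service gets restarted on the machine itself, with no
>> scheduling round-trip through the cluster.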
>>
>> Think of it another way: would you want something running through mesos
>> to install apt package updates once a day? That'd be super weird, so why
>> would log aggregation be any different?
>>
>> --
>>
>> Tom Arnfeld
>> Developer // DueDil
>>
>>
>> On Tue, Jan 6, 2015 at 11:57 PM, Charles Baker <cnobleb@gmail.com> wrote:
>>
>>> It seems like an 'anti-pattern' (for lack of a better term) to attempt
>>> to force locality on a bunch of dependency services launched through
>>> Marathon. I thought the whole idea of Mesos (and Marathon) was to treat the
>>> data center as one giant computer in which it fundamentally should not
>>> matter where your services are launched. I obviously don't know the
>>> details of the use-case and may be grossly misunderstanding what you are
>>> trying to do, but to me it sounds like you are attempting to shoehorn a
>>> non-distributed application into a distributed architecture. If this is the
>>> case, you may want to revisit your implementation and try to decouple the
>>> application's requirement for node-level dependency locality. It is also a
>>> good opportunity to possibly redesign a monolithic application into a
>>> distributed one.
>>>
>>> On Tue, Jan 6, 2015 at 12:53 PM, David Greenberg <dsg123456789@gmail.com>
>>> wrote:
>>>
>>>> Tom is absolutely correct--you also need to ensure that your "special
>>>> tasks" are launched by a framework whose role has a dedicated
>>>> reservation, so they can always launch.
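>>>>
>>>> As a rough sketch (role name and numbers invented for illustration): a
>>>> static reservation on the slave,
>>>>
>>>>   mesos-slave --master=<master>:5050 \
>>>>     --resources="cpus(services):1;cpus(*):15;mem(services):2048;mem(*):30720"
>>>>
>>>> plus a framework that registers with role "services" in its
>>>> FrameworkInfo, means the reserved cpu/mem is only ever offered to that
>>>> framework.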
>>>>
>>>> On Tue, Jan 6, 2015 at 2:38 PM, Tom Arnfeld <tom@duedil.com> wrote:
>>>>
>>>>> I'm not sure I'm fully aware of the use case, but if you use a
>>>>> different framework (e.g. Marathon) to launch these services, should a
>>>>> service die and need to be re-launched (or the slave restart), could you
>>>>> not be in a position where another framework has consumed all the
>>>>> resources on that slave and your "core" tasks cannot launch?
>>>>>
>>>>> Maybe if you're just using Marathon it provides a sort of priority to
>>>>> decide who gets what resources first, but with multiple frameworks you
>>>>> might need to look into slave resource reservations and framework
>>>>> roles.
>>>>>
>>>>> FWIW We're configuring these things out of band (via Chef to be
>>>>> specific).
>>>>>
>>>>> Hope this helps!
>>>>>
>>>>> --
>>>>>
>>>>> Tom Arnfeld
>>>>> Developer // DueDil
>>>>>
>>>>> (+44) 7525940046
>>>>> 25 Christopher Street, London, EC2A 2BS
>>>>>
>>>>>
>>>>> On Tue, Jan 6, 2015 at 9:05 AM, Itamar Ostricher <itamar@yowza3d.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I was wondering if the best approach to do what I want is to use
>>>>>> mesos itself, or other Linux system tools.
>>>>>>
>>>>>> There are a bunch of services that our framework assumes are running
>>>>>> on all participating slaves (e.g. logging service, data-bridge service,
>>>>>> etc.).
>>>>>> One approach to do that is in the infrastructure level, making sure
>>>>>> that slave nodes are configured correctly (e.g. with pre-configured
>>>>>> images, or other provisioning systems).
>>>>>> Another approach would be to use mesos itself (maybe with something
>>>>>> like Marathon) to schedule these services on all slave nodes.
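>>>>>>
>>>>>> (For the Marathon route, my understanding is you'd post an app with a
>>>>>> "hostname UNIQUE" constraint and instances set to your slave count --
>>>>>> ids and numbers below are invented:
>>>>>>
>>>>>>   {
>>>>>>     "id": "log-aggregator",
>>>>>>     "cmd": "/usr/local/bin/log-aggregator",
>>>>>>     "instances": 10,
>>>>>>     "cpus": 0.5,
>>>>>>     "mem": 256,
>>>>>>     "constraints": [["hostname", "UNIQUE"]]
>>>>>>   }
>>>>>>
>>>>>> so at most one instance lands on each host.)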
>>>>>>
>>>>>> The advantage of the mesos-based approach is that it becomes trivial
>>>>>> to account for the resource consumption of said services (e.g. make sure
>>>>>> there's always at least 1 CPU dedicated to this).
>>>>>> I'm not sure how to achieve something similar with the
>>>>>> system-approach.
>>>>>>
>>>>>> Does anyone have any insights on this?
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
