mesos-user mailing list archives

From Sharma Podila <spod...@netflix.com>
Subject Re: Exposing executor container
Date Tue, 12 Aug 2014 20:48:01 GMT
You may already know this, but this does sound similar to

http://www.mail-archive.com/user@mesos.apache.org/msg00885.html

A possible (and partial) solution was to use soft limits for memory; a
ticket was opened for that.
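For anyone reading this thread later, here is a minimal sketch of what the
soft-limit idea looks like at the cgroup (v1) level. The hierarchy path and
container ID below are illustrative assumptions, not what Mesos itself
writes; the point is only the distinction between the two limit files:

```python
# Sketch: memory soft limit vs. hard limit in the cgroup v1 memory
# controller. Paths are assumed for illustration; Mesos's cgroups
# isolator manages these files itself.
import os

CGROUP_ROOT = "/sys/fs/cgroup/memory/mesos"  # assumed hierarchy root

def limit_paths(container_id):
    """Return the cgroup v1 limit files for a container (illustrative)."""
    base = os.path.join(CGROUP_ROOT, container_id)
    return {
        # hard limit: exceeding it triggers the kernel OOM killer
        "hard": os.path.join(base, "memory.limit_in_bytes"),
        # soft limit: only reclaimed against under global memory pressure,
        # so a brief fetch/extract spike above it need not OOM the task
        "soft": os.path.join(base, "memory.soft_limit_in_bytes"),
    }

paths = limit_paths("executor-abc123")
```

The appeal for the fetch/extract case is that a soft limit tolerates a
temporary spike without invoking the OOM killer, at the cost of weaker
isolation guarantees.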


On Tue, Aug 12, 2014 at 1:17 PM, Thomas Petr <tpetr@hubspot.com> wrote:

> That solution would likely cause us more pain -- we'd still need to figure
> out an appropriate amount of resources to request for artifact downloads /
> extractions, our scheduler would need to be sophisticated enough to only
> accept offers from the same slave that the setup task ran on, and we'd need
> to manage some new shared artifact storage location outside of the
> containers. Is splitting workflows into multiple tasks like this a common
> pattern?
>
> I personally agree that tasks manually overriding cgroups limits is a
> little sketchy (and am curious how MESOS-1279 would affect this
> discussion), but I doubt that we'll be the last people to attempt something
> like this. In other words, we acknowledge we're going rogue by
> temporarily overriding the limits... are there other implications of
> exposing the container ID that you're worried about?
>
> Do you have any thoughts about my other idea (overriding the fetcher
> executable for a task)?
>
> Thanks,
> Tom
>
> On Tue, Aug 12, 2014 at 2:05 PM, Vinod Kone <vinodkone@gmail.com> wrote:
>
>> Thanks Thomas for the clarification.
>>
>> One solution you could consider is separating the setup (fetch/extract)
>> phase and the running phase into separate Mesos tasks. That way you can
>> give the setup task the resources it needs for fetching/extracting, and as
>> soon as it is done, it can send a TASK_FINISHED so that the resources used
>> by that task are reclaimed by Mesos. That would give you the dynamism you
>> need. Would that work in your scenario?
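The two-phase pattern described above could be sketched roughly as follows.
All names and memory figures here are hypothetical; a real framework would
build mesos TaskInfo protobufs and match offers, not dicts:

```python
# Sketch of the setup-task / run-task split: a "setup" task fetches and
# extracts artifacts with generous memory, then a "run" task on the same
# slave serves traffic with a smaller steady-state footprint.

SETUP_MEM_MB = 2048  # headroom for fetch/extract (assumed value)
RUN_MEM_MB = 512     # steady-state service footprint (assumed value)

def next_task(job):
    """Decide which phase to launch next for a job, given its state."""
    if not job.get("setup_finished"):
        return {"name": job["name"] + "-setup", "mem_mb": SETUP_MEM_MB}
    return {"name": job["name"] + "-run", "mem_mb": RUN_MEM_MB}

def on_status_update(job, state):
    # When the setup task reports TASK_FINISHED, Mesos reclaims its
    # resources. The scheduler must then accept an offer from the SAME
    # slave, so the extracted artifacts are still on local disk.
    if state == "TASK_FINISHED":
        job["setup_finished"] = True

job = {"name": "webservice"}
assert next_task(job)["mem_mb"] == SETUP_MEM_MB
on_status_update(job, "TASK_FINISHED")
assert next_task(job)["mem_mb"] == RUN_MEM_MB
```

As Tom notes downthread, the catch is the same-slave constraint and the
shared on-disk artifact location between the two tasks.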
>>
>> Having the executor change cgroup limits behind the scenes, opaquely to
>> Mesos, seems like a recipe for problems in the future to me, since it could
>> lead to temporary over-commit of resources and affect isolation across
>> containers.
>>
>>
>>
>> On Tue, Aug 12, 2014 at 10:45 AM, Thomas Petr <tpetr@hubspot.com> wrote:
>>
>>> Hey Vinod,
>>>
>>> We're not using mesos-fetcher to download the executor -- we ensure our
>>> executor exists on the slaves beforehand (during machine provisioning, to
>>> be exact). The issue that Whitney is talking about is OOMing while fetching
>>> artifacts necessary for task execution (like the JAR for a web service).
>>>
>>> Our own executor
>>> <https://github.com/HubSpot/Singularity/tree/master/SingularityExecutor>
>>> has some nice enhancements around S3 downloads and artifact caching that
>>> we don't necessarily want to lose if we switched back to using
>>> mesos-fetcher.
>>>
>>> Surfacing the container ID seems like a trivial change, but another
>>> alternative could be to allow frameworks to specify an alternative fetcher
>>> executable (perhaps in CommandInfo?).
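To make the CommandInfo idea concrete, here is a rough sketch of the shape
such a message might take. The `uris` entries mirror real CommandInfo.URI
fields, but the `fetcher` field is purely hypothetical, i.e. the extension
being proposed in this thread, not something Mesos supports:

```python
# Sketch: a CommandInfo-like structure (as a plain dict, not a protobuf)
# with a hypothetical per-task fetcher override. Today, the "uris" list
# drives the built-in mesos-fetcher.
command = {
    "value": "./run-service.sh",  # what the executor runs after fetch
    "uris": [
        # mirrors CommandInfo.URI: value / executable / extract
        {"value": "https://example.com/service.jar",
         "executable": False,
         "extract": False},
    ],
    # hypothetical field discussed here: path to a custom fetcher binary
    # (e.g. one with S3 support and artifact caching) run in the task's
    # container instead of mesos-fetcher
    "fetcher": "/usr/local/bin/custom-s3-fetcher",
}
```

The attraction is that the fetch would then run inside the task's own
container and be accounted against its resources, without the executor
touching cgroup limits behind Mesos's back.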
>>>
>>> Thanks,
>>> Tom
>>>
>>>
>>> On Tue, Aug 12, 2014 at 1:09 PM, Vinod Kone <vinodkone@gmail.com> wrote:
>>>
>>>> Hi Whitney,
>>>>
>>>> While we could conceivably set the container id in the environment of
>>>> the executor, I would like to understand the problem you are facing.
>>>>
>>>> The fetching and extracting of the executor is done by
>>>> mesos-fetcher, a process forked by the slave and run under the slave's
>>>> cgroup. AFAICT, this shouldn't cause an OOM in the executor. Does your
>>>> executor do
>>>> more fetches/extracts once it is launched (e.g., for user's tasks)?
>>>>
>>>
>>>
>>
>
