mesos-user mailing list archives

From Tom Arnfeld <...@duedil.com>
Subject Re: Force a slave to garbage collect framework/executors
Date Fri, 01 Aug 2014 08:48:09 GMT
Crystal clear, thanks Ben!


On 1 August 2014 01:36, Benjamin Mahler <benjamin.mahler@gmail.com> wrote:

> Everything is scheduled for the garbage collection delay (1 week by
> default) from when it was last modified, but as the disk fills up we'll
> start pruning the older directories ahead of schedule.
>
> This means that things should be removed in the same order that they were
> scheduled.
>
> You can think of this as follows: everything gets scheduled for 1 week in
> the future, but we'll "speed up" the existing schedule when we need to make
> room. Make sense?
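
[Editor's note: a rough Python sketch of the scheduling behaviour described above, for illustration only — not the actual Mesos implementation. The class, method names, and paths are made up; the point is that pruning under disk pressure removes directories in the same oldest-first order as the normal schedule, just sooner.]

```python
import heapq

GC_DELAY = 7 * 24 * 3600  # default --gc_delay: one week, in seconds


class GcSchedule:
    """Toy model: directories are removed in the order they were
    scheduled; disk pressure only speeds the schedule up."""

    def __init__(self):
        self._heap = []  # min-heap of (scheduled_removal_time, path)

    def schedule(self, path, now):
        # Everything gets scheduled GC_DELAY after its last modification.
        heapq.heappush(self._heap, (now + GC_DELAY, path))

    def prune_due(self, now):
        # Normal case: remove directories whose delay has elapsed.
        removed = []
        while self._heap and self._heap[0][0] <= now:
            removed.append(heapq.heappop(self._heap)[1])
        return removed

    def prune_for_space(self, count):
        # Disk pressure: "speed up" the schedule -- remove the
        # earliest-scheduled (i.e. oldest) directories first.
        return [heapq.heappop(self._heap)[1]
                for _ in range(min(count, len(self._heap)))]
```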
>
>
> On Thu, Jul 31, 2014 at 4:18 PM, Tom Arnfeld <tom@duedil.com> wrote:
>
>> Yeah, specifically the docker issue was related to volumes not being
>> removed with `docker rm` but that's a separate issue.
>>
>> So right now mesos won't remove older work directories to make room for
>> new ones (old ones that have already been scheduled for removal in a few
>> days' time)? This means when the disk gets quite full, newer work
>> directories will be removed much faster than older ones. Is that correct?
>>
>>
>>
>> On 31 July 2014 23:56, Benjamin Mahler <benjamin.mahler@gmail.com> wrote:
>>
>>> Apologies for the lack of documentation. In the default setup, the slave
>>> will schedule the work directories for garbage collection when:
>>>
>>> (1) Executors terminate.
>>> (2) The slave recovers and discovers work directories for terminal
>>> executors.
>>>
>>> It sounds like the docker integration code you're using has a bug in this
>>> respect, by not scheduling docker directories for garbage collection
>>> during (1) and/or (2).
>>>
>>>
>>> On Thu, Jul 31, 2014 at 3:40 PM, Tom Arnfeld <tom@duedil.com> wrote:
>>>
>>>> I don't have them to hand now, but I recall it saying something in the
>>>> high 90's and 0ns for the max allowed age. I actually found the root cause
>>>> of the problem, docker related and out of mesos's control... though I'm
>>>> still curious about the expected behaviour of the GC process. It doesn't
>>>> seem to be well documented anywhere.
>>>>
>>>> Tom.
>>>>
>>>>
>>>> On 31 July 2014 23:33, Benjamin Mahler <benjamin.mahler@gmail.com>
>>>> wrote:
>>>>
>>>>> What do the slave logs say?
>>>>>
>>>>> E.g.
>>>>>
>>>>> I0731 22:22:17.851347 23525 slave.cpp:2879] Current usage 7.84%. Max allowed age: 5.751197441470081days
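
[Editor's note: that log line is consistent with the slave scaling the GC delay down linearly as disk usage grows, with roughly 10% of disk kept as headroom. The sketch below is a back-of-the-envelope check under that assumed formula — the constant names are made up, but the result reproduces the age in the log line.]

```python
GC_DELAY_DAYS = 7.0    # default --gc_delay of one week
DISK_HEADROOM = 0.10   # assumed fraction of disk reserved as headroom

def max_allowed_age_days(disk_usage):
    """Sandbox age beyond which directories become eligible for pruning,
    shrinking linearly as the disk fills; at >= 90% usage everything is
    eligible for removal immediately."""
    return GC_DELAY_DAYS * max(0.0, 1.0 - DISK_HEADROOM - disk_usage)

print(max_allowed_age_days(0.0784))  # usage from the log line: ~5.75 days
print(max_allowed_age_days(0.98))    # a nearly full disk: 0 days
```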
>>>>>
>>>>>
>>>>> On Wed, Jul 30, 2014 at 8:55 AM, Tom Arnfeld <tom@duedil.com> wrote:
>>>>>
>>>>>> I'm not sure if this is something already supported by mesos, and if
>>>>>> so it'd be great if someone could point me in the right direction.
>>>>>>
>>>>>> Is there a way of asking a slave to garbage collect old executors
>>>>>> manually?
>>>>>>
>>>>>> Maybe I'm misunderstanding things, but as each executor does (insert
>>>>>> knowledge gap) mesos works out how long it is able to keep the sandbox
>>>>>> for and schedules it for garbage collection appropriately, also taking
>>>>>> into account the command line flags.
>>>>>> The disk on one of my slaves is getting quite full (98%) and I'm
>>>>>> curious how mesos is going to behave in this situation. Should it
>>>>>> start clearing things up, given a task could launch that needs to use
>>>>>> an amount of disk space, but that disk is being eaten up by old
>>>>>> executor sandboxes.
>>>>>>
>>>>>> It may be worth noting i'm not specifying --gc_delay on any slave
>>>>>> right now, perhaps I should be?
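
[Editor's note: for readers with the same question — if you do want a shorter delay, the flag is passed at slave start-up. The invocation below is illustrative only; the master address and work directory are placeholders, and the accepted duration syntax should be checked against your Mesos version's --help output.]

```shell
# Hypothetical invocation: start the slave with a 2-day GC delay
# instead of the 1-week default (other flags elided).
mesos-slave --master=zk://zk1:2181/mesos \
            --work_dir=/var/lib/mesos \
            --gc_delay=2days
```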
>>>>>>
>>>>>> Any input would be much appreciated.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Tom.
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
