mesos-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Vinod Kone" <vinodk...@gmail.com>
Subject Re: Review Request: Slave GC based on disk usage
Date Thu, 23 Aug 2012 19:39:06 GMT


> On Aug. 23, 2012, 2:28 a.m., Ben Mahler wrote:
> > src/slave/constants.hpp, line 30
> > <https://reviews.apache.org/r/6704/diff/2/?file=143654#file143654line30>
> >
> >     so does 1 week seem correct?
> >     
> >     using the math you have below:
> >     disk 50%  
> >     max_age 3.5 days
> >     
> >     disk 75%
> >     max_age 1.75 days
> >     
> >     do we actually see the disks fill up this rapidly? perhaps initially we should
be less agressive on this age, like set 2x or 3x on the current gc_timeout_hours
> >     
> >     of course, ben and you know much more about the typical disk usage patterns
than I do, so let me know your opinion please :)

I agree. The current algorithm for calculating age is completely arbitrary. Not based on historical
data. The (TODO) idea is for the age function to be expressed on command line. That makes
it easy for us to tune it on the go (with a puppet run + master roll)


> On Aug. 23, 2012, 2:28 a.m., Ben Mahler wrote:
> > src/slave/constants.hpp, line 31
> > <https://reviews.apache.org/r/6704/diff/2/?file=143654#file143654line31>
> >
> >     seems a bit frequent no? what does statvfs do under the hood?

changed it to 60 seconds. btw, statvfs is pretty fast (this is what 'df' uses). unlike 'du'
which calculates disk space usage "right now", df uses cached information from something called
a primary superblock of the filesystem. for more info: http://linuxshellaccount.blogspot.com/2008/12/why-du-and-df-display-different-values.html


> On Aug. 23, 2012, 2:28 a.m., Ben Mahler wrote:
> > src/slave/gc.hpp, line 23
> > <https://reviews.apache.org/r/6704/diff/2/?file=143656#file143656line23>
> >
> >     kill these new imports, and put them in the cpp file instead?

done


> On Aug. 23, 2012, 2:28 a.m., Ben Mahler wrote:
> > src/slave/gc.cpp, line 127
> > <https://reviews.apache.org/r/6704/diff/2/?file=143657#file143657line127>
> >
> >     this will just print true or false, would be nice to have the actual error here,
so how about we LOG in os::rmdir() when anything goes wrong?
> >     
> >

the exit status shouldn't be there. i used it for debugging. removed it.

i think most (all?) our stout functions should be wrapped with Try/Result/Option. this would
make it easier to grab their exit status at the call site. on the flip side, this makes the
functions a bit cumbersome (call, check, get) to use. i will punt on this for now, may be
another review (Technical debt :))


> On Aug. 23, 2012, 2:28 a.m., Ben Mahler wrote:
> > src/slave/gc.cpp, line 146
> > <https://reviews.apache.org/r/6704/diff/2/?file=143657#file143657line146>
> >
> >     move the log line within the if? looks really noisy

done


> On Aug. 23, 2012, 2:28 a.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 1591
> > <https://reviews.apache.org/r/6704/diff/2/?file=143659#file143659line1591>
> >
> >     I'm not getting the math here:
> >     
> >     current gc timeout hours: 1 week
> >     suppose max age: 1 day
> >     
> >     then
> >     
> >     1 week - 1 day = 6 days
> >     
> >     so we would prune everything scheduled for deletion within the next 6 days...?

yes that's correct.  this is because, if a file's deletion time is within next 6 days then
its atleast 1 day old! this stems from the fact that a directory is always scheduled for deletion
1 week into future from the time it becomes a candidate for deletion.

i know this might be hard to wrap one's head around. i will add a comment explaining this.
but i'm to open to suggestions for making the math easier to understand.


> On Aug. 23, 2012, 2:28 a.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 1588
> > <https://reviews.apache.org/r/6704/diff/2/?file=143659#file143659line1588>
> >
> >     can you nicely format this number, like %.2f

its a bit involved to do this in C++ (set width, set precision, unset precision, unset width).

since we haven't done that anywhere else in the code base, i'm gonna punt on this.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6704/#review10603
-----------------------------------------------------------


On Aug. 21, 2012, 11:20 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6704/
> -----------------------------------------------------------
> 
> (Updated Aug. 21, 2012, 11:20 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, John Sirois, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Added the ability in the slave to GC executor directories based on the current disk usage.
> 
> Currently, it uses a very simple algorithm to decide which directories to delete.
> 
> 
> This addresses bug MESOS-254.
>     https://issues.apache.org/jira/browse/MESOS-254
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp ab83972 
>   src/slave/flags.hpp 0c7917f 
>   src/slave/gc.hpp 6704742 
>   src/slave/gc.cpp 9c01024 
>   src/slave/slave.hpp 10c537b 
>   src/slave/slave.cpp 4efd41e 
>   src/tests/gc_tests.cpp 2f0bdde 
>   third_party/libprocess/include/stout/os.hpp b1eceb3 
> 
> Diff: https://reviews.apache.org/r/6704/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message