aurora-reviews mailing list archives

From Stephan Erb <s...@apache.org>
Subject Re: Review Request 49512: [FEEDBACK] Add thermos option to monitor whole docker storage disk usage
Date Tue, 19 Jul 2016 19:21:40 GMT


> On July 4, 2016, 9:55 a.m., Stephan Erb wrote:
> > Have you considered querying Mesos for the disk usage of the task? That would be in line with our desire to also leave the isolation up to Mesos https://issues.apache.org/jira/browse/AURORA-1033
> 
> Martin Hrabovcin wrote:
>     I wasn't aware of this ticket, and I looked at the Mesos implementation. I think this would still be useful for Aurora/Docker users, as Mesos currently only does XFS or du (similar to Aurora) isolation. The Docker containerizer has no support for storage quotas.
> 
> Stephan Erb wrote:
>     Have you had a chance to look at the universal containerizer in Mesos? Would it work for your current use case? Joshua is currently working on support for the universal containerizer in Mesos. Once this has landed, the existing `du` isolation would also work for Docker containers.
>     
>     I am not against your change in general (it looks good!). I am just trying to figure out if it would actually be needed.
> 
> George Sirois wrote:
>     If we do go ahead with this change in its current form, I think there are a couple
of things to consider:
>     
>     * Performance/Overhead - the twitter.commons du function is now having to walk a
lot more files than it was before. We've seen high CPU usage by thermos in cases where we
had many files in the sandbox (granted, we had a fairly pathological case with many small
tasks on a single host, each with many sandbox files, but it was still a legitimate issue)
>     * Volume mounts - If an operator/user is mounting volumes into the container, these
volumes will now count against the disk quota. I can imagine cases where you may have a large
shared volume mounted into your containers that you wouldn't want to count against the limit.
> 
> Martin Hrabovcin wrote:
>     Thanks for the feedback. 
>     
>     - Performance - I think the Mesos disk usage implementation will be more performant, as it uses the native `du` command. It also supports excluding volumes. It wouldn't be hard to use a similar approach in Thermos, but I am worried about introducing a new dependency on the `du` binary inside the Docker container.
>     - Volume mounts - Is there a standard way in Aurora to schedule tasks with Docker volumes?
>     
>     I'll close this review for now and see if I can come up with better solution.

Thanks Martin. Please let us know if you need any help or assistance with that.

Regarding the volume mounts in Docker: currently there are two options, either via `-global_container_mounts`
or via the generic Docker parameters that you can set per job (if enabled on the scheduler
side).
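
To illustrate the trade-off discussed above (a pure-Python walk versus the native `du`, and keeping mounted volumes out of the total), here is a minimal sketch. It is not the code from this review or from Thermos; the function name `disk_usage_excluding_mounts` is hypothetical, and it sums apparent file sizes rather than allocated blocks, so its numbers can differ from what `du` reports.

```python
import os


def disk_usage_excluding_mounts(root):
    """Sum apparent file sizes under root, skipping nested mount points.

    Hypothetical helper for illustration only; not part of Thermos.
    """
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune subdirectories that are mount points (e.g. mounted volumes)
        # so os.walk never descends into them.
        dirnames[:] = [
            d for d in dirnames
            if not os.path.ismount(os.path.join(dirpath, d))
        ]
        for name in filenames:
            try:
                # lstat so that symlinks are not followed.
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return total
```

A walk like this still has to stat every file, which is the overhead George describes for sandboxes with many files.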


- Stephan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49512/#review140619
-----------------------------------------------------------


On July 1, 2016, 4:51 p.m., Martin Hrabovcin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/49512/
> -----------------------------------------------------------
> 
> (Updated July 1, 2016, 4:51 p.m.)
> 
> 
> Review request for Aurora.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> I'd like to get feedback on this patch.
> 
> Thermos currently monitors only the sandbox directory, but when running with the Docker containerizer a process can write data to the whole container filesystem. This patch adds an option to monitor the whole Docker container filesystem.
> 
> Behavior is unchanged if new flag isn't provided.
> 
> 
> Diffs
> -----
> 
>   examples/jobs/docker/hello_docker.aurora 47adb3932323120e62649f43353719b76c48a963
>   src/main/python/apache/aurora/executor/bin/thermos_executor_main.py 203fc47d74840889a1192dc867fef5584b704685
>   src/main/python/apache/aurora/executor/common/resource_manager.py b7dc40d8973ec2e5998ab4f6ff988051a70bb1ab
>   src/main/python/apache/thermos/monitoring/disk.py 52c5d74fd70b5942ea3ef5101ba3f27bfc98fc21
>   src/main/python/apache/thermos/monitoring/resource.py 53d0ff1a71c27f053c59acca556c35d1e5ac91f0
>   src/test/python/apache/aurora/executor/common/test_resource_manager_integration.py fe74bd1d36666ecd89fca1b5b2251202cbbc0f24
>   src/test/python/apache/thermos/monitoring/test_disk.py e57467c61d94d67e6cf6b766c062588235b4b235
> 
> Diff: https://reviews.apache.org/r/49512/diff/
> 
> 
> Testing
> -------
> 
> e2e testing
> unit tests
> manual testing in Vagrant dev box and on custom dev cluster with option enabled.
> 
> 
> Thanks,
> 
> Martin Hrabovcin
> 
>

