mesos-issues mailing list archives

From "Pierre Cheynier (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-7007) filesystem/shared and --default_container_info broken since 1.1
Date Tue, 07 Feb 2017 13:22:41 GMT

    [ https://issues.apache.org/jira/browse/MESOS-7007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15855968#comment-15855968 ]

Pierre Cheynier commented on MESOS-7007:
----------------------------------------

Hi [~jieyu], [~gilbert],
I discussed this issue with [~jieyu] on Friday.
Since then, I have run some tests on 1.1.0:
* {{--launcher=linux}} doesn't change anything. As confirmed with Jie Yu, I was already using this
launcher, by default I guess.
* Removing {{filesystem/shared}} stops the /tmp content from being trashed on container creation/deletion,
BUT now the /tmp volume feature no longer works:
  ** the tmp directory in the sandbox is {{root:root}} with mode {{0777}}, and it is a plain bind mount
rather than something isolated, meaning that anything I delete there is deleted from /tmp as well
  ** I run into this issue: https://issues.apache.org/jira/browse/MESOS-6563, as shown by the
mounts visible from root:
{noformat}
# There is only 1 task, so theoretically 1 mount
$ mesos-ps --master=127.0.0.1:5050
USER    FRAMEWORK    TASK      SLAVE              MEM                TIME              CPU (allocated)
mara... marathon     visibi... mesos-cluster-c... 13.7 MB/42.0 MB    00:00:01.490000   0.2
           
# But in fact, no!
$ mount | grep "mesos/slaves" | wc -l
56
# 56 is probably the number of containers I launched for my CI tests
$ mount | grep "mesos/slaves" | head -5
/dev/sda3 on /var/opt/mesos/slaves/e02761a5-308e-4797-b43b-b56c3da66616-S0/frameworks/e02761a5-308e-4797-b43b-b56c3da66616-0000/executors/group_simplehttp.dcde69c5-ed32-11e6-b388-02427970a3a5/runs/45277613-6129-4eb3-b8d0-acc0c2fe8605/tmp
type ext4 (rw,relatime,seclabel,data=ordered)
/dev/sda3 on /var/opt/mesos/slaves/e02761a5-308e-4797-b43b-b56c3da66616-S0/frameworks/e02761a5-308e-4797-b43b-b56c3da66616-0000/executors/group_simplehttp.dcde69c5-ed32-11e6-b388-02427970a3a5/runs/45277613-6129-4eb3-b8d0-acc0c2fe8605/tmp
type ext4 (rw,relatime,seclabel,data=ordered)
/dev/sda3 on /var/opt/mesos/slaves/e02761a5-308e-4797-b43b-b56c3da66616-S0/frameworks/e02761a5-308e-4797-b43b-b56c3da66616-0000/executors/group_security.f6152faa-ed32-11e6-b388-02427970a3a5/runs/f74453b6-aa39-456f-a4a1-bd953b870d38/tmp
type ext4 (rw,relatime,seclabel,data=ordered)
/dev/sda3 on /var/opt/mesos/slaves/e02761a5-308e-4797-b43b-b56c3da66616-S0/frameworks/e02761a5-308e-4797-b43b-b56c3da66616-0000/executors/group_simplehttp.dcde69c5-ed32-11e6-b388-02427970a3a5/runs/45277613-6129-4eb3-b8d0-acc0c2fe8605/tmp
type ext4 (rw,relatime,seclabel,data=ordered)
/dev/sda3 on /var/opt/mesos/slaves/e02761a5-308e-4797-b43b-b56c3da66616-S0/frameworks/e02761a5-308e-4797-b43b-b56c3da66616-0000/executors/group_security.f6152faa-ed32-11e6-b388-02427970a3a5/runs/f74453b6-aa39-456f-a4a1-bd953b870d38/tmp
type ext4 (rw,relatime,seclabel,data=ordered)
{noformat}
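
To see which sandbox paths are mounted more than once, rather than only the total, the `mount` output above can be grouped per mount point. This is a minimal sketch and not part of the original report: the function name `count_mesos_mounts` is hypothetical, and it assumes the `<device> on <mount point> type <fstype> (<options>)` line format shown above, with no spaces in the paths.

```shell
#!/bin/sh
# Hypothetical helper (not a Mesos tool): count how many times each
# path under mesos/slaves appears as a mount point.
# Reads `mount`-style lines on stdin:
#   <device> on <mount point> type <fstype> (<options>)
count_mesos_mounts() {
    awk '$2 == "on" && $3 ~ /mesos\/slaves/ { count[$3]++ }
         END { for (path in count) print count[path], path }' |
        sort -k1,1nr -k2
}

# Usage (most frequently mounted paths first):
#   mount | count_mesos_mounts | head -5
```

Any path reported with a count above 1 is mounted several times, which would match the duplicated lines visible in the `head -5` output.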

What's the plan?

> filesystem/shared and --default_container_info broken since 1.1
> ---------------------------------------------------------------
>
>                 Key: MESOS-7007
>                 URL: https://issues.apache.org/jira/browse/MESOS-7007
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent
>    Affects Versions: 1.1.0
>            Reporter: Pierre Cheynier
>
> I am facing an issue that prevents me from upgrading to 1.1.0 (so the change must have been
> introduced in this version):
> I'm using default_container_info to mount a /tmp volume in the container's mount namespace
> from its current sandbox, meaning that each container has a dedicated /tmp, thanks to the
> {{filesystem/shared}} isolator.
> I noticed through our automation pipeline that integration tests were failing and found
> that this is because the contents of /tmp (the one on the host!) are trashed each time a container
> is created.
> Here is my setup:
> * {{--isolation='cgroups/cpu,cgroups/mem,namespaces/pid,*disk/du,filesystem/shared,filesystem/linux*,docker/runtime'}}
> * {{--default_container_info='\{"type":"MESOS","volumes":\[\{"host_path":"tmp","container_path":"/tmp","mode":"RW"\}\]\}'}}
> I discovered this issue in the early days of 1.1 (end of Nov, spoke with someone on Slack),
> but unfortunately had no time to dig into the symptoms any further.
> I found nothing interesting, even with GLOG_v=3.
> Maybe this issue is triggered by incorrect usage of the isolators? If so, then
> at least the documentation should be updated.
> Let me know if more information is needed.
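
The quoted setup uses a relative {{host_path}} ("tmp"), which the report describes as being taken from the container's current sandbox. A minimal sketch of that mapping follows, only to make the expected source and target of the bind mount explicit. The function name `resolve_host_path` is hypothetical, and the resolution rule is an assumption based on the description above, not Mesos code.

```shell
#!/bin/sh
# Hypothetical helper (an assumption drawn from the report, not Mesos code):
# a relative host_path is resolved against the container sandbox,
# while an absolute host_path is used unchanged.
resolve_host_path() {
    sandbox=$1
    host_path=$2
    case "$host_path" in
        /*) printf '%s\n' "$host_path" ;;              # absolute: used as-is
        *)  printf '%s\n' "${sandbox%/}/$host_path" ;; # relative: under the sandbox
    esac
}

# Usage: the source of the mount that should appear at /tmp in the container
#   resolve_host_path "$SANDBOX_DIR" tmp
```

Under that assumption, a sandbox ending in `runs/<container id>` yields a source of `runs/<container id>/tmp`, the same shape as the paths in the `mount` output of the comment.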



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
