mesos-issues mailing list archives

From "Jie Yu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MESOS-5239) Persistent volume DockerContainerizer support assumes proper mount propagation setup on the host.
Date Sat, 04 Jun 2016 18:02:59 GMT

     [ https://issues.apache.org/jira/browse/MESOS-5239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jie Yu updated MESOS-5239:
--------------------------
    Fix Version/s: 0.28.2

> Persistent volume DockerContainerizer support assumes proper mount propagation setup
on the host.
> -------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-5239
>                 URL: https://issues.apache.org/jira/browse/MESOS-5239
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>    Affects Versions: 0.28.0, 0.28.1
>            Reporter: Jie Yu
>            Assignee: Jie Yu
>              Labels: mesosphere
>             Fix For: 1.0.0, 0.28.2
>
>
> We recently added persistent volume support in DockerContainerizer (MESOS-3413). To understand
the problem, we first need to understand how persistent volumes are supported in DockerContainerizer.
> To support persistent volumes in DockerContainerizer, we bind mount persistent volumes
under a container's sandbox ('container_path' has to be relative for persistent volumes).
When the Docker container is launched, since we always add a volume (-v) for the sandbox,
the persistent volumes will be bind mounted into the container as well (since Docker does
a 'rbind').
> For the above to work, the Docker daemon must be able to see the persistent volume mounts
that Mesos creates in the host mount table. This is not a problem if the Docker daemon itself
uses the host mount namespace. However, on systemd-enabled systems, the Docker daemon runs
in a separate mount namespace, and all mounts in that mount namespace are marked as slave
mounts due to this [patch|https://github.com/docker/docker/commit/eb76cb2301fc883941bc4ca2d9ebc3a486ab8e0a].
> This means that, for persistent volumes to work, the parent mount of the agent's work_dir
must be a shared mount when the Docker daemon starts. This is typically true on CentOS7 and
CoreOS, where all mounts are shared mounts by default.
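> The shared-mount precondition above can be checked by reading the mount table. The following
is a minimal Python sketch, not part of Mesos: it parses mountinfo text, finds the mount that
contains a given path, and reports whether that mount carries a 'shared:' tag. The path
/run/mesos and the /var line in the sample are hypothetical values chosen for illustration.

```python
def parse_mountinfo(text):
    """Parse /proc/<pid>/mountinfo text into a list of dicts."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if "-" not in fields[6:]:
            continue  # skip blank or malformed lines
        sep = fields.index("-", 6)
        mounts.append({
            "mount_id": int(fields[0]),
            "parent_id": int(fields[1]),
            "mount_point": fields[4],
            # Optional fields such as "shared:22" or "master:5" sit
            # between the mount options and the "-" separator.
            "optional": fields[6:sep],
        })
    return mounts


def containing_mount(mounts, path):
    """Return the mount whose mount point is the longest prefix of path."""
    best = None
    for m in mounts:
        mp = m["mount_point"]
        if path == mp or mp == "/" or path.startswith(mp.rstrip("/") + "/"):
            # ">=" lets a later entry win a tie, since it is mounted on top.
            if best is None or len(mp) >= len(best["mount_point"]):
                best = m
    return best


def is_shared(mount):
    """True if the mount belongs to a shared peer group."""
    return any(f.startswith("shared:") for f in mount["optional"])


# First line is taken from the CentOS7 example in this issue;
# the second line is a hypothetical private mount.
SAMPLE = """\
24 60 0:19 / /run rw,nosuid,nodev shared:22 - tmpfs tmpfs rw,seclabel,mode=755
25 60 0:20 / /var rw,relatime - ext4 /dev/sda1 rw
"""

if __name__ == "__main__":
    mounts = parse_mountinfo(SAMPLE)
    for path in ("/run/mesos", "/var/lib/mesos"):
        print(path, "shared" if is_shared(containing_mount(mounts, path)) else "not shared")
```

> On a real host, the same functions could be fed the contents of /proc/self/mountinfo instead
of the sample text.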
> However, this causes an issue with the 'filesystem/linux' isolator. To understand why,
first I need to show you a typical problem when dealing with shared mounts. Let me explain
that using the following commands on a CentOS7 machine:
> {noformat}
> [root@core-dev run]# cat /proc/self/mountinfo
> 24 60 0:19 / /run rw,nosuid,nodev shared:22 - tmpfs tmpfs rw,seclabel,mode=755
> [root@core-dev run]# mkdir /run/netns
> [root@core-dev run]# mount --bind /run/netns /run/netns
> [root@core-dev run]# cat /proc/self/mountinfo
> 24 60 0:19 / /run rw,nosuid,nodev shared:22 - tmpfs tmpfs rw,seclabel,mode=755
> 121 24 0:19 /netns /run/netns rw,nosuid,nodev shared:22 - tmpfs tmpfs rw,seclabel,mode=755
> [root@core-dev run]# ip netns add test
> [root@core-dev run]# cat /proc/self/mountinfo
> 24 60 0:19 / /run rw,nosuid,nodev shared:22 - tmpfs tmpfs rw,seclabel,mode=755
> 121 24 0:19 /netns /run/netns rw,nosuid,nodev shared:22 - tmpfs tmpfs rw,seclabel,mode=755
> 162 121 0:3 / /run/netns/test rw,nosuid,nodev,noexec,relatime shared:5 - proc proc rw
> 163 24 0:3 / /run/netns/test rw,nosuid,nodev,noexec,relatime shared:5 - proc proc rw
> {noformat}
> As you can see above, there are two entries for /run/netns/test in the mount table, which
is unexpected and can confuse some systems. The reason is that when we create a self bind
mount (/run/netns -> /run/netns), the mount is put into the same shared mount peer group
(shared:22) as its parent (/run). Then, when you create another mount underneath it
(/run/netns/test), that mount operation is propagated to all mounts in the same peer group
(shared:22), resulting in an unexpected additional mount being created.
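> The duplicate entry can also be found mechanically. The following is a small Python sketch,
not part of Mesos, that takes the final mount table from the transcript above and lists mount
points that appear more than once, along with the members of each shared peer group. The
function names are made up for this illustration.

```python
from collections import defaultdict

# Final mount table from the CentOS7 transcript in this issue.
MOUNTINFO = """\
24 60 0:19 / /run rw,nosuid,nodev shared:22 - tmpfs tmpfs rw,seclabel,mode=755
121 24 0:19 /netns /run/netns rw,nosuid,nodev shared:22 - tmpfs tmpfs rw,seclabel,mode=755
162 121 0:3 / /run/netns/test rw,nosuid,nodev,noexec,relatime shared:5 - proc proc rw
163 24 0:3 / /run/netns/test rw,nosuid,nodev,noexec,relatime shared:5 - proc proc rw
"""


def duplicate_mount_points(text):
    """Map each repeated mount point to its (mount ID, parent ID) pairs."""
    seen = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        seen[fields[4]].append((int(fields[0]), int(fields[1])))
    return {mp: ids for mp, ids in seen.items() if len(ids) > 1}


def peer_groups(text):
    """Map each shared peer group ID to the mount points that belong to it."""
    groups = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if "-" not in fields[6:]:
            continue
        sep = fields.index("-", 6)
        for opt in fields[6:sep]:
            if opt.startswith("shared:"):
                groups[int(opt.split(":", 1)[1])].append(fields[4])
    return dict(groups)


if __name__ == "__main__":
    print(duplicate_mount_points(MOUNTINFO))
    print(peer_groups(MOUNTINFO))
```

> In the output, /run/netns/test shows up once under parent 121 (the self bind mount) and
once under parent 24 (/run), and both /run and /run/netns are listed in peer group 22.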
> The reason we need to do a self bind mount in Mesos is that sometimes we need to make
sure some mounts are shared so that they do not get copied when a new mount namespace is created.
However, on some systems, mounts are private by default (e.g., Ubuntu 14.04). In those cases,
since we cannot change the system mounts, we have to do a self bind mount so that we can set
mount propagation to shared. For instance, in the filesystem/linux isolator, we do a self bind
mount on the agent's work_dir.
> To avoid the self bind mount pitfall mentioned above, in the filesystem/linux isolator, after
we create the mount, we do a make-slave + make-shared so that the mount is in its own shared
mount peer group. That way, any mounts underneath it will not be propagated back.
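> The sequence described above corresponds roughly to three mount(8) invocations. The Python
sketch below only builds and prints those commands; it does not run them, and it is not how
Mesos implements the isolator. The path /var/lib/mesos is a hypothetical work_dir.

```python
import shlex


def self_bind_shared_commands(path):
    """Return mount(8) commands for a self bind mount in its own peer group."""
    return [
        # Self bind mount; it joins the parent's peer group if the parent is shared.
        ["mount", "--bind", path, path],
        # Become a slave: mounts made underneath no longer propagate to the parent's group.
        ["mount", "--make-slave", path],
        # Become shared again, which places the mount in a new peer group.
        ["mount", "--make-shared", path],
    ]


if __name__ == "__main__":
    # Hypothetical agent work_dir, used for illustration only.
    for cmd in self_bind_shared_commands("/var/lib/mesos"):
        print(shlex.join(cmd))
```

> Running these commands for real requires root, and the resulting peer group can be checked
in /proc/self/mountinfo afterwards.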
> However, that operation breaks the assumption that the persistent volume support in
DockerContainerizer makes. As a result, we are seeing problems with persistent volumes in
DockerContainerizer when the filesystem/linux isolator is turned on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
