mesos-issues mailing list archives

From "James Peach (JIRA)" <>
Subject [jira] [Commented] (MESOS-9319) Create all container devices at isolation time.
Date Tue, 16 Oct 2018 23:48:00 GMT


James Peach commented on MESOS-9319:

Prototype code looks promising. Currently, /dev is a tmpfs, but in this proposal it would
be a bind mount of a directory on a real filesystem. I'm bind-mounting it read-only to prevent disk quota escapes,
which seems to work OK.
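A read-only bind mount of a staging directory onto /dev can be sketched as the pair of mount(2) calls below. This is a minimal illustration, not the prototype itself: `dev_mount_plan`, `apply_plan` and the staging path are hypothetical names. The mount(2) man page notes that a bind mount ignores MS_RDONLY on the initial call, so the read-only flag is applied with a second MS_REMOUNT | MS_BIND call.

```python
import ctypes
import ctypes.util
import os

# Flag values from <sys/mount.h> on Linux.
MS_RDONLY = 1
MS_REMOUNT = 32
MS_BIND = 4096


def dev_mount_plan(staging_dir, target="/dev"):
    """Return the mount(2) calls that bind staging_dir onto target read-only.

    Each entry is (source, target, fstype, flags). The first call creates
    the bind mount; the second remounts it to set the per-mount MS_RDONLY flag.
    """
    return [
        (staging_dir, target, None, MS_BIND),
        (staging_dir, target, None, MS_REMOUNT | MS_BIND | MS_RDONLY),
    ]


def apply_plan(plan):
    """Execute the plan through libc mount(2).

    Needs CAP_SYS_ADMIN in the container's mount namespace; not run here.
    """
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    for source, target, fstype, flags in plan:
        rc = libc.mount(
            source.encode(),
            target.encode(),
            fstype.encode() if fstype else None,
            ctypes.c_ulong(flags),
            None,
        )
        if rc != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err), target)
```

Device nodes such as /dev/null should still be writable on such a mount, since the kernel's read-only filesystem check covers regular files, directories and symlinks rather than device nodes.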

> Create all container devices at isolation time.
> -----------------------------------------------
>                 Key: MESOS-9319
>                 URL:
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>            Reporter: James Peach
>            Assignee: James Peach
>            Priority: Major
> When using a custom user namespace isolator, the task fails at launch because opening
devices fails with an EPERM error. This problem is described in [this systemd issue|]
and [this lxd|] issue.
> The problem arises in the Mesos containerizer due to the order of operations:
> # Clone the containerizer with {{CLONE_NEWNS}}
> # Mount a tmpfs for the devices
> # Call mknod to create the various device nodes
Referring back to the lxc issue, because we do (1) before (2), the tmpfs on {{/dev}}
is marked {{SB_I_NODEV}}. Due to the new behavior in Linux 4.18, the mknod in (3) now succeeds (see
commit [55956b59df33|]).
Previously it would fail and we would fall back to bind mounting the device. However, even
though we created the device, we can't actually open it due to the {{SB_I_NODEV}} flag on
the tmpfs mount. It appears that the purpose of allowing mknod is so that containers can create
overlayfs whiteouts.
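The pre-4.18 fallback described above can be sketched as follows. `create_device` and its injected `mknod` and `bind_mount` callables are hypothetical; the real Mesos code is structured differently. The sketch shows why the fallback stops helping: on 4.18 and later the `mknod` branch succeeds inside the user namespace, so the bind mount is never attempted, even though the new node cannot be opened on an {{SB_I_NODEV}} tmpfs.

```python
import errno
import os
import stat


def create_device(path, host_path, major, minor, bind_mount, mknod=os.mknod):
    """Create a character device at path, falling back to a bind mount.

    Returns "mknod" or "bind" to report which strategy was used.
    """
    try:
        mknod(path, stat.S_IFCHR | 0o666, os.makedev(major, minor))
        return "mknod"
    except PermissionError:
        # Before Linux 4.18, mknod in a user namespace failed with EPERM,
        # so the host's device node was bind-mounted instead.
        open(path, "a").close()  # a bind-mount target must already exist
        bind_mount(host_path, path)
        return "bind"
```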
> One approach to deal with this in the Mesos containerizer is to complete the device node
cleanup that was begun with the linux/devices isolator. This approach involves moving all
the responsibility for creating devices back to the isolators. Then, at containerization time,
we simply bind-mount the whole of /dev from the per-container staging area. Since the isolators
create the devices in the host namespace and on the Mesos work directory, none of the conditions
that trigger the failure would arise.
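The isolator side of this approach can be sketched as creating the nodes in a per-container staging directory from the host namespaces. `stage_devices`, `DEFAULT_DEVICES` and the staging path are hypothetical; the major/minor numbers are the standard Linux ones for these devices. The containerizer would then bind-mount the staging directory onto /dev at launch.

```python
import os
import stat

# Standard Linux character-device numbers: name -> (major, minor).
DEFAULT_DEVICES = {
    "null": (1, 3),
    "zero": (1, 5),
    "full": (1, 7),
    "random": (1, 8),
    "urandom": (1, 9),
    "tty": (5, 0),
}


def stage_devices(staging_dir, devices=DEFAULT_DEVICES, mknod=os.mknod):
    """Create device nodes in a host-side staging directory.

    Run from the host namespaces, so the nodes live on the work directory's
    filesystem instead of a tmpfs mounted inside a user namespace.
    Note that mknod(2) applies the process umask to the permission bits.
    """
    os.makedirs(staging_dir, exist_ok=True)
    created = []
    for name, (major, minor) in sorted(devices.items()):
        path = os.path.join(staging_dir, name)
        mknod(path, stat.S_IFCHR | 0o666, os.makedev(major, minor))
        created.append(path)
    return created
```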
> The failure we observed with our tasks was that they could not open {{/dev/null}} when redirecting
it as standard input to a child process.
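A small probe for this symptom might open /dev/null and hand it to a child as standard input, reporting the errno name if the open fails. `run_with_devnull_stdin` is a hypothetical helper; on an unaffected host it simply returns the child's exit code.

```python
import errno
import os
import subprocess


def run_with_devnull_stdin(argv):
    """Run argv with /dev/null as standard input.

    Returns the child's exit code, or the errno name (such as the EPERM
    reported in this issue) if /dev/null cannot be opened.
    """
    try:
        fd = os.open("/dev/null", os.O_RDONLY)
    except OSError as e:
        return errno.errorcode.get(e.errno, str(e.errno))
    try:
        return subprocess.run(argv, stdin=fd).returncode
    finally:
        os.close(fd)
```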

This message was sent by Atlassian JIRA
