mesos-reviews mailing list archives

From Alex Clemmer <clemmer.alexan...@gmail.com>
Subject Re: Review Request 55313: Windows: Fixed the unkillable task bug, lit up executor tests.
Date Wed, 18 Jan 2017 18:04:01 GMT


> On Jan. 18, 2017, 2:08 a.m., Joseph Wu wrote:
> > src/slave/containerizer/mesos/launcher.cpp, lines 117-133
> > <https://reviews.apache.org/r/55313/diff/1/?file=1599654#file1599654line117>
> >
> >     I suspect that we'll use this lambda for other subprocesses in future, so let's
> >     move it into `include/process/subprocess_base.hpp` and `src/subprocess.cpp`:
> >     
> >     ```
> >     struct ParentHook
> >     {
> >       ...
> >     #ifdef __WINDOWS__
> >       static ParentHook CREATE_JOB();
> >     #endif // __WINDOWS__
> >     };
> >     ```

Ok, we can do this. I do question whether this really belongs as part of the `Subprocess`
API, but I will not block that change. :)


- Alex


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55313/#review162016
-----------------------------------------------------------


On Jan. 8, 2017, 6:30 a.m., Alex Clemmer wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/55313/
> -----------------------------------------------------------
> 
> (Updated Jan. 8, 2017, 6:30 a.m.)
> 
> 
> Review request for mesos, Andrew Schwartzmeyer, Daniel Pravat, and Joseph Wu.
> 
> 
> Bugs: MESOS-6698, MESOS-6839 and MESOS-6870
>     https://issues.apache.org/jira/browse/MESOS-6698
>     https://issues.apache.org/jira/browse/MESOS-6839
>     https://issues.apache.org/jira/browse/MESOS-6870
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> MESOS-6839 tracks a bug that causes the current implementation of the
> default executor to be unable to delete any processes associated with a
> task. Understanding why requires some knowledge of the differences
> between the process model of Windows and Unix.
> 
> In Unix, there is a robust notion of a process tree, with a well-defined
> notion of process groups, sessions, signal delivery on the tree, and so
> on. Windows lacks a robust notion of a process hierarchy, and therefore
> largely has no equivalents to these constructs (including, notably,
> signal semantics).
> 
> One of the problems this mismatch causes Mesos is that it complicates
> the problem of killing a task, which is at its core a group of
> processes. On Windows, the easiest way to make a process and all its
> descendants killable is to track these processes in a Job Object,
> which is a Windows kernel construct similar in principle to Linux's
> control groups (though with different ideas of process namespacing).
> 
> There is some subtlety in making sure _all_ processes associated with a
> task are captured inside a Job Object. The most important consideration
> is that we need to make sure to add any process to the Job Object before
> it has a chance to create any child processes; if we fail to do this,
> the children will not be captured in the Job Object.
> 
> The solution to this is fairly simple on Windows. The process creation
> API allows users to trivially create a process in a suspended state, so
> that the Windows kernel scheduler does not schedule the process to run
> until the user explicitly resumes the main thread. This allows us to
> create the process and add it to a Job Object before it has a chance to
> create children, and then start the process.
> 
> This commit will accomplish this by changing `PosixLauncher::fork` to
> use the Subprocess parent hooks API, which implements exactly these
> semantics. Essentially, the launcher will launch the containerizer
> process, which will inspect the TaskInfo or the environment for a task
> to launch, and then launch it. Using the parent hooks API, Subprocess
> will create the containerizer process on Windows in a suspended state,
> and then the parent hook supplied by the launcher will add that process
> to a Job Object before it has a chance to run. Finally, Subprocess will
> mark the process as runnable, and return.
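
The ordering described above can be sketched as a small, portable toy model. This is not the libprocess `Subprocess` API and it does not call any Windows function: `ModelChild`, `ModelParentHook`, `addToJob`, and `launch` are hypothetical names invented for illustration, and the comments only note which Win32 call each step would roughly correspond to. Killing the child when a hook fails is an assumption of this model.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a child process (not a libprocess or Win32 type).
struct ModelChild
{
  bool suspended = true;  // Created suspended, so it cannot spawn children yet.
  bool inJob = false;     // Set once a hook has "assigned" it to a job.
  bool killed = false;    // Set if a hook fails before the child ever runs.
};

// A parent hook runs in the parent against the still-suspended child.
// It returns true on success.
using ModelParentHook = std::function<bool(ModelChild&)>;

// Models the hook that places the child in a Job Object. With the Win32 API
// this step would be a call to `AssignProcessToJobObject`.
bool addToJob(ModelChild& child)
{
  child.inJob = true;
  return true;
}

// Runs "create suspended -> run hooks -> resume" and records each step.
bool launch(
    const std::vector<ModelParentHook>& hooks,
    ModelChild& child,
    std::vector<std::string>& log)
{
  // Win32: `CreateProcess` with the `CREATE_SUSPENDED` creation flag.
  child = ModelChild{};
  log.push_back("create suspended");

  for (const ModelParentHook& hook : hooks) {
    // Hooks must only ever observe a child that has not started running.
    assert(child.suspended);

    if (!hook(child)) {
      // Assumption of this model: a failed hook means the child is killed
      // (Win32: `TerminateProcess`) instead of being resumed.
      child.killed = true;
      log.push_back("kill");
      return false;
    }
    log.push_back("hook ok");
  }

  // Win32: `ResumeThread` on the main thread of the new process.
  child.suspended = false;
  log.push_back("resume");
  return true;
}
```

The point of the model is only the ordering: the child is never marked runnable until every hook has succeeded, so any children it later creates start inside the job.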
> 
> This commit resolves MESOS-6839. We also light up the executor tests, so
> it also resolves MESOS-6870 and MESOS-6698.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/launcher.cpp a6a8c01cb39f35f8174fcb5af0ef18de2da5ee78
>   src/tests/command_executor_tests.cpp f4e7044b82e8e81d6430551dc0b2a288db10bc3c 
>   src/tests/default_executor_tests.cpp 340e8c8b36a6a3cc6e5bae021e19dfa7a54852c3 
> 
> Diff: https://reviews.apache.org/r/55313/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Alex Clemmer
> 
>

