mesos-reviews mailing list archives

From Qian Zhang <zhang...@cn.ibm.com>
Subject Re: Review Request 51631: Tracked recovered and prepared cgroups subsystems for containers.
Date Mon, 12 Sep 2016 02:55:17 GMT


> On Sept. 6, 2016, 7:38 p.m., Qian Zhang wrote:
> > src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp, lines 290-291
> > <https://reviews.apache.org/r/51631/diff/1/?file=1491031#file1491031line290>
> >
> >     I do not think we need this comment: if recover fails, the agent will exit, so we have no chance (and actually no need) to do any cleanup.
> 
> haosdent huang wrote:
>     We call `cleanup` before returning `Failure` in `__recover`, so I think this comment is still correct here?

I took another look at `__recover()`, and this method does not call `cleanup()`
before returning `Failure`:
https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp#L268:L277


> On Sept. 6, 2016, 7:38 p.m., Qian Zhang wrote:
> > src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp, lines 486-492
> > <https://reviews.apache.org/r/51631/diff/1/?file=1491031#file1491031line486>
> >
> >     Here we may assign the pid to the cgroup of a single hierarchy multiple times. For example, in the case of CPU:
> >     ```
> >     /cgroup/cpu,cpuacct -> cpu
> >     /cgroup/cpu,cpuacct -> cpuacct
> >     ```
> >     With your logic here, we will call `cgroups::assign()` twice for the hierarchy `/cgroup/cpu,cpuacct`.
> 
> haosdent huang wrote:
>     Because we have a `break` above, this would not happen.

Yes, you are right, thanks!
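
For reference, here is a minimal standalone sketch of why the `break` avoids the double assignment. It is not the patch's code: `hierarchiesToAssign`, the `std::multimap` keyed by hierarchy, and the `tracked` set are hypothetical stand-ins for the isolator's data structures, and the function only reports which hierarchies would get one `cgroups::assign()` call.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: returns each hierarchy that would receive exactly one
// assign call. `subsystemsByHierarchy` maps a hierarchy path to the names of
// the subsystems mounted on it; `tracked` holds the subsystems that were
// recovered or prepared for the container.
std::vector<std::string> hierarchiesToAssign(
    const std::multimap<std::string, std::string>& subsystemsByHierarchy,
    const std::set<std::string>& tracked)
{
  std::vector<std::string> result;

  auto it = subsystemsByHierarchy.begin();
  while (it != subsystemsByHierarchy.end()) {
    const std::string& hierarchy = it->first;
    auto range = subsystemsByHierarchy.equal_range(hierarchy);

    for (auto s = range.first; s != range.second; ++s) {
      if (tracked.count(s->second) > 0) {
        result.push_back(hierarchy);
        break;  // One tracked subsystem is enough; skip the co-mounted ones.
      }
    }

    it = range.second;  // Move on to the next distinct hierarchy.
  }

  return result;
}
```

With `cpu` and `cpuacct` both mounted at `/cgroup/cpu,cpuacct`, the hierarchy is visited once and the inner loop stops at the first tracked subsystem.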


> On Sept. 6, 2016, 7:38 p.m., Qian Zhang wrote:
> > src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp, lines 408-411
> > <https://reviews.apache.org/r/51631/diff/1/?file=1491031#file1491031line408>
> >
> >     Why move this code here? Can you please let me know what the problem is if we keep this code in its original location?
> 
> haosdent huang wrote:
>     Suppose we failed at 
>     ```
>           if (containerConfig.has_user()) {
>             Try<Nothing> chown = os::chown(
>                 containerConfig.user(),
>                 path,
>                 false);
>     
>             if (chown.isError()) {
>               return Failure(
>                   "Failed to chown the cgroup at "
>                   "'" + path + "': " + chown.error());
>             }
>           }
>     ```
>     
>     but we have 
>     ```
>         Try<Nothing> create = cgroups::create(
>             hierarchy,
>             infos[containerId]->cgroup);
>     ```
>     before.
>     
>     Then the cgroup would not be destroyed unless we have already called `infos[containerId]->subsystems.insert(subsystem->name());`.

Got it! But what if we fail right after `cgroups::create()` and before `infos[containerId]->subsystems.insert();`?
In that case the cgroup will not be destroyed either. So I think we should call `infos[containerId]->subsystems.insert();`
right after the new `Info` structure is created, like below:
```
infos[containerId] = Owned<Info>(new Info(
      containerId,
      path::join(flags.cgroups_root, containerId.value())));

foreachvalue (const Owned<Subsystem>& subsystem, subsystems) {
  infos[containerId]->subsystems.insert(subsystem->name());
}
```
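
To make the difference concrete, here is a minimal standalone sketch, not the isolator code: `FakeCgroups`, the simplified `Info`, `prepare`, and `cleanup` are hypothetical stand-ins, and `failAt` simulates a step (such as the chown) failing right after the cgroup is created. It also assumes cleanup tolerates a tracked subsystem whose cgroup was never created, which early tracking would require.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for kernel state: the "subsystem/cgroup" paths
// that currently exist.
using FakeCgroups = std::set<std::string>;

// Hypothetical, simplified version of the per-container Info.
struct Info {
  std::string cgroup;
  std::set<std::string> subsystems;  // Subsystems that cleanup will visit.
};

// Simulates prepare. `failAt` names the subsystem whose setup fails right
// after its cgroup is created (empty string means no failure). `trackEarly`
// selects whether subsystems are tracked up front or only after success.
bool prepare(
    Info& info,
    const std::vector<std::string>& names,
    const std::string& failAt,
    bool trackEarly,
    FakeCgroups& fs)
{
  if (trackEarly) {
    info.subsystems.insert(names.begin(), names.end());
  }

  for (const std::string& name : names) {
    fs.insert(name + "/" + info.cgroup);  // Stands in for cgroups::create().

    if (name == failAt) {
      return false;  // Fails before any late tracking of this subsystem.
    }

    if (!trackEarly) {
      info.subsystems.insert(name);
    }
  }

  return true;
}

// Destroys the cgroup of every tracked subsystem. Erasing a path that does
// not exist is a no-op, mirroring the assumption stated above.
void cleanup(const Info& info, FakeCgroups& fs)
{
  for (const std::string& name : info.subsystems) {
    fs.erase(name + "/" + info.cgroup);
  }
}
```

With late tracking, a failure on the second subsystem leaves that subsystem's cgroup behind after cleanup; with early tracking, cleanup removes both.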


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51631/#review147806
-----------------------------------------------------------


On Sept. 12, 2016, 10:49 a.m., haosdent huang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51631/
> -----------------------------------------------------------
> 
> (Updated Sept. 12, 2016, 10:49 a.m.)
> 
> 
> Review request for mesos, Gilbert Song, Jie Yu, and Qian Zhang.
> 
> 
> Bugs: MESOS-6063
>     https://issues.apache.org/jira/browse/MESOS-6063
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Recovering newly added cgroups subsystems on existing containers would
> fail, and continuing to perform `update` and other operations of the
> newly added subsystems on them does not make sense. This patch adds
> tracking of the recovered or prepared cgroups subsystems of a
> container and skips unnecessary subsystem operations on the
> container if the subsystem was never recovered or prepared.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/isolators/cgroups/cgroups.hpp 38d1428f5425566502747d2a8394e246e0b3fd9e
>   src/slave/containerizer/mesos/isolators/cgroups/cgroups.cpp 8b6dfde366caf82d30afb891c8f1337ceed12157
> 
> Diff: https://reviews.apache.org/r/51631/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> haosdent huang
> 
>

