aurora-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Zameer Manji <zma...@apache.org>
Subject Re: Review Request 53418: Send SIGTERM to daemonized processes on shutdown.
Date Thu, 03 Nov 2016 22:34:41 GMT


> On Nov. 2, 2016, 11:19 p.m., Evan Klitzke wrote:
> > src/main/python/apache/thermos/common/process_util.py, line 53
> > <https://reviews.apache.org/r/53418/diff/1/?file=1552758#file1552758line53>
> >
> >     typo, should be "processes"

Done.


> On Nov. 2, 2016, 11:19 p.m., Evan Klitzke wrote:
> > src/main/python/apache/thermos/common/process_util.py, line 64
> > <https://reviews.apache.org/r/53418/diff/1/?file=1552758#file1552758line64>
> >
> >     what if this fails? e.g. libc cannot be found for some reason

Good question.

It seems `find_library` returns `None` if it can't find the .so so I have adjusted the code
to check for `None` and throw an exception.


> On Nov. 2, 2016, 11:19 p.m., Evan Klitzke wrote:
> > src/main/python/apache/thermos/core/helper.py, lines 235-239
> > <https://reviews.apache.org/r/53418/diff/1/?file=1552759#file1552759line235>
> >
> >     this is fine how you have it, or you could do:
> >     
> >     ```
> >     all_pids = set(c.pid for c in psutil.Process().children())
> >     orphaned_pids = all_pids - coordinator_pids
> >     ```
> >     
> >     same applies to getting the coordinator_pids: if you like, you can use a generator
comprehension for brevity.
> >     
> >     no strong feeling either way

Done.


> On Nov. 2, 2016, 11:19 p.m., Evan Klitzke wrote:
> > src/main/python/apache/thermos/core/helper.py, line 241
> > <https://reviews.apache.org/r/53418/diff/1/?file=1552759#file1552759line241>
> >
> >     nit: this should be
> >     
> >     ```
> >     log.debug("Orphaned pids detected: %s", orphaned_pids)
> >     ```

Done.


> On Nov. 2, 2016, 11:19 p.m., Evan Klitzke wrote:
> > src/main/python/apache/thermos/core/helper.py, line 243
> > <https://reviews.apache.org/r/53418/diff/1/?file=1552759#file1552759line243>
> >
> >     same, do not use string interpolation with log messages, use varargs

Done.


> On Nov. 2, 2016, 11:19 p.m., Evan Klitzke wrote:
> > src/main/python/apache/thermos/core/process.py, lines 315-317
> > <https://reviews.apache.org/r/53418/diff/1/?file=1552760#file1552760line315>
> >
> >     Do you need to check if this is available? This feature is only available since
Linux 3.4. Not sure what your policy is on this.
> 
> Joshua Cohen wrote:
>     I actually looked into this when Zameer sent out the email proposing this. The 3.4
kernel was released in May of 2012. I think it's pretty safe to rely on this being available,
though we should definitely make a note of this in the RELEASE-NOTES.md.

I will update RELEASE-NOTES.md informing of this new behaviour and requirement.


> On Nov. 2, 2016, 11:19 p.m., Evan Klitzke wrote:
> > src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh, line 79
> > <https://reviews.apache.org/r/53418/diff/1/?file=1552763#file1552763line79>
> >
> >     i would do:
> >     
> >     ```
> >     for i in {1..10}; do
> >     ```
> >     
> >     which is available in bash

Done, good catch.


> On Nov. 2, 2016, 11:19 p.m., Evan Klitzke wrote:
> > src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh, line 87
> > <https://reviews.apache.org/r/53418/diff/1/?file=1552763#file1552763line87>
> >
> >     no quotes necessary here, either for `$_success` or `1`. at least the quotes
around 1 feel weird to me, especially given that you're using `-ne` instead of `!=`

Done.


> On Nov. 2, 2016, 11:19 p.m., Evan Klitzke wrote:
> > src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh, line 412
> > <https://reviews.apache.org/r/53418/diff/1/?file=1552763#file1552763line412>
> >
> >     fyi you're not quoting any variables throughout here, but i assume that's safe
to do from the test context

Yes, I think it's fine.


- Zameer


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53418/#review154677
-----------------------------------------------------------


On Nov. 2, 2016, 7:53 p.m., Zameer Manji wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53418/
> -----------------------------------------------------------
> 
> (Updated Nov. 2, 2016, 7:53 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Stephan Erb.
> 
> 
> Bugs: AURORA-1808
>     https://issues.apache.org/jira/browse/AURORA-1808
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> # Problem
> 
> Processes can deamonize and escape the supervision of a coordinator. Using the Docker
Containerizer or the Mesos Containerizer with pid isolation means that the processes will
be come reparented to the sh process that launches the executor. For example:
> 
> ```
> root@aurora:/# ps xf
>   PID TTY      STAT   TIME COMMAND
>    48 ?        Ss     0:00 /bin/bash
>    86 ?        R+     0:00  _ ps xf
>     1 ?        Ss     0:00 /bin/sh -c ${MESOS_SANDBOX=.}/thermos_executor.pex --announcer-ensemble
localhost:2181 --announcer-zookeeper-auth-config /home/vagrant/aurora/examples/va
>     5 ?        Sl     0:02 python2.7 /mnt/mesos/sandbox/thermos_executor.pex --announcer-ensemble
localhost:2181 --announcer-zookeeper-auth-config /home/vagrant/aurora/examples/vag
>    23 ?        S      0:00  _ /usr/local/bin/python2.7 /mnt/mesos/sandbox/thermos_runner.pex
--task_id=www-data-devel-hello_docker_engine-0-bde5cdc7-8685-46fd-9078-4a86bd5be152 --
>    29 ?        Ss     0:00      _ /usr/local/bin/python2.7 /mnt/mesos/sandbox/thermos_runner.pex
--task_id=www-data-devel-hello_docker_engine-0-bde5cdc7-8685-46fd-9078-4a86bd5be15
>    32 ?        S      0:00      |   _ /bin/bash -c      while true; do       echo hello
world       sleep 10     done
>    81 ?        S      0:00      |       _ sleep 10
>    31 ?        Ss     0:00      _ /usr/local/bin/python2.7 /mnt/mesos/sandbox/thermos_runner.pex
--task_id=www-data-devel-hello_docker_engine-0-bde5cdc7-8685-46fd-9078-4a86bd5be15
>    33 ?        S      0:00          _ /bin/bash -c      while true; do       echo hello
world       sleep 10     done
>    82 ?        S      0:00              _ sleep 10
>    47 ?        S      0:00 python ./daemon.py
> ```
> 
> # Solution
> 
> Ensure processes that escape the supervision of the coordinator reparent to the runner
who can send signals to them on task tear down. We do this by using the `PR_SET_CHILD_SUBREAPER`
flag of `prctl(2)`.
> 
> After this change the process tree looks like:
> ```
> root@aurora:/# ps xf
>   PID TTY      STAT   TIME COMMAND
>    66 ?        Ss     0:00 /bin/bash
>    70 ?        R+     0:00  _ ps xf
>     1 ?        Ss     0:00 /bin/sh -c ${MESOS_SANDBOX=.}/thermos_executor.pex --announcer-ensemble
localhost:2181 --announcer-zookeeper-auth-config /home/vagrant/aurora/examples/va
>     5 ?        Sl     0:02 python2.7 /mnt/mesos/sandbox/thermos_executor.pex --announcer-ensemble
localhost:2181 --announcer-zookeeper-auth-config /home/vagrant/aurora/examples/vag
>    23 ?        S      0:00  _ /usr/local/bin/python2.7 /mnt/mesos/sandbox/thermos_runner.pex
--task_id=www-data-devel-hello_docker_engine-0-721406db-00f5-4c0c-915e-1dbc5568b849 --
>    33 ?        Ss     0:00      _ /usr/local/bin/python2.7 /mnt/mesos/sandbox/thermos_runner.pex
--task_id=www-data-devel-hello_docker_engine-0-721406db-00f5-4c0c-915e-1dbc5568b84
>    40 ?        S      0:00      |   _ /bin/bash -c      while true; do       echo hello
world       sleep 10     done
>    63 ?        S      0:00      |       _ sleep 10
>    36 ?        Ss     0:00      _ /usr/local/bin/python2.7 /mnt/mesos/sandbox/thermos_runner.pex
--task_id=www-data-devel-hello_docker_engine-0-721406db-00f5-4c0c-915e-1dbc5568b84
>    37 ?        S      0:00      |   _ /bin/bash -c      while true; do       echo hello
world       sleep 10     done
>    62 ?        S      0:00      |       _ sleep 10
>    55 ?        S      0:00      _ python ./daemon.py
> 
> ```
> 
> Now the runner is aware of the reparented procesess can can tear it down cleanly with
a `SIGTERM`.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/thermos/common/process_util.py abd2c0ef35858d13971319b0a7436ce2293824ce

>   src/main/python/apache/thermos/core/helper.py 68855e1e54ba1cd4456e18a36fb237ce6a468c34

>   src/main/python/apache/thermos/core/process.py 3ec43e2719ef97026f399c4b2aa23002559b3153

>   src/main/python/apache/thermos/core/runner.py 7b9013d11f6ff4172b6b7bf56e62299b0d11c977

>   src/test/sh/org/apache/aurora/e2e/test_daemonizing_process.aurora PRE-CREATION 
>   src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 67702d2c0f2e18ee10dcb798b6d421050bd7d4ca

> 
> Diff: https://reviews.apache.org/r/53418/diff/
> 
> 
> Testing
> -------
> 
> src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Zameer Manji
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message