mesos-issues mailing list archives

From "Nikita Vetoshkin (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (MESOS-1199) Subprocess is "slow" -> gated by process::reap poll interval
Date Wed, 06 Aug 2014 06:31:13 GMT

    [ https://issues.apache.org/jira/browse/MESOS-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087322#comment-14087322
] 

Nikita Vetoshkin edited comment on MESOS-1199 at 8/6/14 6:30 AM:
-----------------------------------------------------------------

Just a quick note: polling the pid of a non-child process is racy. The process can die and a new,
unrelated one with the same pid can spin up between poll attempts.
I wonder if we could extend the executor protocol, e.g. ask the executor to bind a specified Unix
domain socket. This socket can be polled and reconnected, and the slave will receive a disconnect
when the executor dies. Any thoughts?
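
A minimal sketch of the socket idea, assuming plain POSIX calls rather than anything in libprocess. All names here (`listenOn`, `connectTo`, `waitForDisconnect`, `executorDeathIsDetected`, and the `/tmp/executor-liveness-<pid>.sock` path) are hypothetical and exist only for illustration. A forked child stands in for the executor and the parent stands in for the slave. When the child exits, the kernel closes its end of the socket and the parent's `poll` wakes up at once, with no pid polling involved.

```cpp
#include <cassert>
#include <cstring>
#include <string>

#include <poll.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/wait.h>
#include <unistd.h>

// Builds a sockaddr_un for `path`; the path must fit in sun_path.
static sockaddr_un makeAddress(const std::string& path) {
  sockaddr_un addr{};
  assert(path.size() < sizeof(addr.sun_path));
  addr.sun_family = AF_UNIX;
  std::strncpy(addr.sun_path, path.c_str(), sizeof(addr.sun_path) - 1);
  return addr;
}

// Executor side (hypothetical): bind and listen on the path it was given.
int listenOn(const std::string& path) {
  int fd = socket(AF_UNIX, SOCK_STREAM, 0);
  if (fd < 0) return -1;
  sockaddr_un addr = makeAddress(path);
  unlink(path.c_str());  // Remove a stale socket file from an earlier run.
  if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
      listen(fd, 1) < 0) {
    close(fd);
    return -1;
  }
  return fd;
}

// Slave side (hypothetical): connect, retrying while the executor starts up.
int connectTo(const std::string& path, int attempts) {
  sockaddr_un addr = makeAddress(path);
  for (int i = 0; i < attempts; ++i) {
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0) {
      return fd;
    }
    close(fd);
    usleep(10 * 1000);  // Wait 10 ms before the next attempt.
  }
  return -1;
}

// Returns true if the peer hung up or closed its end within timeoutMs.
// Returns false on timeout, on a poll error, or if the peer sent data.
bool waitForDisconnect(int fd, int timeoutMs) {
  pollfd pfd{};
  pfd.fd = fd;
  pfd.events = POLLIN;
  if (poll(&pfd, 1, timeoutMs) <= 0) return false;
  if (pfd.revents & (POLLHUP | POLLERR)) return true;
  char byte;
  return read(fd, &byte, 1) == 0;  // A zero-length read means EOF.
}

// Demo: the child plays the executor, the parent plays the slave.
bool executorDeathIsDetected() {
  const std::string path =
      "/tmp/executor-liveness-" + std::to_string(getpid()) + ".sock";
  pid_t pid = fork();
  if (pid < 0) return false;
  if (pid == 0) {
    int listener = listenOn(path);
    if (listener >= 0) accept(listener, nullptr, nullptr);
    usleep(100 * 1000);  // Pretend to do some work.
    _exit(0);            // Exiting closes every socket the child holds.
  }
  int fd = connectTo(path, 200);
  bool detected = fd >= 0 && waitForDisconnect(fd, 5000);
  if (fd >= 0) close(fd);
  waitpid(pid, nullptr, 0);  // The demo process is our child, so reap it.
  unlink(path.c_str());
  return detected;
}
```

Calling `executorDeathIsDetected()` should return true shortly after the child exits. The sketch ignores reconnection after a slave restart and any authentication of who bound the socket, both of which a real protocol extension would need to address.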


was (Author: nekto0n):
Just a quick note: polling pid of non-children is a racy deal. Process can die and a new one
unrelated with the same pid can spin up in between poll attempts.
I wonder if we could extend executors protocol - e.g. to bind specified Unix Domain sockets.
They can be polled, reconnected and slave will receive disconnect when executor dies. Any
thoughts?

> Subprocess is "slow" -> gated by process::reap poll interval
> ------------------------------------------------------------
>
>                 Key: MESOS-1199
>                 URL: https://issues.apache.org/jira/browse/MESOS-1199
>             Project: Mesos
>          Issue Type: Improvement
>    Affects Versions: 0.18.0
>            Reporter: Ian Downes
>            Assignee: Craig Hansen-Sturm
>         Attachments: wiatpid.pdf
>
>
> Subprocess uses process::reap to wait on the subprocess pid and set the exit status.
However, process::reap polls with a one second interval resulting in a delay up to the interval
duration before the status future is set.
> This means if you need to wait for the subprocess to complete you get hit with E(delay)
= 0.5 seconds, independent of the execution time. For example, the MesosContainerizer uses
mesos-fetcher in a Subprocess to fetch the executor during launch. At Twitter we fetch a local
file, i.e., a very fast operation, but the launch is blocked until the mesos-fetcher pid is
reaped -> adding 0 to 1 seconds for every launch!
> The problem is even worse with a chain of short Subprocesses because after the first
Subprocess completes you'll be synchronized with the reap interval and you'll see nearly the
full interval before notification, i.e., 10 Subprocesses each of << 1 second duration
will take ~10 seconds!
> This has become particularly apparent in some new tests I'm working on where test durations
are now greatly extended with each taking several seconds.
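
The chaining effect described above can be checked with a small model. This is a sketch under stated assumptions, not the actual process::reap logic: it assumes the reaper polls on a fixed grid at exact multiples of the interval, that an exit is noticed at the first poll at or after it, and that each Subprocess starts the moment the previous one is reaped. The function names `reapedAt` and `chainCompletion` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Time (ms) of the first poll at or after exitMs, assuming polls happen at
// exact multiples of intervalMs. This fixed grid is a modelling assumption.
int64_t reapedAt(int64_t exitMs, int64_t intervalMs) {
  assert(intervalMs > 0 && exitMs >= 0);
  return ((exitMs + intervalMs - 1) / intervalMs) * intervalMs;
}

// Time (ms) at which the last Subprocess in a chain is reaped, where each
// Subprocess starts as soon as the previous one has been reaped.
int64_t chainCompletion(int64_t startMs,
                        const std::vector<int64_t>& durationsMs,
                        int64_t intervalMs) {
  int64_t now = startMs;
  for (int64_t duration : durationsMs) {
    now = reapedAt(now + duration, intervalMs);
  }
  return now;
}
```

With a 1000 ms interval, ten Subprocesses of 10 ms each, started at t = 300 ms, give `chainCompletion(300, std::vector<int64_t>(10, 10), 1000) == 10000`. The first is reaped at 1000 ms, and every later one starts exactly on a poll and waits one full interval, so 100 ms of real work takes 9.7 seconds of wall time.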



--
This message was sent by Atlassian JIRA
(v6.2#6252)
