Date: Mon, 28 Jul 2014 23:19:39 +0000 (UTC)
From: "Bernd Mathiske (JIRA)"
To: issues@mesos.apache.org
Subject: [jira] [Commented] (MESOS-1199) Subprocess is "slow" -> gated by process::reap poll interval

[ https://issues.apache.org/jira/browse/MESOS-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077114#comment-14077114 ]

Bernd Mathiske commented on MESOS-1199:
---------------------------------------

Idea:
1. Iterate over the pids of interest, calling kill(pid, 0) on each pid. This returns immediately and reports whether a process with that pid still exists.
2. Wait for a short timeout (100ms?).
3. Repeat.

This way, for n pids, the notification delay is at most a small constant (the timeout) plus n times the overhead of one kill() call.
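A minimal C++ sketch of the polling loop described above, assuming a POSIX system. One caveat it handles: kill(pid, 0) still succeeds for a child that has exited but not yet been reaped (a zombie), so on its own it never reports the exit of our own children. The sketch therefore tries waitpid(pid, &status, WNOHANG) first and falls back to kill(pid, 0) only for pids that are not our children. The names PollResult, pollOnce and waitAll are hypothetical (not libprocess API), and the sleep interval is a parameter rather than a fixed 100ms.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <map>
#include <set>
#include <thread>

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical result type for illustration; not part of libprocess.
struct PollResult {
  bool done = false;       // true once the pid no longer refers to a live process
  bool hasStatus = false;  // true if we reaped it ourselves via waitpid()
  int status = 0;          // raw waitpid() status, valid only if hasStatus
};

// Non-blocking check of one pid.
PollResult pollOnce(pid_t pid) {
  PollResult r;
  int status = 0;
  // For our own children, waitpid(WNOHANG) sees the exit (and reaps the
  // zombie); kill(pid, 0) would keep succeeding until the child is reaped.
  pid_t w = waitpid(pid, &status, WNOHANG);
  if (w == pid) {
    r.done = true;
    r.hasStatus = true;
    r.status = status;
    return r;
  }
  if (w == 0) {
    return r;  // our child, still running
  }
  // w == -1 (e.g. ECHILD: not our child): fall back to an existence check.
  // ESRCH means the process is gone; EPERM means it exists but is not ours.
  if (kill(pid, 0) == -1 && errno == ESRCH) {
    r.done = true;
  }
  return r;
}

// Hypothetical helper: sweep all pids, sleep for `interval`, repeat until
// every pid has exited. The delay after an exit is therefore bounded by
// roughly one interval plus the cost of one sweep (one syscall or two per pid).
std::map<pid_t, PollResult> waitAll(std::set<pid_t> pids,
                                    std::chrono::milliseconds interval) {
  std::map<pid_t, PollResult> results;
  while (!pids.empty()) {
    for (auto it = pids.begin(); it != pids.end();) {
      PollResult r = pollOnce(*it);
      if (r.done) {
        results[*it] = r;
        it = pids.erase(it);  // stop polling this pid
      } else {
        ++it;
      }
    }
    if (!pids.empty()) {
      std::this_thread::sleep_for(interval);
    }
  }
  return results;
}
```

For example, after forking a child that calls _exit(7), waitAll({child}, std::chrono::milliseconds(10)) returns an entry for that pid whose status satisfies WEXITSTATUS(status) == 7. Note the design trade-off: because the loop reaps its own children, the exit status is consumed here and is no longer available to any other waitpid() caller.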
> Subprocess is "slow" -> gated by process::reap poll interval
> ------------------------------------------------------------
>
>                 Key: MESOS-1199
>                 URL: https://issues.apache.org/jira/browse/MESOS-1199
>             Project: Mesos
>          Issue Type: Improvement
>    Affects Versions: 0.18.0
>            Reporter: Ian Downes
>            Assignee: Craig Hansen-Sturm
>
> Subprocess uses process::reap to wait on the subprocess pid and set the exit status. However, process::reap polls with a one-second interval, resulting in a delay of up to the interval duration before the status future is set.
> This means that if you need to wait for the subprocess to complete, you get hit with an expected delay E(delay) = 0.5 seconds, independent of the execution time. For example, the MesosContainerizer uses mesos-fetcher in a Subprocess to fetch the executor during launch. At Twitter we fetch a local file, i.e., a very fast operation, but the launch is blocked until the mesos-fetcher pid is reaped -> adding 0 to 1 seconds to every launch!
> The problem is even worse with a chain of short Subprocesses: after the first Subprocess completes, you are synchronized with the reap interval and see nearly the full interval before each notification, i.e., 10 Subprocesses, each of << 1 second duration, will take ~10 seconds!
> This has become particularly apparent in some new tests I'm working on, where test durations are now greatly extended, each taking several seconds.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
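The delay figures in the issue description can be checked with a small deterministic timing model, assuming the reaper checks only at fixed ticks 0, T, 2T, ... and that each Subprocess in a chain starts as soon as the previous exit has been noticed. The helpers nextTick, chainLatencyMs and averageDelayMs are hypothetical; they model timing only and do not call process::reap. Times are integer milliseconds so the arithmetic is exact.

```cpp
#include <cassert>
#include <cstdint>

// First poll tick at or after time t (ticks at 0, interval, 2*interval, ...).
std::int64_t nextTick(std::int64_t t, std::int64_t interval) {
  return ((t + interval - 1) / interval) * interval;
}

// Hypothetical model: n Subprocesses run back to back, each lasting
// durationMs; the next one starts only when the previous exit is noticed
// at a poll tick. Returns the time at which the last exit is noticed.
std::int64_t chainLatencyMs(int n, std::int64_t durationMs,
                            std::int64_t intervalMs, std::int64_t startMs) {
  std::int64_t t = startMs;
  for (int i = 0; i < n; ++i) {
    t = nextTick(t + durationMs, intervalMs);  // wait for the next tick
  }
  return t;
}

// Average delay between a single exit and its notification, taken over
// every start offset 0..intervalMs-1 (i.e. a uniformly random phase).
double averageDelayMs(std::int64_t durationMs, std::int64_t intervalMs) {
  std::int64_t total = 0;
  for (std::int64_t s = 0; s < intervalMs; ++s) {
    std::int64_t finish = s + durationMs;
    total += nextTick(finish, intervalMs) - finish;
  }
  return static_cast<double>(total) / static_cast<double>(intervalMs);
}
```

With a 1000 ms interval and 10 ms Subprocesses, chainLatencyMs(10, 10, 1000, 0) returns 10000, the ~10 seconds from the description, and averageDelayMs(10, 1000) returns 499.5, matching E(delay) = 0.5 seconds for a single exit. Shrinking the interval to the 100ms suggested in the comment brings the same chain down to chainLatencyMs(10, 10, 100, 0) == 1000.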