Subject: Re: Review Request 59733: Adding Configurable Wait Period for Graceful Shutdowns
From: David McLaughlin
To: Santhosh Kumar Shanmugham, David McLaughlin, Stephan Erb, Zameer Manji
Cc: Aurora, Jordan Ly, Reza Motamedi, Aurora ReviewBot
Date: Fri, 02 Jun 2017 18:16:00 -0000

> On June 2, 2017, 5:28 p.m., Zameer Manji wrote:
> > I don't have comment access on the doc, so I will leave my questions here:
> >
> > 1. Should operators (via executor flags) be able to cap this value? That is, should `STOP_TIMEOUT` be an operator flag?
> > 2. Should the client cap this value and display an error message to the user?
> > 3. AFAIK, a task can remain in `KILLING` forever.
> >    There is no timeout in the scheduler, as it just retries kills. If a user puts a large value here, I'm not sure tasks will actually terminate. Please add an e2e test here to confirm/deny.

For (3), this is exactly what STOP_TIMEOUT in the executor is for. The issue with STOP_TIMEOUT is that it is Thermos-specific, and we support multiple executors.


- David


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59733/#review176802
-----------------------------------------------------------


On June 1, 2017, 11:48 p.m., Jordan Ly wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59733/
> -----------------------------------------------------------
> 
> (Updated June 1, 2017, 11:48 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, Stephan Erb, and Zameer Manji.
> 
> 
> Bugs: AURORA-1931
>     https://issues.apache.org/jira/browse/AURORA-1931
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> We have some services that require more than the current 10 seconds given to
> shut down gracefully (they need to close resources, finish requests, etc.).
> 
> We would like to be able to configure the amount of time we wait between each
> stage of the graceful shutdown sequence. See this [proposal](https://docs.google.com/document/d/1Sl-KWNyt1j0nIndinqfJsH3pkUY5IYXfGWyLHU2wacs/edit?usp=sharing) for a more in-depth
> analysis.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/config/schema/base.py b2692a648645a195a24491e4978fb833c6c20be8 
>   src/main/python/apache/aurora/executor/aurora_executor.py 81461cb49ac223f3bdfa59e8c59e150a07771dea 
>   src/main/python/apache/aurora/executor/http_lifecycle.py 9280bf29da9bda1691adbf3a4c34c4f3d4900517 
>   src/test/python/apache/aurora/client/cli/test_inspect.py 4a23c5984c2d093e2f53e93aec71418f84b65928 
>   src/test/python/apache/aurora/executor/test_http_lifecycle.py a967e3410a4d2dc2e1721f505a4d76da9209d177 
>   src/test/python/apache/aurora/executor/test_thermos_task_runner.py 1b92667bceabc8ea1540122477a51cb58ea2ae36 
> 
> 
> Diff: https://reviews.apache.org/r/59733/diff/1/
> 
> 
> Testing
> -------
> 
> Ran unit and integration tests.
> 
> Created and killed jobs with varying wait_escalation_secs values on the Vagrant devcluster.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
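To make the knob under discussion concrete, here is a minimal Python sketch (Python being the language of the executor files in the diff) of an escalation loop parameterized by `wait_escalation_secs`. It assumes the default `/quitquitquit` and `/abortabortabort` lifecycle endpoints and injects `post`, `is_terminated`, and `sleep` so it runs without a real task. The names `escalate_shutdown`, `effective_wait`, and the operator cap `max_wait_secs` (one possible answer to questions 1 and 2 above) are hypothetical and are not the code in this patch.

```python
import time
from typing import Callable, Optional

# Assumed default endpoints of Aurora's HTTP lifecycle; jobs can override them.
GRACEFUL_SHUTDOWN_ENDPOINT = "/quitquitquit"
SHUTDOWN_ENDPOINT = "/abortabortabort"


def effective_wait(wait_escalation_secs: float,
                   max_wait_secs: Optional[float] = None) -> float:
    """Validate the user's wait and clamp it to a hypothetical operator cap."""
    if wait_escalation_secs < 0:
        raise ValueError("wait_escalation_secs must be non-negative")
    if max_wait_secs is None:
        return wait_escalation_secs
    return min(wait_escalation_secs, max_wait_secs)


def escalate_shutdown(post: Callable[[str], None],
                      is_terminated: Callable[[], bool],
                      wait_escalation_secs: float,
                      max_wait_secs: Optional[float] = None,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Hit each shutdown endpoint in turn, waiting between stages.

    Returns True if the task exited during the sequence, or False if the
    caller still has to kill it forcefully. For simplicity this checks
    once per stage instead of polling while it waits.
    """
    wait = effective_wait(wait_escalation_secs, max_wait_secs)
    for endpoint in (GRACEFUL_SHUTDOWN_ENDPOINT, SHUTDOWN_ENDPOINT):
        post(endpoint)   # e.g. an HTTP POST to the task's named port
        sleep(wait)      # the configurable pause this review introduces
        if is_terminated():
            return True
    return False
```

For example, a 5-second wait per stage adds up to the 10 seconds mentioned in the description, while `wait_escalation_secs=30` with `max_wait_secs=10` would pause 10 seconds per stage. Note that the loop only bounds the HTTP stages; as point (3) raises, something outside the task, such as the executor's STOP_TIMEOUT, still has to bound the total time a task spends in `KILLING`.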