Mailing list: dev@aurora.apache.org
From: Zameer Manji
Date: Fri, 2 Sep 2016 17:29:49 -0700
Subject: Re: Reply: Discussion on review request 51536
To: 黄 凯 (Huang Kai)
Cc: dev@aurora.apache.org, Joshua Cohen, serb@apache.org, caldima@gmail.com, rdelval1@binghamton.edu
Kai,

We have had coupled deploys before; I don't think it's too terrible. It's
something to note in the release notes, and some operational pain for large
users.

On Fri, Sep 2, 2016 at 4:42 PM, 黄 凯 (Huang Kai) wrote:

> Another concern is that once we roll out the new executor, we should
> roll out a new client in order to use the health-check feature. Hence the
> executor and client rollout processes seem to be coupled.
>
> ------------------------------
> *From:* 黄 凯 (Huang Kai)
> *Sent:* September 3, 2016, 7:23
> *To:* Zameer Manji; dev@aurora.apache.org
> *Cc:* Joshua Cohen; serb@apache.org; caldima@gmail.com; rdelval1@binghamton.edu
> *Subject:* Reply: Discussion on review request 51536
>
> Thanks for the new proposal, Zameer. It sounds good to me. The benefit is
> that it does not alter the current infrastructure too much.
>
> However, there is one thing to keep in mind: we currently check that
> watch_secs is longer than initial_interval_secs. We will have to remove
> that alert message if we choose to skip watch_secs by setting it to zero.
>
> So the new configuration will not support executor-driven health checks
> unless the executors are rolled out 100%.
>
> Does this tradeoff seem OK to us, Maxim?
> Kai
>
> ------------------------------
> *From:* Zameer Manji
> *Sent:* September 3, 2016, 6:53
> *To:* dev@aurora.apache.org
> *Cc:* 黄 凯 (Huang Kai); Joshua Cohen; serb@apache.org; caldima@gmail.com; rdelval1@binghamton.edu
> *Subject:* Re: Discussion on review request 51536
>
> On Fri, Sep 2, 2016 at 3:24 PM, Maxim Khutornenko wrote:
>
>> Need to correct a few previous statements:
>>
>> > Also we do not want to expose this message to users.
>> This is incorrect. The original design proposal suggested showing this
>> message in the UI as "Task is healthy".
>
> Does this mean the message in the status update is going to be exactly
> "Task is healthy", and the scheduler is going to check for this string in
> the `TASK_RUNNING` status update? That would establish a communication
> mechanism between the executor and scheduler that is not defined by any
> schema. I feel that's worse than putting JSON in there and having the
> scheduler parse it.
>
>> > The Mesos API isn't designed for packing arbitrary data
>> > in the status update message.
>> I don't think I agree; this is exactly what this field is for [1], and
>> we already use it for other states [2].
>
> I guess I should have said 'structured arbitrary data'. The informational
> messages are fine, and we plumb them blindly into our logging and UI. I'm
> not convinced we should start putting JSON or something more structured
> in there. That's yet another schema we have and yet another versioning
> story we have to go through. It also complicates matters for custom
> executor authors.
>
>> > I would be open to just saying that scheduler version
>> > 0.16 (or 0.17) just assumes the executor transitions to
>> > RUNNING once a task is healthy and dropping
>> > `watch_secs` entirely.
>> We can't drop 'watch_secs' entirely, as we still have to babysit job
>> updates that don't have health checks enabled.
>
> Understood. I guess we can keep it, but I'm now frustrated that we have a
> parameter that is ignored if we set some JSON in ExecutorConfig.data.
> Ideally, we would not accept `watch_secs` at all if we want health-check
> driven updates. As mentioned before, I don't like this implicit coupling
> of the executor and the scheduler.
>
>> As for my take on the above, I favor #1 as the simplest answer to an
>> already simple question: "Should we use watch_secs for this instance
>> or not?" That's pretty much it. The scheduler does not need any schema
>> changes, and does not need to know what health checks are or whether a
>> job has them enabled. At least not until we attempt to move to
>> centralized health checks (AURORA-279), but that will be an entirely
>> different design discussion.
>>
>> [1] - https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L1605
>> [2] - https://github.com/apache/aurora/blob/5cad046fc0f0c4bb79a4563cfcff0442b7bf8383/src/main/python/apache/aurora/executor/aurora_executor.py#L97
>
> With all of this in mind, I have another proposal. Why can't we have the
> executor changes (wait until the task is healthy before transitioning to
> RUNNING) *and* read `watch_secs` if it is set? Why not have both
> features: if users want purely health-check-driven updates, they can set
> this value to 0 and enable health checks. If they want both health
> checking and time-driven updates, they can set this value to the time
> they care about. If they just want time-driven updates, they can disable
> health checking and set this value.
>
> Then there is no coupling between the executor and the scheduler except
> for status updates, and there is no dependency on the `message` field of
> the status update.
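To make the proposal above concrete, the per-instance decision could be sketched roughly as follows. This is a hypothetical illustration, not Aurora code (the real scheduler updater is Java; Python is used here for brevity), and the function and parameter names are invented:

```python
def instance_update_complete(state, elapsed_secs, watch_secs,
                             health_checks_enabled):
    """Decide whether an updated instance can be considered done.

    state: current task state, e.g. 'STARTING' or 'RUNNING'. Under the
    proposed executor change, RUNNING implies the task already passed
    its initial health checks.
    elapsed_secs: seconds since the instance update began.
    """
    if state != 'RUNNING':
        return False
    if health_checks_enabled and watch_secs == 0:
        # Purely health-check-driven update: RUNNING already means healthy.
        return True
    # Time-driven (or combined) update: watch_secs acts as a lower bound
    # on how quickly an instance may be declared healthy, throttling the
    # overall update speed.
    return elapsed_secs >= watch_secs
```

Under this sketch, `watch_secs=0` with health checks enabled yields purely health-check-driven updates, while any positive value preserves the existing time-driven behavior.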
> We could even treat `watch_secs` as the minimum time in STARTING +
> RUNNING instead of just RUNNING with this change, so it becomes the
> lower bound on update transition speed. This can ensure that users don't
> deploy "too fast" and end up overwhelming other services.
>
>> On Fri, Sep 2, 2016 at 2:26 PM, Zameer Manji wrote:
>> > *cc: Renan*
>> >
>> > I think there is some disagreement/discussion on the review because
>> > we have not achieved consensus on the design. Since the design doc
>> > was written, Aurora adopted multiple-executor support as well as
>> > non-HTTP-based health checking. This invalidates some parts of the
>> > original design. I think all of the solutions here are possible
>> > amendments to the design doc.
>> >
>> > I am not in favor of Solution 2 at all, because status updates
>> > between executor <-> agent <-> master <-> scheduler are designed to
>> > inform the framework about task state, not to send arbitrary
>> > information. Just because the Mesos API provides us with a string
>> > field doesn't mean we should try to pack arbitrary data into it.
>> > Also, it isn't clear what other capabilities we might add in the
>> > future, so I'm unconvinced that capabilities need to exist at all.
>> > My fear is that we will create the infrastructure for capabilities
>> > just to serve this need and nothing else.
>> >
>> > I object to Solution 1 along the same lines. The Mesos API isn't
>> > designed for packing arbitrary data into the status update message,
>> > and I don't think we should abuse and rely on that. Also, our
>> > current infrastructure just plumbs the message through to the UI,
>> > and I think displaying capabilities there is not something we
>> > should do.
>> >
>> > I am in favor of Solution 3, which is as close as possible to the
>> > original design in the design doc.
>> > The design doc says the following:
>> >
>> > "Scheduler updater will skip the minWaitInInstanceMs (aka
>> > watch_secs) grace period any time it detects a named port 'health'
>> > in task configuration. A RUNNING instance status will signify the
>> > end of instance update."
>> >
>> > Instead of detecting the 'health' port in the task configuration, we
>> > make enabling this feature explicit by setting a bit in the task
>> > configuration: the `executorDrivenUpdates` bit.
>> >
>> > I understand this option makes the feature more complex, because it
>> > requires a schema change and requires operators to deploy the
>> > executor to all agents before upgrading the client. However, I think
>> > that's a one-time operational cost, as opposed to long-lived design
>> > choices that will affect the code.
>> >
>> > Further, Solution 3 is the most amenable to custom executors and
>> > continues our tradition of treating executors as opaque black boxes.
>> > I think there is a lot of value in treating executors as black
>> > boxes, as it leaves the door open to switching our executor to
>> > something else and doesn't impose a burden on others who want to
>> > write their own.
>> >
>> > Alternatively, if amending the schema is too much work, I would be
>> > open to just saying that scheduler version 0.16 (or 0.17) assumes
>> > the executor transitions to RUNNING once a task is healthy, and
>> > dropping `watch_secs` entirely. We can put it in the release notes
>> > that operators must deploy the executor to 100% before deploying the
>> > scheduler.
>> >
>> > On Thu, Sep 1, 2016 at 6:40 PM, 黄 凯 (Huang Kai) wrote:
>> >
>> >> Hi Folks,
>> >>
>> >> I'm currently working on a feature on the Aurora scheduler and
>> >> executor.
>> >> The implementation strategy became controversial on the review
>> >> board, so I was wondering if I should broadcast it to a wider
>> >> audience and initiate a discussion. Please feel free to let me know
>> >> your thoughts; your help is greatly appreciated!
>> >>
>> >> The high-level goal of this feature is to improve the reliability
>> >> and performance of the Aurora scheduler's job updater by relying on
>> >> health-check status rather than the watch_secs timeout when
>> >> deciding an individual instance's update state.
>> >>
>> >> Please see the original review request
>> >> https://reviews.apache.org/r/51536/
>> >> the Aurora JIRA ticket
>> >> https://issues.apache.org/jira/browse/AURORA-894
>> >> and the design doc
>> >> https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
>> >> for more details and background.
>> >>
>> >> Note: the design doc has become a little outdated in the "scheduler
>> >> change summary" part (this is what the review request is trying to
>> >> address). As a result, I've left some comments there to clarify the
>> >> latest proposed implementation plan for the scheduler change.
>> >>
>> >> There are two questions I'm trying to address here:
>> >> *1. How does the scheduler infer the executor version and stay
>> >> backward compatible?*
>> >> *2. Where do we determine whether health checking is enabled?*
>> >>
>> >> In short, there are 3 different solutions proposed on the review
>> >> board.
>> >>
>> >> In the first two approaches, the scheduler relies on a string to
>> >> determine the executor version. We determine whether health
>> >> checking is enabled on the executor side only. There will be
>> >> communication between the executor and the scheduler.
>> >> *Solution 1:*
>> >> *The vCurrent executor sends a message from its health-check thread
>> >> during the RUNNING state transition; the vCurrent updater infers
>> >> the executor version from the presence of this message and skips
>> >> watch_secs if necessary.*
>> >>
>> >> *Solution 2:*
>> >> *Instead of relying on the presence of an arbitrary string in the
>> >> message, rely on the presence of a string like
>> >> "capabilities:CAPABILITY_1,CAPABILITY_2", where CAPABILITY_1 and
>> >> CAPABILITY_2 (etc.) are constants defined in api.thrift. Basically
>> >> just formalizing the mechanism and making it a bit more
>> >> future-proof.*
>> >>
>> >> In the third solution, the scheduler infers the executor version
>> >> from the JobUpdateSettings on the scheduler side.
>> >>
>> >> *Solution 3:*
>> >> *Add a bit called `executorDrivenUpdates` to JobUpdateSettings; if
>> >> it is set, the scheduler assumes that the transition from STARTING
>> >> -> RUNNING means the task is healthy. Concurrently, we release
>> >> Thermos and change HealthCheckConfig so that a task only goes to
>> >> RUNNING after it is healthy.*
>> >>
>> >> *Pros and Cons:*
>> >> The main benefits of Solution 1 are:
>> >> 1. By using the message in the task status update, we don't have
>> >> to make any schema change, which keeps the design simple.
>> >> 2. The feature is fully backward compatible. When we roll out the
>> >> vCurrent schedulers and executors, we do not have to instruct users
>> >> to provide an additional field in the Job or Update configs, which
>> >> could confuse customers while vPrev and vCurrent executors coexist
>> >> in the cluster.
>> >>
>> >> Concerns:
>> >> Relying on the presence of a message makes things brittle. Also, we
>> >> do not want to expose this message to users.
>> >>
>> >> The benefit of Solution 2 is making the feature more future-proof.
>> >> However, if we do not envision a new executor feature in the short
>> >> term, it's not much different from Solution 1.
>> >>
>> >> The benefits of Solution 3 include:
>> >> 1. We support more than just Thermos now (and others rely on custom
>> >> executors).
>> >> 2. A lot of things in Aurora treat the executor as opaque. The
>> >> status update message sent by the executor should be visible to
>> >> users only if it's an error message.
>> >>
>> >> Concerns:
>> >> 1. In addition to the `executorDrivenUpdates` bit that identifies
>> >> the executor version, we still need to notify the scheduler whether
>> >> health checking is enabled on the vCurrent executor; if it is not,
>> >> the scheduler must be able to fall back to using watch_secs.
>> >> 2. The users have to provide an additional field in their .aurora
>> >> config files. The feature won't be available unless new clients are
>> >> rolled out as well.
>> >>
>> >> Please let me know if I understand your suggestions correctly, and
>> >> hopefully everyone is on the same page!
>> >>
>> >> Thanks,
>> >>
>> >> Kai
>
> --
> Zameer Manji

--
Zameer Manji
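For readers following the thread, the capability-string convention floated as Solution 2 could be parsed along these lines. This is a sketch only: the parsing function and the capability name in the example are hypothetical, not constants actually defined in api.thrift:

```python
def parse_capabilities(message):
    """Return the set of capability names announced in a status-update
    message of the form "capabilities:CAPABILITY_1,CAPABILITY_2".

    Any other message (including an empty or missing one, as vPrev
    executors would send) yields an empty set, which a scheduler could
    use to fall back to watch_secs-driven updates.
    """
    prefix = 'capabilities:'
    if not message or not message.startswith(prefix):
        return set()
    return {c.strip() for c in message[len(prefix):].split(',') if c.strip()}
```

The fallback-to-empty-set behavior is what would make the mechanism backward compatible: an older executor's messages simply parse as "no capabilities".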