Mailing list: dev@aurora.apache.org
From: Zameer Manji
Date: Fri, 2 Sep 2016 17:29:49 -0700
Subject: Re: Reply: Discussion on review request 51536
To: 黄 凯 (Huang Kai)
Cc: dev@aurora.apache.org, Joshua Cohen, serb@apache.org, caldima@gmail.com, rdelval1@binghamton.edu
Kai,

We have had coupled deploys before; I don't think it's too terrible. It's
something to note in the release notes, and some operational pain for large
users.

On Fri, Sep 2, 2016 at 4:42 PM, 黄 凯 (Huang Kai) wrote:

> Another concern is that once we roll out the new executor, we should
> roll out a new client in order to use the health-check feature. Hence the
> executor and client rollout processes seem to be coupled.
>
> ------------------------------
> *From:* 黄 凯 (Huang Kai)
> *Sent:* September 3, 2016, 7:23
> *To:* Zameer Manji; dev@aurora.apache.org
> *Cc:* Joshua Cohen; serb@apache.org; caldima@gmail.com; rdelval1@binghamton.edu
> *Subject:* Reply: Discussion on review request 51536
>
> Thanks for the new proposal, Zameer. It sounds good to me. The benefit is
> that it does not alter the current infrastructure too much.
>
> However, there is one thing to keep in mind: we currently check that
> watch_secs is longer than initial_interval_secs. We will have to remove
> that alert message if we choose to skip watch_secs by setting it to zero.
>
> So the new configuration will not support executor-driven health checks
> unless the executors are rolled out 100%.
>
> Does this tradeoff seem OK to us, Maxim?
> Kai
>
> ------------------------------
> *From:* Zameer Manji
> *Sent:* September 3, 2016, 6:53
> *To:* dev@aurora.apache.org
> *Cc:* 黄 凯 (Huang Kai); Joshua Cohen; serb@apache.org; caldima@gmail.com; rdelval1@binghamton.edu
> *Subject:* Re: Discussion on review request 51536
>
> On Fri, Sep 2, 2016 at 3:24 PM, Maxim Khutornenko wrote:
>
>> Need to correct a few previous statements:
>>
>> > Also we do not want to expose this message to users.
>> This is incorrect. The original design proposal suggested showing this
>> message in the UI as "Task is healthy".
>
> Does this mean the message in the status update is going to be exactly
> "Task is healthy", and the scheduler is going to check for this string in
> the `TASK_RUNNING` status update? That would establish a communication
> mechanism between the executor and scheduler that is not defined by any
> schema. I feel that's worse than putting JSON in there and having the
> scheduler parse it.
>
>> > The Mesos API isn't designed for packing arbitrary data
>> > in the status update message.
>> I don't think I agree; this is exactly what this field is for [1], and
>> we already use it for other states [2].
>
> I guess I should have said 'structured arbitrary data'. The informational
> messages are fine, and we plumb them blindly into our logging and UI. I'm
> not convinced we should start putting JSON or something more structured
> in there. That's yet another schema we have and yet another versioning
> story we have to go through. It also complicates matters for custom
> executor authors.
>
>> > I would be open to just saying that scheduler version
>> > 0.16 (or 0.17) just assumes the executor transitions to
>> > RUNNING once a task is healthy and dropping
>> > `watch_secs` entirely.
>> We can't drop 'watch_secs' entirely, as we still have to babysit job
>> updates that don't have health checks enabled.
>
> Understood. I guess we can keep it, but I'm now frustrated that we have a
> parameter that is ignored if we set some JSON in ExecutorConfig.data.
> Ideally, we would not accept `watch_secs` at all if we want health-check
> driven updates. As mentioned before, I don't like this implicit coupling
> of the executor and the scheduler.
>
>> As for my take on the above, I favor #1 as the simplest answer to an
>> already simple question: "Should we use watch_secs for this instance
>> or not?" That's pretty much it. The scheduler does not need any schema
>> changes, and does not need to know what health checks are or whether a
>> job has them enabled. At least not until we attempt to move to
>> centralized health checks (AURORA-279), but that will be an entirely
>> different design discussion.
>>
>> [1] - https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L1605
>> [2] - https://github.com/apache/aurora/blob/5cad046fc0f0c4bb79a4563cfcff0442b7bf8383/src/main/python/apache/aurora/executor/aurora_executor.py#L97
>
> With all of this in mind, I have another proposal. Why can't we have the
> executor changes (wait until the task is healthy before transitioning to
> RUNNING) *and* read `watch_secs` if it is set? Why not have both
> features: if users want purely health-check-driven updates, they can set
> this value to 0 and enable health checks. If they want both health
> checking and time-driven updates, they can set this value to the time
> they care about. If they just want time-driven updates, they can disable
> health checking and set this value.
>
> Then there is no coupling between the executor and the scheduler except
> for status updates, and there is no dependency on the `message` field of
> the status update.
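To make the proposal above concrete, the per-instance decision could be sketched roughly as follows. This is a hypothetical illustration, not Aurora code (the real scheduler updater is Java; Python is used here for brevity), and the function and parameter names are invented:

```python
def instance_update_complete(state, elapsed_secs, watch_secs,
                             health_checks_enabled):
    """Decide whether an updated instance can be considered done.

    state: current task state, e.g. 'STARTING' or 'RUNNING'. Under the
    proposed executor change, RUNNING implies the task already passed
    its initial health checks.
    elapsed_secs: seconds since the instance update began.
    """
    if state != 'RUNNING':
        return False
    if health_checks_enabled and watch_secs == 0:
        # Purely health-check-driven update: RUNNING already means healthy.
        return True
    # Time-driven (or combined) update: watch_secs acts as a lower bound
    # on how quickly an instance may be declared healthy, throttling the
    # overall update speed.
    return elapsed_secs >= watch_secs
```

Under this sketch, `watch_secs=0` with health checks enabled yields purely health-check-driven updates, while any positive value preserves the existing time-driven behavior.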
> We could even treat `watch_secs` as the minimum time in STARTING +
> RUNNING instead of just RUNNING with this change, so it becomes the
> lower bound on update transition speed. This can ensure that users don't
> deploy "too fast" and end up overwhelming other services.
>
>> On Fri, Sep 2, 2016 at 2:26 PM, Zameer Manji wrote:
>> > *cc: Renan*
>> >
>> > I think there is some disagreement/discussion on the review because
>> > we have not achieved consensus on the design. Since the design doc
>> > was written, Aurora adopted multiple-executor support as well as
>> > non-HTTP-based health checking. This invalidates some parts of the
>> > original design. I think all of the solutions here are possible
>> > amendments to the design doc.
>> >
>> > I am not in favor of Solution 2 at all, because status updates
>> > between executor <-> agent <-> master <-> scheduler are designed to
>> > inform the framework about task state, not to send arbitrary
>> > information. Just because the Mesos API provides us with a string
>> > field doesn't mean we should try to pack arbitrary data into it.
>> > Also, it isn't clear what other capabilities we might add in the
>> > future, so I'm unconvinced that capabilities need to exist at all.
>> > My fear is that we will create the infrastructure for capabilities
>> > just to serve this need and nothing else.
>> >
>> > I object to Solution 1 along the same lines. The Mesos API isn't
>> > designed for packing arbitrary data into the status update message,
>> > and I don't think we should abuse and rely on that. Also, our
>> > current infrastructure just plumbs the message through to the UI,
>> > and I think displaying capabilities there is not something we
>> > should do.
>> >
>> > I am in favor of Solution 3, which is as close as possible to the
>> > original design in the design doc.
>> > The design doc says the following:
>> >
>> > "Scheduler updater will skip the minWaitInInstanceMs (aka
>> > watch_secs) grace period any time it detects a named port 'health'
>> > in task configuration. A RUNNING instance status will signify the
>> > end of instance update."
>> >
>> > Instead of detecting the 'health' port in the task configuration, we
>> > make enabling this feature explicit by setting a bit in the task
>> > configuration: the `executorDrivenUpdates` bit.
>> >
>> > I understand this option makes the feature more complex, because it
>> > requires a schema change and requires operators to deploy the
>> > executor to all agents before upgrading the client. However, I think
>> > that's a one-time operational cost, as opposed to long-lived design
>> > choices that will affect the code.
>> >
>> > Further, Solution 3 is the most amenable to custom executors and
>> > continues our tradition of treating executors as opaque black boxes.
>> > I think there is a lot of value in treating executors as black
>> > boxes, as it leaves the door open to switching our executor to
>> > something else and doesn't impose a burden on others who want to
>> > write their own.
>> >
>> > Alternatively, if amending the schema is too much work, I would be
>> > open to just saying that scheduler version 0.16 (or 0.17) assumes
>> > the executor transitions to RUNNING once a task is healthy, and
>> > dropping `watch_secs` entirely. We can put it in the release notes
>> > that operators must deploy the executor to 100% before deploying the
>> > scheduler.
>> >
>> > On Thu, Sep 1, 2016 at 6:40 PM, 黄 凯 (Huang Kai) wrote:
>> >
>> >> Hi Folks,
>> >>
>> >> I'm currently working on a feature on the Aurora scheduler and
>> >> executor.
>> >> The implementation strategy became controversial on the review
>> >> board, so I was wondering if I should broadcast it to a wider
>> >> audience and initiate a discussion. Please feel free to let me know
>> >> your thoughts; your help is greatly appreciated!
>> >>
>> >> The high-level goal of this feature is to improve the reliability
>> >> and performance of the Aurora scheduler's job updater by relying on
>> >> health-check status rather than the watch_secs timeout when
>> >> deciding an individual instance's update state.
>> >>
>> >> Please see the original review request
>> >> https://reviews.apache.org/r/51536/
>> >> the Aurora JIRA ticket
>> >> https://issues.apache.org/jira/browse/AURORA-894
>> >> and the design doc
>> >> https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#
>> >> for more details and background.
>> >>
>> >> Note: the design doc has become a little outdated in the "scheduler
>> >> change summary" part (this is what the review request is trying to
>> >> address). As a result, I've left some comments there to clarify the
>> >> latest proposed implementation plan for the scheduler change.
>> >>
>> >> There are two questions I'm trying to address here:
>> >> *1. How does the scheduler infer the executor version and stay
>> >> backward compatible?*
>> >> *2. Where do we determine whether health checking is enabled?*
>> >>
>> >> In short, there are 3 different solutions proposed on the review
>> >> board.
>> >>
>> >> In the first two approaches, the scheduler relies on a string to
>> >> determine the executor version. We determine whether health
>> >> checking is enabled on the executor side only. There will be
>> >> communication between the executor and the scheduler.
>> >> *Solution 1:*
>> >> *The vCurrent executor sends a message from its health-check thread
>> >> during the RUNNING state transition; the vCurrent updater infers
>> >> the executor version from the presence of this message and skips
>> >> watch_secs if necessary.*
>> >>
>> >> *Solution 2:*
>> >> *Instead of relying on the presence of an arbitrary string in the
>> >> message, rely on the presence of a string like
>> >> "capabilities:CAPABILITY_1,CAPABILITY_2", where CAPABILITY_1 and
>> >> CAPABILITY_2 (etc.) are constants defined in api.thrift. Basically
>> >> just formalizing the mechanism and making it a bit more
>> >> future-proof.*
>> >>
>> >> In the third solution, the scheduler infers the executor version
>> >> from the JobUpdateSettings on the scheduler side.
>> >>
>> >> *Solution 3:*
>> >> *Add a bit called `executorDrivenUpdates` to JobUpdateSettings; if
>> >> it is set, the scheduler assumes that the transition from STARTING
>> >> -> RUNNING means the task is healthy. Concurrently, we release
>> >> Thermos and change HealthCheckConfig so that a task only goes to
>> >> RUNNING after it is healthy.*
>> >>
>> >> *Pros and Cons:*
>> >> The main benefits of Solution 1 are:
>> >> 1. By using the message in the task status update, we don't have
>> >> to make any schema change, which keeps the design simple.
>> >> 2. The feature is fully backward compatible. When we roll out the
>> >> vCurrent schedulers and executors, we do not have to instruct users
>> >> to provide an additional field in the Job or Update configs, which
>> >> could confuse customers while vPrev and vCurrent executors coexist
>> >> in the cluster.
>> >>
>> >> Concerns:
>> >> Relying on the presence of a message makes things brittle. Also, we
>> >> do not want to expose this message to users.
>> >>
>> >> The benefit of Solution 2 is making the feature more future-proof.
>> >> However, if we do not envision a new executor feature in the short
>> >> term, it's not much different from Solution 1.
>> >>
>> >> The benefits of Solution 3 include:
>> >> 1. We support more than just Thermos now (and others rely on custom
>> >> executors).
>> >> 2. A lot of things in Aurora treat the executor as opaque. The
>> >> status update message sent by the executor should be visible to
>> >> users only if it's an error message.
>> >>
>> >> Concerns:
>> >> 1. In addition to the `executorDrivenUpdates` bit that identifies
>> >> the executor version, we still need to notify the scheduler whether
>> >> health checking is enabled on the vCurrent executor; if it is not,
>> >> the scheduler must be able to fall back to using watch_secs.
>> >> 2. The users have to provide an additional field in their .aurora
>> >> config files. The feature won't be available unless new clients are
>> >> rolled out as well.
>> >>
>> >> Please let me know if I understand your suggestions correctly, and
>> >> hopefully everyone is on the same page!
>> >>
>> >> Thanks,
>> >>
>> >> Kai
>
> --
> Zameer Manji

--
Zameer Manji
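For readers following the thread, the capability-string convention floated as Solution 2 could be parsed along these lines. This is a sketch only: the parsing function and the capability name in the example are hypothetical, not constants actually defined in api.thrift:

```python
def parse_capabilities(message):
    """Return the set of capability names announced in a status-update
    message of the form "capabilities:CAPABILITY_1,CAPABILITY_2".

    Any other message (including an empty or missing one, as vPrev
    executors would send) yields an empty set, which a scheduler could
    use to fall back to watch_secs-driven updates.
    """
    prefix = 'capabilities:'
    if not message or not message.startswith(prefix):
        return set()
    return {c.strip() for c in message[len(prefix):].split(',') if c.strip()}
```

The fallback-to-empty-set behavior is what would make the mechanism backward compatible: an older executor's messages simply parse as "no capabilities".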