aurora-dev mailing list archives

From Maxim Khutornenko <ma...@apache.org>
Subject Re: Discussion on review request 51536
Date Sat, 03 Sep 2016 01:36:44 GMT
Just to summarize Zameer's proposal buried deep in the quotation
stream: we keep watch_secs but let users set it to 0 iff they have
health checks enabled. No scheduler (updater) changes are needed at all.
Users opt in to the new feature by setting watch_secs to 0 in their
configs (or omitting it entirely, in which case it defaults to 0).
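
For illustration, an opted-in config would look roughly like this (a
sketch using the existing UpdateConfig/HealthCheckConfig fields; the rest
of the Job definition is omitted):

    # .aurora config fragment: opt in to health-check driven updates.
    update_config = UpdateConfig(
      batch_size = 1,
      watch_secs = 0   # 0 = rely on health checks instead of a fixed timer
    )

    health_check_config = HealthCheckConfig(
      initial_interval_secs = 15,
      interval_secs = 10,
      max_consecutive_failures = 3
    )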

I am +1 to this idea. Thanks Zameer!

On Fri, Sep 2, 2016 at 6:28 PM, Zameer Manji <zmanji@apache.org> wrote:
> Resending my emails from a domain that has mailing list friendly DMARC
> settings.
>
> It seems that we have achieved some consensus on the design; others, feel
> free to weigh in.
>
> On Fri, Sep 2, 2016 at 5:29 PM, Zameer Manji <zmanji@uber.com> wrote:
>
>> Kai,
>>
>> We have had coupled deploys before; I don't think it's too terrible. It's
>> something to note in the release notes, and some operational pain for
>> large users.
>>
>> On Fri, Sep 2, 2016 at 4:42 PM, 黄 凯 <texasred2013@hotmail.com> wrote:
>>
>> > Another concern is that once we roll out the new executor, we also need
>> > to roll out a new client in order to use the health-check feature. Hence
>> > the executor and client rollout processes seem to be coupled.
>> >
>> >
>> >
>> >
>> > ------------------------------
>> > *From:* 黄 凯 <texasred2013@hotmail.com>
>> > *Sent:* September 3, 2016 7:23
>> > *To:* Zameer Manji; dev@aurora.apache.org
>> > *Cc:* Joshua Cohen; serb@apache.org; caldima@gmail.com;
>> > rdelval1@binghamton.edu
>> > *Subject:* Re: Discussion on review request 51536
>> >
>> >
>> > Thanks for the new proposal, Zameer. It sounds good to me. The benefit is
>> > that it does not alter the current infrastructure too much.
>> >
>> >
>> > However, there is one thing to keep in mind:
>> >
>> > We currently do a check to ensure watch_secs is longer than
>> > initial_interval_secs. We will have to remove that alert if we choose to
>> > skip the watch by setting watch_secs to zero.
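>> >
>> > Roughly, the relaxed client-side check could look like this (illustrative
>> > Python, not the actual client code):
>> >
>> >     def check_watch_secs(watch_secs, initial_interval_secs):
>> >       # watch_secs == 0 now means "health-check driven", so only warn
>> >       # when a non-zero watch window is shorter than the initial
>> >       # health check interval.
>> >       if 0 < watch_secs <= initial_interval_secs:
>> >         raise ValueError('watch_secs must be greater than '
>> >                          'initial_interval_secs')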
>> >
>> >
>> > So the new configuration will not support executor-driven health checks
>> > unless the executors are rolled out to 100%.
>> >
>> >
>> > Does this tradeoff seem OK to us, Maxim?
>> >
>> >
>> > Kai
>> >
>> >
>> > ------------------------------
>> > *From:* Zameer Manji <zmanji@uber.com>
>> > *Sent:* September 3, 2016 6:53
>> > *To:* dev@aurora.apache.org
>> > *Cc:* 黄 凯; Joshua Cohen; serb@apache.org; caldima@gmail.com;
>> > rdelval1@binghamton.edu
>> > *Subject:* Re: Discussion on review request 51536
>> >
>> >
>> >
>> > On Fri, Sep 2, 2016 at 3:24 PM, Maxim Khutornenko <maxim@apache.org>
>> > wrote:
>> >
>> >> Need to correct a few previous statements:
>> >>
>> >> > Also we do not want to expose this message to users.
>> >> This is incorrect. The original design proposal suggested showing this
>> >> message in the UI as: "Task is healthy".
>> >>
>> >
>> > Does this mean the message in the status update is going to be exactly
>> > "Task is healthy", and the scheduler is going to check for this string in
>> > the `TASK_RUNNING` status update? That would establish a communication
>> > mechanism between the executor and scheduler that's not defined by a
>> > schema. I feel that's worse than putting JSON in there and having the
>> > scheduler parse it.
>> >
>> >
>> >> > The Mesos API isn't designed for packing arbitrary data
>> >> > in the status update message.
>> >> I don't think I agree; this is exactly what this field is for [1], and
>> >> we already use it for other states [2].
>> >>
>> >
>> > I guess I should have said 'structured arbitrary data'. The informational
>> > messages are fine, and we plumb them blindly into our logging and UI. I'm
>> > not convinced we should start putting JSON or something more structured in
>> > there. That's yet another schema we have and yet another versioning story
>> > we have to go through. This also complicates matters for custom executor
>> > authors.
>> >
>> >
>> >>
>> >> > I would be open to just saying that scheduler version
>> >> > 0.16 (or 0.17) just assumes the executor transitions to
>> >> > RUNNING once a task is healthy and dropping
>> >> > `watch_secs` entirely.
>> >> We can't drop 'watch_secs' entirely as we still have to babysit job
>> >> updates that don't have health checks enabled.
>> >>
>> >
>> > Understood. I guess we can keep it, but I'm now frustrated that we have a
>> > parameter that is ignored if we set some JSON in ExecutorConfig.data.
>> > Ideally, we wouldn't accept `watch_secs` if we want health-check driven
>> > updates. As mentioned before, I don't like this implicit coupling of the
>> > executor and the scheduler.
>> >
>> >
>> >>
>> >> As for my take on the above, I favor #1 as the simplest answer to an
>> >> already simple question: "Should we use watch_secs for this instance
>> >> or not?". That's pretty much it. The scheduler does not need any schema
>> >> changes, does not need to know what health checks are, or whether a job
>> >> has them enabled. At least not until we attempt to move to centralized
>> >> health checks (AURORA-279), but that will be an entirely different
>> >> design discussion.
>> >>
>> >> [1] - https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L1605
>> >> [2] - https://github.com/apache/aurora/blob/5cad046fc0f0c4bb79a4563cfcff0442b7bf8383/src/main/python/apache/aurora/executor/aurora_executor.py#L97
>> >
>> >
>> >
>> > With all of this in mind, I have another proposal. Why can't we have the
>> > executor changes (wait until the task is healthy before RUNNING) *and*
>> > read `watch_secs` if it is set? Why not have both of these features: if
>> > users want purely health-check driven updates they can set this value to
>> > 0 and enable health checks. If they want both health checking and time
>> > driven updates they can set this value to the time that they care about.
>> > If they just want time driven updates they can disable health checking
>> > and set this value.
>> >
>> > Then there is no coupling between the executor and the scheduler except
>> > for status updates, and there is no dependency on the `message` field of
>> > the status update.
>> >
>> > We could even treat `watch_secs` as the minimum time in STARTING + RUNNING
>> > instead of RUNNING with this change, and it becomes a lower bound on the
>> > update transition speed. This can ensure that users don't deploy "too
>> > fast" and end up overwhelming other services.
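>> >
>> > To make that concrete, a rough sketch of the per-instance decision I have
>> > in mind (illustrative Python, not actual scheduler code):
>> >
>> >     def instance_update_complete(task_running, task_healthy,
>> >                                  secs_since_starting, watch_secs,
>> >                                  health_checks_enabled):
>> >       # Purely health-check driven: watch_secs == 0 and checks enabled.
>> >       if health_checks_enabled and watch_secs == 0:
>> >         return task_healthy
>> >       # Both: healthy AND at least watch_secs spent in STARTING + RUNNING.
>> >       if health_checks_enabled:
>> >         return task_healthy and secs_since_starting >= watch_secs
>> >       # Time driven only: watch_secs acts exactly as it does today.
>> >       return task_running and secs_since_starting >= watch_secs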
>> >
>> >
>> >
>> >>
>> >>
>> >> On Fri, Sep 2, 2016 at 2:26 PM, Zameer Manji <zmanji@apache.org> wrote:
>> >> > *cc: Renan*
>> >> >
>> >> > I think there is some disagreement/discussion on the review because we
>> >> > have not achieved consensus on the design. Since the design doc was
>> >> > written, Aurora adopted multiple executor support as well as non-HTTP
>> >> > based health checking. This invalidates some parts of the original
>> >> > design. I think all of the solutions here are possible amendments to
>> >> > the design doc.
>> >> >
>> >> > I am not in favor of Solution 2 at all, because status updates between
>> >> > executor <-> agent <-> master <-> scheduler are designed to notify the
>> >> > framework of updates to the task and not really designed to send
>> >> > arbitrary information. Just because the Mesos API provides us with a
>> >> > string field doesn't mean we should try to pack in arbitrary data.
>> >> > Also, it isn't clear what other capabilities we might add in the future,
>> >> > so I'm unconvinced that capabilities need to exist at all. My fear is
>> >> > that we will create the infrastructure for capabilities just to serve
>> >> > this need and nothing else.
>> >> >
>> >> > I object to Solution 1 along the same lines. The Mesos API isn't
>> >> > designed for packing arbitrary data in the status update message, and I
>> >> > don't think we should abuse and rely on that. Also, our current
>> >> > infrastructure just plumbs the message to the UI, and I think displaying
>> >> > capabilities is not something we should do.
>> >> >
>> >> > I am in favor of Solution 3, which is as close as possible to the
>> >> > original design in the design doc. The design doc says the following:
>> >> >
>> >> >> Scheduler updater will skip the minWaitInInstanceMs (aka watch_secs
>> >> >> <https://github.com/apache/aurora/blob/4b43305b33cd8bebdd80225a3987b7cc7a8389a2/docs/configuration-reference.md#updateconfig-objects>)
>> >> >> grace period any time it detects a named port ‘health’ in task
>> >> >> configuration. A RUNNING instance status will signify the end of
>> >> >> instance update.
>> >> >
>> >> >
>> >> > Instead of detecting the 'health' port in the task configuration, we
>> >> > make enabling this feature explicit by setting a bit in the task
>> >> > configuration: the `executorDrivenUpdates` bit.
>> >> >
>> >> > I understand this option makes this feature more complex because it
>> >> > requires a schema change and requires operators to deploy the executor
>> >> > to all agents before upgrading the client. However, I think that's a
>> >> > one-time operational cost as opposed to long-lived design choices that
>> >> > will affect the code.
>> >> >
>> >> > Further, Solution 3 is the most amenable to custom executors and
>> >> > continues our tradition of treating executors as opaque black boxes. I
>> >> > think there is a lot of value in treating executors as black boxes, as
>> >> > it leaves the door open to switching our executor to something else and
>> >> > doesn't impose a burden on others who want to write their own.
>> >> >
>> >> > Alternatively, if amending the schema is too much work, I would be open
>> >> > to just saying that scheduler version 0.16 (or 0.17) assumes the
>> >> > executor transitions to RUNNING once a task is healthy, and dropping
>> >> > `watch_secs` entirely. We can put it in the release notes that operators
>> >> > must deploy the executor to 100% before deploying the scheduler.
>> >> >
>> >> >
>> >> > On Thu, Sep 1, 2016 at 6:40 PM, 黄 凯 <texasred2013@hotmail.com> wrote:
>> >> >
>> >> >> Hi Folks,
>> >> >>
>> >> >> I'm currently working on a feature on the Aurora scheduler and
>> >> >> executor. The implementation strategy became controversial on the
>> >> >> review board, so I was wondering if I should broadcast it to a wider
>> >> >> audience and initiate a discussion. Please feel free to let me know
>> >> >> your thoughts; your help is greatly appreciated!
>> >> >>
>> >> >> The high-level goal of this feature is to improve the reliability and
>> >> >> performance of the Aurora scheduler job updater, by relying on health
>> >> >> check status rather than the watch_secs timeout when deciding an
>> >> >> individual instance's update state.
>> >> >>
>> >> >> Please see the original review request
>> >> >> <https://reviews.apache.org/r/51536/>, the Aurora JIRA ticket
>> >> >> <https://issues.apache.org/jira/browse/AURORA-894>, and the design doc
>> >> >> <https://docs.google.com/document/d/1ZdgW8S4xMhvKW7iQUX99xZm10NXSxEWR0a-21FP5d94/edit#>
>> >> >> for more details and background.
>> >> >>
>> >> >> Note: The design doc is a little bit outdated in the "scheduler
>> >> >> change summary" part (this is what the review request is trying to
>> >> >> address). As a result, I've left some comments there to clarify the
>> >> >> latest proposed implementation plan for the scheduler change.
>> >> >>
>> >> >> There are two questions I'm trying to address here:
>> >> >> *1. How does the scheduler infer the executor version and remain
>> >> >> backward compatible?*
>> >> >> *2. Where do we determine whether health checking is enabled?*
>> >> >>
>> >> >> In short, there are 3 different solutions proposed on the review
>> >> >> board.
>> >> >>
>> >> >> In the first two approaches, the scheduler relies on a string to
>> >> >> determine the executor version. We determine whether health checking
>> >> >> is enabled solely on the executor side. There will be communication
>> >> >> between the executor and the scheduler.
>> >> >> *Solution 1:*
>> >> >> *The vCurrent executor sends a message in its health check thread
>> >> >> during the RUNNING state transition; the vCurrent updater infers the
>> >> >> executor version from the presence of this message and skips
>> >> >> watch_secs if necessary.*
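>> >> >>
>> >> >> A rough sketch of the executor side of Solution 1 (the message text
>> >> >> and plumbing here are illustrative only):
>> >> >>
>> >> >>     HEALTH_ANNOUNCEMENT = 'Task is healthy'
>> >> >>
>> >> >>     def announce_health(send_status_update):
>> >> >>       # Attach a well-known message to the TASK_RUNNING status update;
>> >> >>       # a vCurrent updater keys off its presence, a vPrev updater
>> >> >>       # simply ignores it.
>> >> >>       send_status_update(state='TASK_RUNNING',
>> >> >>                          message=HEALTH_ANNOUNCEMENT)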
>> >> >>
>> >> >> *Solution 2:*
>> >> >> *Instead of relying on the presence of an arbitrary string in the
>> >> >> message, rely on the presence of a string like
>> >> >> "capabilities:CAPABILITY_1,CAPABILITY_2", where CAPABILITY_1 and
>> >> >> CAPABILITY_2 (etc.) are constants defined in api.thrift. Basically
>> >> >> just formalizing the mechanism and making it a bit more future-proof.*
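>> >> >>
>> >> >> A rough sketch of how the updater could check for such a string (the
>> >> >> prefix and constant names are illustrative; the real constants would
>> >> >> live in api.thrift):
>> >> >>
>> >> >>     def parse_capabilities(message):
>> >> >>       # 'capabilities:HEALTH_CHECK_DRIVEN_UPDATE,...' -> set of names
>> >> >>       prefix = 'capabilities:'
>> >> >>       if message and message.startswith(prefix):
>> >> >>         return set(message[len(prefix):].split(','))
>> >> >>       return set()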
>> >> >>
>> >> >> In the third solution, the scheduler infers the executor version from
>> >> >> the JobUpdateSettings on the scheduler side.
>> >> >> *Solution 3:*
>> >> >> *Add a bit called 'executorDrivenUpdates' to JobUpdateSettings; if it
>> >> >> is set, the scheduler assumes that the transition from STARTING ->
>> >> >> RUNNING means the executor is healthy. Concurrently, we release
>> >> >> Thermos and change HealthCheckConfig to say that the task should only
>> >> >> go to RUNNING after it is healthy.*
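>> >> >>
>> >> >> A rough sketch of the updater-side branch under Solution 3 (field
>> >> >> names are illustrative, not the actual thrift fields):
>> >> >>
>> >> >>     def effective_watch_secs(executor_driven_updates, watch_secs):
>> >> >>       # With the bit set, RUNNING already implies healthy, so no
>> >> >>       # extra watch period is needed; otherwise fall back to the
>> >> >>       # configured watch_secs.
>> >> >>       return 0 if executor_driven_updates else watch_secs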
>> >> >>
>> >> >> *Pros and Cons:*
>> >> >> The main benefits of Solution 1 are:
>> >> >> 1. By using the message in the task status update, we don't have to
>> >> >> make any schema change, which keeps the design simple.
>> >> >> 2. The feature is fully backward compatible. When we roll out the
>> >> >> vCurrent schedulers and executors, we do not have to instruct users to
>> >> >> provide an additional field in the Job or Update configs, which could
>> >> >> confuse users when vPrev and vCurrent executors coexist in the
>> >> >> cluster.
>> >> >>
>> >> >> Concerns:
>> >> >> Relying on the presence of a message makes things brittle. Also, we
>> >> >> do not want to expose this message to users.
>> >> >>
>> >> >> The benefit of Solution 2 is making the feature more future-proof.
>> >> >> However, if we do not envision a new executor feature in the short
>> >> >> term, it's not much different from Solution 1.
>> >> >>
>> >> >> The benefits of Solution 3 include:
>> >> >> 1. We support more than just Thermos now (and others rely on custom
>> >> >> executors).
>> >> >> 2. A lot of things in Aurora treat the executor as opaque. The status
>> >> >> update message sent by the executor should not be visible to users
>> >> >> unless it's an error message.
>> >> >>
>> >> >> Concerns:
>> >> >> 1. In addition to the 'executorDrivenUpdates' bit that identifies the
>> >> >> executor version, we still need to notify the scheduler whether health
>> >> >> checking is enabled on the vCurrent executor; if not, the scheduler
>> >> >> must be able to fall back to using watch_secs.
>> >> >> 2. Users have to provide an additional field in their .aurora config
>> >> >> files. The feature wouldn't be available unless new clients are rolled
>> >> >> out as well.
>> >> >>
>> >> >> Please let me know if I understand your suggestions correctly and
>> >> >> hopefully everyone is on the same page!
>> >> >>
>> >> >> Thanks,
>> >> >>
>> >> >> Kai
>> >> >>
>> >>
>> >
>> >
>> >
>> > --
>> > Zameer Manji
>> >
>>
>>
>>
>> --
>> Zameer Manji
>>
