aurora-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Bill Farner <wfar...@apache.org>
Subject Re: JobConfig diff API
Date Fri, 02 Oct 2015 20:47:41 GMT
This matches what I was nudging towards.  I think it offers what we want
without taking on a large technical challenge and placing restrictions on
the executor data blob.

On Friday, October 2, 2015, Maxim Khutornenko <maxim@apache.org> wrote:

> Thanks David. I agree pivoting this proposal around job update driven
> instructions would give us more incremental value for much less
> effort. I have researched and evaluated various Java and JSON diff
> libraries available under Apache license and could not find a
> reasonable solution that would give us exactly what we need. I started
> thinking about building a custom JSON diff java tool, which is
> certainly beyond the effort I hoped it would take.
>
> With something like 'getJobUpdateInstructions(JobUpdateRequest r)' API
> we could immediately improve the client diff story by avoiding
> duplicate diff results. We would also be able to have something like
> '--dry-run' option on a 'aurora update start' command listing the
> affected instances and actions about to be taken.
>
> As for the actual TaskConfig diff details, this seems like a natural
> improvement we could add later when/if we find a better JSON diffing
> tool or dare to implement our own.
>
> Any thoughts/objections to the above?
>
> On Fri, Oct 2, 2015 at 1:29 PM, David McLaughlin <david@dmclaughlin.com
> <javascript:;>> wrote:
> > I'd like to propose an alternative - that we start off by having an API
> > endpoint which simply returns the JobUpdateInstructions that describes
> the
> > changes that would happen if a given JobUpdateRequest was applied.
> >
> > There is a lot of value in having clients ask the scheduler to tell them
> > what is going to happen rather than try and duplicate that diff algorithm
> > in both places. Of course this does not solve the "pretty" diff command
> > when trying to *display* the differences, but it does remove a lot of the
> > work in implementing the diff command and I think for the different types
> > of clients (UI/CLI/etc.) displaying the diffs is going to be a much
> bigger
> > problem anyway.
> >
> > In the future if the scheduler does calculate and return some
> > representation of the diff, I'd argue that it should be stored in
> > JobUpdateInstructions anyway, so we can always add that functionality
> later
> > and keep the APIs the same.
> >
> > Thanks,
> > David
> >
> >
> > On Wed, Sep 16, 2015 at 10:23 AM, Bill Farner <wfarner@apache.org
> <javascript:;>> wrote:
> >
> >> I think it's fine to provide an 'enhanced' experience when the format is
> >> JSON, but i don't think we should force that.
> >>
> >> On Wed, Sep 16, 2015 at 10:22 AM, Maxim Khutornenko <maxim@apache.org
> <javascript:;>>
> >> wrote:
> >>
> >> > The benefit of assuming a certain format is the richer experience we
> >> > can give to our users. The *blob* may be too large to make any sense
> >> > of it during diffing. I don't propose to enforce any schema though,
> >> > that would be too restrictive. I do however believe assuming JSON
> >> > format would be an acceptable tradeoff.
> >> >
> >> > Alternatively, we may allow non-JSON (e.g. binary) executor data blobs
> >> > and disabling JSON diffs for any executor types that don't follow the
> >> > guidance. This will result in a degraded user experience but may be
> >> > the middle ground here. Thoughts?
> >> >
> >> > On Tue, Sep 15, 2015 at 11:43 AM,  <meghdoot_b@yahoo.com.invalid>
> wrote:
> >> > > Isn't this data supposed to be any blob that transparently passes
> in to
> >> > the executor through mesos data blob. Why would we want to impose any
> >> sort
> >> > of format? It could be a binary blob. Executor writers should be able
> to
> >> > move between different schedulers/frameworks without any rework
> ideally.
> >> > This field seems like more like garbage in and garbage out and only
> >> > understood by end client and the executor. Scheduler may stay out of
> it.
> >> > > If you compute hash and indicate same or different data between 2
> job
> >> > update diff, that may be reasonable.
> >> > >
> >> > > My 2 cents.
> >> > >
> >> > > Thx
> >> > >
> >> > > Sent from my iPhone
> >> > >
> >> > >> On Sep 15, 2015, at 11:01 AM, Zameer Manji <zmanji@apache.org
> <javascript:;>> wrote:
> >> > >>
> >> > >> I'm a proponent of firming up our executor <-> scheduler
contract.
> >> > Since we
> >> > >> are going to get multiple executor support soon I think it would
be
> >> > nice if
> >> > >> we said that ExecutorConfig.data was JSON.
> >> > >>
> >> > >> On Tue, Sep 15, 2015 at 10:47 AM, Maxim Khutornenko <
> maxim@apache.org <javascript:;>
> >> >
> >> > >> wrote:
> >> > >>
> >> > >>> | I hope this doesn't mean we would be returning a textual
> >> > >>> representation of a diff
> >> > >>>
> >> > >>> If we can make an assumption that executor data is always
JSON, we
> >> can
> >> > >>> deliver a much more specific answer by applying JSON diff
tools.
> >> > >>> Something like:
> >> > >>>
> >> > >>> - "environment": "prod"
> >> > >>> + "environment": "test"
> >> > >>>
> >> > >>> Otherwise, we would have to output the entire ExecutorConfig.data
> >> blob
> >> > >>> content for both left and right sides and let users figure
out the
> >> > >>> problem. I don't think that's acceptable.
> >> > >>>
> >> > >>> Does it make sense? Any suggestions on the output format of
the
> diff?
> >> > >>> I think it should be structured but at the same time we have
to
> get
> >> > >>> down to text level at some point to report concrete discrepancies.
> >> > >>>
> >> > >>>> On Mon, Sep 14, 2015 at 8:58 PM, Bill Farner <wfarner@apache.org
> <javascript:;>>
> >> > wrote:
> >> > >>>> The 'blob'-iness of ExecutorConfig is intentional so that
we can
> >> > support
> >> > >>>> alternative executors.  I'd hate for that to go away.
> >> > >>>>
> >> > >>>>> On Mon, Sep 14, 2015 at 8:56 PM, Jake Farrell <
> jfarrell@apache.org <javascript:;>
> >> >
> >> > >>>> wrote:
> >> > >>>>
> >> > >>>>> This is one of the hoops encountered when using the
Thrift api
> >> > directly
> >> > >>> and
> >> > >>>>> not using the client, I'd love to see ExecutorConfig.data
move
> to a
> >> > >>> thrift
> >> > >>>>> object and not be a string blob
> >> > >>>>>
> >> > >>>>> -Jake
> >> > >>>>>
> >> > >>>>> On Mon, Sep 14, 2015 at 9:28 PM, Bill Farner <
> wfarner@apache.org <javascript:;>>
> >> > >>> wrote:
> >> > >>>>>
> >> > >>>>>> I like the idea of adding this API, but i don't
see why it
> >> requires
> >> > >>> us to
> >> > >>>>>> make assumptions about ExecutorConfig.data.  I
hope this
> doesn't
> >> > mean
> >> > >>> we
> >> > >>>>>> would be returning a textual representation of
a diff.  Can you
> >> > >>> elaborate
> >> > >>>>>> on that?
> >> > >>>>>>
> >> > >>>>>> On Mon, Sep 14, 2015 at 4:14 PM, Maxim Khutornenko
<
> >> > maxim@apache.org <javascript:;>>
> >> > >>>>>> wrote:
> >> > >>>>>>
> >> > >>>>>>> Problem:
> >> > >>>>>>> We currently don't have a good user experience
around "aurora
> job
> >> > >>>>>>> diff" command. The task configs are dumped
as "prettified"
> JSON
> >> > >>>>>>> strings and diffed with the system diff tool.
Anyone who
> tried to
> >> > >>> use
> >> > >>>>>>> it knows it can be very hard to read especially
when it comes
> to
> >> > >>>>>>> executor data deltas. Also, the implementation
is done
> completely
> >> > >>>>>>> within the Aurora client making it hard to
reuse this feature
> by
> >> > >>> other
> >> > >>>>>>> clients (e.g.: an external deploy coordination
tool).
> >> > >>>>>>>
> >> > >>>>>>> Proposal:
> >> > >>>>>>> Move the diff logic to the scheduler and expose
it via a new
> >> > >>>>>>> jobConfigDiff thrift API.
> >> > >>>>>>>
> >> > >>>>>>> Benefits:
> >> > >>>>>>> - Client will no longer have the custom non-reusable
logic
> moving
> >> > us
> >> > >>>>>>> closer towards a "thin client" goal.
> >> > >>>>>>> - The new RPC can be fully used by any existing
or new API
> >> clients.
> >> > >>>>>>> - The diff output will be improved via leveraging
third party
> >> POJO
> >> > >>>>>>> and/or JSON diff libraries [1,2,3, etc.].
> >> > >>>>>>> - The server updater can be partially/fully
unified with the
> new
> >> > >>> diff
> >> > >>>>>>> logic further improving the overall DRY-ness.
> >> > >>>>>>>
> >> > >>>>>>> Concerns:
> >> > >>>>>>> - The executor data is currently treated as
an opaque string
> blob
> >> > on
> >> > >>>>>>> the scheduler side. In reality, it's almost
guaranteed to be
> >> JSON.
> >> > >>> In
> >> > >>>>>>> order to deliver the best UX, the scheduler
would have to
> start
> >> > >>>>>>> requiring ExecutorConfig.data to be a valid
JSON.
> >> > >>>>>>>
> >> > >>>>>>> Any other concerns/objections/comments? I
would like to
> formalize
> >> > >>> the
> >> > >>>>>>> proposal be EOW if we reach consensus quickly.
> >> > >>>>>>>
> >> > >>>>>>> Thanks,
> >> > >>>>>>> Maxim
> >> > >>>>>>>
> >> > >>>>>>> [1] -
> >> > >>>
> >> >
> >>
> http://java-object-diff.readthedocs.org/en/latest/getting-started/#getting-started
> >> > >>>>>>> [2] - http://javers.org/documentation/diff-examples/
> >> > >>>>>>> [3] - https://github.com/skyscreamer/JSONassert
> >> > >>>
> >> > >>> --
> >> > >>> Zameer Manji
> >> > >>>
> >> > >>>
> >> >
> >>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message