aurora-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Maxim Khutornenko <ma...@apache.org>
Subject Re: JobConfig diff API
Date Wed, 16 Sep 2015 17:22:25 GMT
The benefit of assuming a certain format is the richer experience we
can give to our users. The *blob* may be too large to make any sense
of it during diffing. I don't propose to enforce any schema though,
that would be too restrictive. I do however believe assuming JSON
format would be an acceptable tradeoff.

Alternatively, we may allow non-JSON (e.g. binary) executor data blobs
and disabling JSON diffs for any executor types that don't follow the
guidance. This will result in a degraded user experience but may be
the middle ground here. Thoughts?

On Tue, Sep 15, 2015 at 11:43 AM,  <meghdoot_b@yahoo.com.invalid> wrote:
> Isn't this data supposed to be any blob that transparently passes in to the executor
through mesos data blob. Why would we want to impose any sort of format? It could be a binary
blob. Executor writers should be able to move between different schedulers/frameworks without
any rework ideally. This field seems like more like garbage in and garbage out and only understood
by end client and the executor. Scheduler may stay out of it.
> If you compute hash and indicate same or different data between 2 job update diff, that
may be reasonable.
>
> My 2 cents.
>
> Thx
>
> Sent from my iPhone
>
>> On Sep 15, 2015, at 11:01 AM, Zameer Manji <zmanji@apache.org> wrote:
>>
>> I'm a proponent of firming up our executor <-> scheduler contract. Since we
>> are going to get multiple executor support soon I think it would be nice if
>> we said that ExecutorConfig.data was JSON.
>>
>> On Tue, Sep 15, 2015 at 10:47 AM, Maxim Khutornenko <maxim@apache.org>
>> wrote:
>>
>>> | I hope this doesn't mean we would be returning a textual
>>> representation of a diff
>>>
>>> If we can make an assumption that executor data is always JSON, we can
>>> deliver a much more specific answer by applying JSON diff tools.
>>> Something like:
>>>
>>> - "environment": "prod"
>>> + "environment": "test"
>>>
>>> Otherwise, we would have to output the entire ExecutorConfig.data blob
>>> content for both left and right sides and let users figure out the
>>> problem. I don't think that's acceptable.
>>>
>>> Does it make sense? Any suggestions on the output format of the diff?
>>> I think it should be structured but at the same time we have to get
>>> down to text level at some point to report concrete discrepancies.
>>>
>>>> On Mon, Sep 14, 2015 at 8:58 PM, Bill Farner <wfarner@apache.org> wrote:
>>>> The 'blob'-iness of ExecutorConfig is intentional so that we can support
>>>> alternative executors.  I'd hate for that to go away.
>>>>
>>>>> On Mon, Sep 14, 2015 at 8:56 PM, Jake Farrell <jfarrell@apache.org>
>>>> wrote:
>>>>
>>>>> This is one of the hoops encountered when using the Thrift api directly
>>> and
>>>>> not using the client, I'd love to see ExecutorConfig.data move to a
>>> thrift
>>>>> object and not be a string blob
>>>>>
>>>>> -Jake
>>>>>
>>>>> On Mon, Sep 14, 2015 at 9:28 PM, Bill Farner <wfarner@apache.org>
>>> wrote:
>>>>>
>>>>>> I like the idea of adding this API, but i don't see why it requires
>>> us to
>>>>>> make assumptions about ExecutorConfig.data.  I hope this doesn't
mean
>>> we
>>>>>> would be returning a textual representation of a diff.  Can you
>>> elaborate
>>>>>> on that?
>>>>>>
>>>>>> On Mon, Sep 14, 2015 at 4:14 PM, Maxim Khutornenko <maxim@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Problem:
>>>>>>> We currently don't have a good user experience around "aurora
job
>>>>>>> diff" command. The task configs are dumped as "prettified" JSON
>>>>>>> strings and diffed with the system diff tool. Anyone who tried
to
>>> use
>>>>>>> it knows it can be very hard to read especially when it comes
to
>>>>>>> executor data deltas. Also, the implementation is done completely
>>>>>>> within the Aurora client making it hard to reuse this feature
by
>>> other
>>>>>>> clients (e.g.: an external deploy coordination tool).
>>>>>>>
>>>>>>> Proposal:
>>>>>>> Move the diff logic to the scheduler and expose it via a new
>>>>>>> jobConfigDiff thrift API.
>>>>>>>
>>>>>>> Benefits:
>>>>>>> - Client will no longer have the custom non-reusable logic moving
us
>>>>>>> closer towards a "thin client" goal.
>>>>>>> - The new RPC can be fully used by any existing or new API clients.
>>>>>>> - The diff output will be improved via leveraging third party
POJO
>>>>>>> and/or JSON diff libraries [1,2,3, etc.].
>>>>>>> - The server updater can be partially/fully unified with the
new
>>> diff
>>>>>>> logic further improving the overall DRY-ness.
>>>>>>>
>>>>>>> Concerns:
>>>>>>> - The executor data is currently treated as an opaque string
blob on
>>>>>>> the scheduler side. In reality, it's almost guaranteed to be
JSON.
>>> In
>>>>>>> order to deliver the best UX, the scheduler would have to start
>>>>>>> requiring ExecutorConfig.data to be a valid JSON.
>>>>>>>
>>>>>>> Any other concerns/objections/comments? I would like to formalize
>>> the
>>>>>>> proposal be EOW if we reach consensus quickly.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Maxim
>>>>>>>
>>>>>>> [1] -
>>> http://java-object-diff.readthedocs.org/en/latest/getting-started/#getting-started
>>>>>>> [2] - http://javers.org/documentation/diff-examples/
>>>>>>> [3] - https://github.com/skyscreamer/JSONassert
>>>
>>> --
>>> Zameer Manji
>>>
>>>

Mime
View raw message