aurora-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mark Chu-Carroll" <mchucarr...@twopensource.com>
Subject Re: Review Request 22457: Improve aurora "job diff" command.
Date Wed, 18 Jun 2014 19:23:22 GMT


> On June 16, 2014, 5:45 p.m., Maxim Khutornenko wrote:
> > src/main/python/apache/aurora/client/cli/jobs.py, lines 181-182
> > <https://reviews.apache.org/r/22457/diff/3/?file=609592#file609592line181>
> >
> >     I don't think it's enough to json-serialize a thrift task. This is bound to
set/dict serialization sorting problem we have battled with in updater.py. Specifically this
block: https://github.com/apache/incubator-aurora/blob/master/src/main/python/apache/aurora/client/api/updater.py#L200-L223
> >     
> >     Without sorting, the diff will only work 99% of the time. This has been demonstrated
in a few real-life job updates.
> 
> Mark Chu-Carroll wrote:
>     This isn't a new thing in this updated diff system - this is the current behavior.
>     
>     If you use the new json-tree-diff, you won't have this problem; but for people who
rely on the current diff
>     behavior, I'd rather not change that.
>
> 
> Maxim Khutornenko wrote:
>     | If you use the new json-tree-diff, you won't have this problem; 
>     
>     Perhaps I am missing something but how would the new approach ensure equivalence
of these two structs?
>     { 'set1': { ['k1':'v1', 'k2':'v2']} }
>     { 'set2': { ['k2':'v2', 'k1':'v1']} }
>     
>     It's "normal" to see element re-ordering like this during thrift struct serialization.
> 
> Maxim Khutornenko wrote:
>     Here is a real TaskConfig json fragment and its reordered twin:
>     
>     1: u'constraints': [{u'name': u'value', u'constraint': {u'values': [u'1', u'2']}},
{u'name': u'limit', u'constraint': {u'limit': {u'limit': 10}}}]
>     2: u'constraints': [{u'constraint': {u'values': [u'2', u'1']}, u'name': u'value'},
{u'name': u'limit', u'constraint': {u'limit': {u'limit': 10}}}]
> 
> Mark Chu-Carroll wrote:
>     I don't think that the reordering that you show in your first example isn't valid
json. Key-value pairs belong in a dictionary, not a list.  In a dictionary, the comparison
routine does the right thing: it compares key by key, without regard to ordering.
>     
>     In the second example: I'd argue that that's actually a bug in our serializer: it's
using an ordered list for an unordered constraint. Diff can't tell that the list structure
isn't really supposed to be ordered; the serializer should be canonicalizing it.
>
> 
> Maxim Khutornenko wrote:
>     Not sure how else you could represent a python set in JSON without completely changing
the underlying data structure (e.g. emitting fake keys for every element and converting set
into a dictionary with sorted keys). Agree, the TJSONProtocol could be a bit smarter when
dealing with python sets but the reality is that python sets are inherently "unhashable" structs.
I have seen cases when completely repr-equivalent python sets were not comparing as equal.
> 
> Mark Chu-Carroll wrote:
>     All the serializer would need to do is just canonicalize when it converts to a list.
It's not an issue of being unhashable. But it should at least bother to be *uniform*. If you're
converting a set to a list, chose a convention for ordering elements. Put 'em in lexicographic
order, problem solved.
>     
>     As it stands, that problem is unsolvable for this diff tool: there's no way to tell
that a given field was supposed to be a set, and should be compared without ordering. 
>     
>     I'm not sure what you'd like me to do here, short of giving up and throwing away
the new diff tool. I don't see any way to solve this, short of creating something that uses
thrift introspection.
>
> 
> Maxim Khutornenko wrote:
>     We solved this problem in updater diff before (see the link above) and I am sure
it can be solved here. Converting to sorted tuples does not produce a pretty diff but if we
are targeting value-comparison diff output rather than serialized string compare the problem
shifts into pretty-fying the sorted tuple output rather than handling diff mechanics.

Not sure what you mean by "converting to sorted tuples". Is there any documentation of what
the "sorted tuple" format looks like? Or do I just need to reverse-engineer? (I hate doing
that, because if I make any wrong assumptions, things get ugly.)


- Mark


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22457/#review45835
-----------------------------------------------------------


On June 18, 2014, 10:59 a.m., Mark Chu-Carroll wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22457/
> -----------------------------------------------------------
> 
> (Updated June 18, 2014, 10:59 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Brian Wickman.
> 
> 
> Bugs: aurora-520
>     https://issues.apache.org/jira/browse/aurora-520
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Add a new diff method, which uses field-by-field comparison of JSON trees for comparing
running job configurations to potentially updated configs.
> 
> - Allow exclusion of semantically irrelevant fields.
> - Provide a clearer list of the differences between configs.
> - Provide a scripting-friendly alternative JSON syntax for diffs.
> 
> The old diff behavior is still available under the "--use-shell-diff" option.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/client/cli/BUILD ebe681a0d1735b7cc695dc3b7a14c4292d87ae32

>   src/main/python/apache/aurora/client/cli/jobs.py 4fa03a6c9919651551238b0dc211ed69a8dfe565

>   src/main/python/apache/aurora/client/cli/json_tree_diff.py PRE-CREATION 
>   src/test/python/apache/aurora/client/cli/BUILD 3c88ed7cf9f654bbbd80d1d44aa1dd1c8655e378

>   src/test/python/apache/aurora/client/cli/test_diff.py 38629b63c082cf81cb891dace2a70d9e8f418e18

>   src/test/python/apache/aurora/client/cli/test_json_diff.py PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/22457/diff/
> 
> 
> Testing
> -------
> 
> New unit tests of the JSON tree diff code, plus a bunch of new "job diff" tests of the
new functionality.
> All tests pass.
> 
> 
> Thanks,
> 
> Mark Chu-Carroll
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message