groovy-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Thibault Kruse <tibokr...@googlemail.com>
Subject Re: usage of NumberAwareComparator in extension methods and number oddities (a bit code review)
Date Thu, 03 Sep 2015 13:07:56 GMT
That's a long read. In a nutshell, it seems NumberComparator does
satisfy not the required anti-commutativity (
"The implementor must ensure that <tt>sgn(compare(x, y)) ==
-sgn(compare(y, x))</tt> for all <tt>x</tt> and <tt>y</tt>.")
when
used for Non-Comparables:

x = new MyNumber(n: 1)
y = new Integer(0)
assert comp.compare(x, y) == - comp.compare(y, x)
       |    |       |  |  |  | |    |       |  |
       |    -1      |  0  |  1 |    -1      0  MyNumber(1)
       |            |     |
org.codehaus.groovy.runtime.NumberAwareComparator@3e77a1ed
       |            |     false
       |            MyNumber(1)
       org.codehaus.groovy.runtime.NumberAwareComparator@3e77a1ed


And it is not "consistent with equals", which means it should not be
used with sorted Sets or Maps ("should be used with caution").

assert (comp.compare(new Float(0), new Integer(0)) == 0) ==  (new
Float(0).equals(new Integer(0)))
        |    |       |             |               |     |    |
    |      |
        |    0       0.0           0               true  |    0.0
    false  0
        |                                                false
        org.codehaus.groovy.runtime.NumberAwareComparator@fe18270

Both violating anti-commutativity and not offering equals-consistency
make NumberAwareComparator an accident waiting to happen.

So +1 for making that less Groovy-relaxed and more Java-strict. But I
would say that about many things, without care for breaking existing
code (Because that's the only way to move Groovy into more production
code in the world).

The Collection.minus() problem looks too complex to understand at a glance.



On Thu, Sep 3, 2015 at 1:45 PM, Jochen Theodorou <blackdrag@gmx.org> wrote:
> Hi all,
>
> NumberAwareComparator is used in several extension methods Groovy provides
> to be able to compare things, that cannot really be compared. That involves
> sorting (sort, toSorted), but also things like: retainAll, removeAll, minus
> on Collection, intersect and disjoint. NumberAwareComperator will basically
> use the same logic as <=>, but in case it cannot be compared, it will
> compare using the hashcode, though never issuing an equals here. Of course
> this logic is difficult to understand and was mainly done to be able to sort
> things somehow for things you cannot really sort. Also note that <=> purely
> depends on compareTo, even throwing an exception if only equals would be
> needed.
>
> The first question I have here is: Do we really want to keep providing a
> very difficult to understand way of comparing things, you actually cannot
> compare for sorting? Meaning for example, should [new Object(), new
> Object()].sort() really give a random order, or should it result in an
> exception? I am not sure anymore, that we do our users a good thing
> providing this operation.
>
> And of course there are these other methods, that only need equality, but
> use NumberAwareComperator.
>
> The method with the longest history in this is most probably
> Collection#minus(Collection), well the List variant actually, but the
> Collection variant is where the implementation moved to.
>
> This method actually has two variants inside. First there will be a test for
> the sameType being used and then.... frankly after so many iterations the
> code looks just odd to me.
>
>>             //n*LOG(n) version
>>             Set<T> answer;
>>             if (Number.class.isInstance(head)) {
>>                 answer = new TreeSet<T>(numberComparator);
>>                 answer.addAll(self);
>>                 for (T t : self) {
>>                     if (Number.class.isInstance(t)) {
>>                         for (Object t2 : removeMe) {
>>                             if (Number.class.isInstance(t2)) {
>>                                 if (numberComparator.compare(t, (T)t2) ==
>> 0)
>>                                     answer.remove(t);
>>                             }
>>                         }
>>                     } else {
>>                         if (removeMe.contains(t))
>>                             answer.remove(t);
>>                     }
>>                 }
>>             } else {
>
>
> we get here after we have established, that all the elements are of the
> "same" type. Either directly the same, or null, or a Number.  So if the head
> is a number, then all elements should be a number. "answer" is a TreeSet
> with numberComparator (NumberAwareComparator). Now, since we use the number
> comperator to make a TreeSet, this will mean this action alone may already
> remove elements. But "answer" is not what the method will return, it is
> "ansCollection", which will be build separately based on answer later. The
> inner Number check is, if my assumption before is right, surplus and the
> else-branch there never visited. And then we iterate over all elements of
> self and all elements of removeMe to eventually call answer.remove... That
> smells like n*n*logn complexity. The statement about this being an n*logn
> would be wrong then.
>
> I really wonder if something like this:
>
> Set<T> answer;
> if (head instanceof Number) {
>   answer = new TreeSet<T>(numberComparator);
>   answer.addAll(self);
>   Set<T> removeSet = new TreeSet<T>(numberComperator);
>   removeSet.addAll(removeMe);
>   answer.removeAll(removeSet);
> }
>
> would not be better. At least that should go near n*logn. And we do
> something similiar in case head is no number:
>
>>             } else {
>>                 answer = new TreeSet<T>(numberComparator);
>>                 answer.addAll(self);
>>                 answer.removeAll(removeMe);
>>             }
>
>
> Only I am wondering.... if removeMe is a Set as well, then this may not do
> what we want. Imagine removeMe being another TreeSet with its own comparator
> and this implementation of removeAll:
>
>>     public boolean removeAll(Collection<?> c) {
>>         Objects.requireNonNull(c);
>>         boolean modified = false;
>>
>>         if (size() > c.size()) {
>>             for (Iterator<?> i = c.iterator(); i.hasNext(); )
>>                 modified |= remove(i.next());
>>         } else {
>>             for (Iterator<?> i = iterator(); i.hasNext(); ) {
>>                 if (c.contains(i.next())) {
>>                     i.remove();
>>                     modified = true;
>>                 }
>>             }
>>         }
>>         return modified;
>>     }
>
>
> This means c.contains will be called and the different comparator may be
> used for it. Proof:
>
>> def toRemove = new TreeSet({a,b->-1})
>> toRemove.addAll(["a","b"])
>> def res = ["a"]*2 - toRemove
>> assert res == ["a","a"]
>>
>> res = ["a"]*2+["b","c"] - toRemove
>> assert res == ["c"]
>
>
> Of course this Comparator does not behave nice, but the point is more that
> our comparator is not used in the first case. It depends on the size of the
> Set, or better said, the size of the TreeSet. Since equal (according to
> number comparator) elements appear there only once, the size in the first
> case will be 1, in the second case 3, while the sizes of the self
> collections are 2 and 4. Frankly I cannot really understand why the JDK has
> this in this way at all, or why TreeSet does not override it. Sure, the
> version they choose is better for performance, but I find this behaviour not
> right, considering how removeAll works on for example lists. Anyway... this
> should not be a rant about the JDK ;). Now... does it make sense to use
> NumberAwareComperator here at all? If we want to support [1.0]-[1]==[], then
> yes. But the gravity of this is unclear I would say:
>
>> class MyNumber extends Number {
>>     def n
>>     int intValue(){n}
>>     long longValue(){n}
>>     float floatValue(){n}
>>     double doubleValue(){n}
>>     int hashCode(){-n}
>>     boolean equals(other) {
>>         if (other instanceof MyNumber) { return n==other.n}
>>         return false
>>     }
>>     int compareTo(MyNumber other) {
>>         return n <=> other.n
>>     }
>>     String toString(){"MyNumber($n)"}
>> }
>>
>> def res = [1,new MyNumber(n:1)] - [1]
>
>
> while the first loops to produce answer will correctly produce a TreeSet
> containing only MyNumber, the later logic to produce ansCollection will
> cause problems and nothing will be removed. ThatÄs because MyNumber(n:1) is
> equal to 1, ehm, actually that is wrong. 1 is equal to MyNumber(n:1)
> according to Groovy logic, but not the other way around.
>
> assert 1 == new MyNumber(n:1)
> assert new MyNumber(n:1) != 1
>
> That is because we don't actually call Integer#compareTo. If that had been
> the case, then both would not be equal. No, instead we fall back to those
> number math classes, and they assume, that if you compare something with an
> integer, then the other one needs to be converted to an integer as well. And
> since my intValue() method here returns 1, they are seen as equal, even
> though compareTo, equals and even the hashcodes would not allow that. If I
> reverse the order in my list:
>
>> def res = [new MyNumber(n:1),1] - [1]
>
>
> then the result will be as expected and contains only MyNumber(n:1). Funny
> thing here is... I added a compareTo method, but no Comparable interface. So
> MyNumber is not comparable... Does it changes things if it does? So I add
> "implements Comparable<MyNumber>" to the class and now I get the same
> result, regardless of the order. Which is []. That is because now 1 is
> always equal to MyNumber(n:1). That is because now in both case IntegerMath
> is used. Does this mean any two custom numbers are equal, if their intValue
> is? So let's take MyNumber again and make a second class just the same,
> including Comparable, then call it MyNumber2
>
>> assert new MyNumber2(n:1) == new MyNumber(n:1)
>> assert new MyNumber(n:1) == new MyNumber2(n:1)
>
>
> Well... if you did read, what I did write before, then this is no
> surprise... But frankly... should it be like this? And if MyNumber2 does not
> implement Comparable:
>
>> assert new MyNumber2(n:1) != new MyNumber(n:1)
>> assert new MyNumber(n:1) == new MyNumber2(n:1)
>
>
> again because of the fallback logic to IntegerMath... and if both do not
> implement it:
>
>> assert new MyNumber2(n:1) != new MyNumber(n:1)
>> assert new MyNumber(n:1) != new MyNumber2(n:1)
>
>
> Well... that is more the expected result, still... and not to forget:This is
> ==, not <=>. == uses <=> only for comparable cases. So:
>
>> new MyNumber2(n:1) <=> new MyNumber(n:1)
>> new MyNumber(n:1) <=> new MyNumber2(n:1)
>
>
> will both throw a GroovyRuntimeException, if non does implement Comparable.
> If MyNumber2 implements it, the first compare works using IntegerMath and
> tells us they are equal. If both implement it, they are both equal. That is
> basically the same as ==, but what does NumberAwareComperator do for such
> cases?
>
>>         try {
>>             return DefaultTypeTransformation.compareTo(o1, o2);
>>         } catch (ClassCastException cce) {
>>             /* ignore */
>>         } catch (GroovyRuntimeException gre) {
>>             /* ignore */
>>         }
>>         // since the object does not have a valid compareTo method
>>         // we compare using the hashcodes. null cases are handled by
>>         // DefaultTypeTransformation.compareTo
>>         // This is not exactly a mathematical valid approach, since we
>> compare object
>>         // that cannot be compared. To avoid strange side effects we do a
>> pseudo order
>>         // using hashcodes, but without equality. Since then an x and y
>> with the same
>>         // hashcodes will behave different depending on if we compare x
>> with y or
>>         // x with y, the result might be unstable as well. Setting x and y
>> to equal
>>         // may mean the removal of x or y in a sorting operation, which we
>> don't want.
>>         int x1 = o1.hashCode();
>>         int x2 = o2.hashCode();
>>         if (x1 > x2) return 1;
>>         return -1;
>
>
> compareTo is the path for <=>, so we can expect exceptions here. So now with
> MyNumber being a Comparable, and MyNumber2 not:
>
>> println ([new MyNumber(n:1), new MyNumber2(n:1)] - [new MyNumber2(n:1)])
>> println ([new MyNumber(n:1), new MyNumber2(n:1)] - [new MyNumber(n:1)])
>> println ([new MyNumber2(n:1), new MyNumber(n:1)] - [new MyNumber2(n:1)])
>> println ([new MyNumber2(n:1), new MyNumber(n:1)] - [new MyNumber(n:1)])
>
>
> In the first two cases nothing is removed, in the third case we get an empty
> list and in the last case MyNumber is removed.
> Since MyNumber2 does not implement Comparable here
> DefaultTypeTransformation.compareTo will throw an exception, whenever we
> have MyNumber2 as o1. The compare using hashcodes will never equal. Well.
> There has been one part of that minus method we have not looked at. This is
> called if the first element is not Comparable, even if all extend Number:
>
>>             //n*n version
>>             List<T> tmpAnswer = new LinkedList<T>(self);
>>             for (Iterator<T> iter = tmpAnswer.iterator(); iter.hasNext();)
>> {
>>                 T element = iter.next();
>>                 boolean elementRemoved = false;
>>                 for (Iterator<?> iterator = removeMe.iterator();
>> iterator.hasNext() && !elementRemoved;) {
>>                     Object elt = iterator.next();
>>                     if (DefaultTypeTransformation.compareEqual(element,
>> elt)) {
>>                         iter.remove();
>>                         elementRemoved = true;
>>                     }
>>                 }
>>             }
>>
>>             //remove duplicates
>>             //can't use treeset since the base classes are different
>>             ansCollection.addAll(tmpAnswer);
>
>
> so if MyNumber2 is the head element this code will be executed instead,
> since MyNumber2 not implement Comparable.
> DefaultTypeTransformation.compareEqual is the code path taken for "==" which
> I talked about already. Remember, IntegerMath is the fallback for
> Numbers.compareEqual will prevent the exception being thrown, even though
> compareTo is called internally, but only if the left hand side is a
> Comparable. So here equality checks are done instead.. for example using the
> equals method. That explains why ([new MyNumber2(n:1), new MyNumber(n:1)] -
> [new MyNumber2(n:1)]) gets the empty list... first part is normaly equality,
> second part is fallback to IntegerMath. Of course if the code used
> DefaultTypeTransformation.compareEqual(elt, element), then this would behave
> different, but the result would be the same. The major difference would be
> that MyNumber2#equals is not called. For ([new MyNumber2(n:1), new
> MyNumber(n:1)] - [new MyNumber(n:1)]) we do compareEquals with MyNumber2 and
> MyNumber first, resulting in the element not being removed, so the result
> will be [MyNumber(n:1)]
>
> But that did not give me the exception path... Looking at the code, with
> numbers you won't get that path... or not? Well, with a subclass of
> BigDecimal/BigInteger you can do that. But they implement
> Comparable<BigDecimal>/Comparable<BigInteger>, so you are supposed to make
a
> method for that only... supposed to is not equal to people actually doing
> that. Still I won't give an example of that now.
>
> If you look at the sameType method, then it looks only at Number and null
> and the exact same class. to enter the not so n*logn path and use the number
> comperator. Point being here, the hashcode compares will not normally happen
> for Numbers or null. And in case of all elements of both Collections being
> of the same class, there should be no exception we have the right to catch
> and fall back to hashcode logic. So we could question here if
> NumberAwareComperator should be even used like it is for this method.
>
> So let's have a much shorter look at the other methods...
>
> retainAll, intersect and disjoint: uses the comparator for all elements, so
> it may use the hashcode logic. But at least they should be stable.
> removeAll: same. Well, there is in theory the issue of self being a Set and
> the special Set logic. But in this case, this method is not supposed to be
> called.
>
>
> Actually, there are two more methods using the Comperator behind the
> scenes.... coercedEquals (used by several equals methods, minus and another
> old friend: unique) and numberAwareCompareTo (used by coerecedEquals and
> ObjectRange).
>
> So we would have to look at those as well... but this mail got pretty long
> already and I am out of time. So I put to discussion what I wrote about for
> now.
>
> bye blackdrag
>
> --
> Jochen "blackdrag" Theodorou
> blog: http://blackdragsview.blogspot.com/
>

Mime
View raw message