groovy-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jochen Theodorou <blackd...@gmx.org>
Subject usage of NumberAwareComparator in extension methods and number oddities (a bit code review)
Date Thu, 03 Sep 2015 11:45:29 GMT
Hi all,

NumberAwareComparator is used in several extension methods Groovy 
provides to be able to compare things, that cannot really be compared. 
That involves sorting (sort, toSorted), but also things like: retainAll, 
removeAll, minus on Collection, intersect and disjoint. 
NumberAwareComperator will basically use the same logic as <=>, but in 
case it cannot be compared, it will compare using the hashcode, though 
never issuing an equals here. Of course this logic is difficult to 
understand and was mainly done to be able to sort things somehow for 
things you cannot really sort. Also note that <=> purely depends on 
compareTo, even throwing an exception if only equals would be needed.

The first question I have here is: Do we really want to keep providing a 
very difficult to understand way of comparing things, you actually 
cannot compare for sorting? Meaning for example, should [new Object(), 
new Object()].sort() really give a random order, or should it result in 
an exception? I am not sure anymore, that we do our users a good thing 
providing this operation.

And of course there are these other methods, that only need equality, 
but use NumberAwareComperator.

The method with the longest history in this is most probably 
Collection#minus(Collection), well the List variant actually, but the 
Collection variant is where the implementation moved to.

This method actually has two variants inside. First there will be a test 
for the sameType being used and then.... frankly after so many 
iterations the code looks just odd to me.

>             //n*LOG(n) version
>             Set<T> answer;
>             if (Number.class.isInstance(head)) {
>                 answer = new TreeSet<T>(numberComparator);
>                 answer.addAll(self);
>                 for (T t : self) {
>                     if (Number.class.isInstance(t)) {
>                         for (Object t2 : removeMe) {
>                             if (Number.class.isInstance(t2)) {
>                                 if (numberComparator.compare(t, (T)t2) == 0)
>                                     answer.remove(t);
>                             }
>                         }
>                     } else {
>                         if (removeMe.contains(t))
>                             answer.remove(t);
>                     }
>                 }
>             } else {

we get here after we have established, that all the elements are of the 
"same" type. Either directly the same, or null, or a Number.  So if the 
head is a number, then all elements should be a number. "answer" is a 
TreeSet with numberComparator (NumberAwareComparator). Now, since we use 
the number comperator to make a TreeSet, this will mean this action 
alone may already remove elements. But "answer" is not what the method 
will return, it is "ansCollection", which will be build separately based 
on answer later. The inner Number check is, if my assumption before is 
right, surplus and the else-branch there never visited. And then we 
iterate over all elements of self and all elements of removeMe to 
eventually call answer.remove... That smells like n*n*logn complexity. 
The statement about this being an n*logn would be wrong then.

I really wonder if something like this:

Set<T> answer;
if (head instanceof Number) {
   answer = new TreeSet<T>(numberComparator);
   answer.addAll(self);
   Set<T> removeSet = new TreeSet<T>(numberComperator);
   removeSet.addAll(removeMe);
   answer.removeAll(removeSet);
}

would not be better. At least that should go near n*logn. And we do 
something similiar in case head is no number:

>             } else {
>                 answer = new TreeSet<T>(numberComparator);
>                 answer.addAll(self);
>                 answer.removeAll(removeMe);
>             }

Only I am wondering.... if removeMe is a Set as well, then this may not 
do what we want. Imagine removeMe being another TreeSet with its own 
comparator and this implementation of removeAll:

>     public boolean removeAll(Collection<?> c) {
>         Objects.requireNonNull(c);
>         boolean modified = false;
>
>         if (size() > c.size()) {
>             for (Iterator<?> i = c.iterator(); i.hasNext(); )
>                 modified |= remove(i.next());
>         } else {
>             for (Iterator<?> i = iterator(); i.hasNext(); ) {
>                 if (c.contains(i.next())) {
>                     i.remove();
>                     modified = true;
>                 }
>             }
>         }
>         return modified;
>     }

This means c.contains will be called and the different comparator may be 
used for it. Proof:

> def toRemove = new TreeSet({a,b->-1})
> toRemove.addAll(["a","b"])
> def res = ["a"]*2 - toRemove
> assert res == ["a","a"]
>
> res = ["a"]*2+["b","c"] - toRemove
> assert res == ["c"]

Of course this Comparator does not behave nice, but the point is more 
that our comparator is not used in the first case. It depends on the 
size of the Set, or better said, the size of the TreeSet. Since equal 
(according to number comparator) elements appear there only once, the 
size in the first case will be 1, in the second case 3, while the sizes 
of the self collections are 2 and 4. Frankly I cannot really understand 
why the JDK has this in this way at all, or why TreeSet does not 
override it. Sure, the version they choose is better for performance, 
but I find this behaviour not right, considering how removeAll works on 
for example lists. Anyway... this should not be a rant about the JDK ;). 
Now... does it make sense to use NumberAwareComperator here at all? If 
we want to support [1.0]-[1]==[], then yes. But the gravity of this is 
unclear I would say:

> class MyNumber extends Number {
>     def n
>     int intValue(){n}
>     long longValue(){n}
>     float floatValue(){n}
>     double doubleValue(){n}
>     int hashCode(){-n}
>     boolean equals(other) {
>         if (other instanceof MyNumber) { return n==other.n}
>         return false
>     }
>     int compareTo(MyNumber other) {
>         return n <=> other.n
>     }
>     String toString(){"MyNumber($n)"}
> }
>
> def res = [1,new MyNumber(n:1)] - [1]

while the first loops to produce answer will correctly produce a TreeSet 
containing only MyNumber, the later logic to produce ansCollection will 
cause problems and nothing will be removed. ThatÄs because MyNumber(n:1) 
is equal to 1, ehm, actually that is wrong. 1 is equal to MyNumber(n:1) 
according to Groovy logic, but not the other way around.

assert 1 == new MyNumber(n:1)
assert new MyNumber(n:1) != 1

That is because we don't actually call Integer#compareTo. If that had 
been the case, then both would not be equal. No, instead we fall back to 
those number math classes, and they assume, that if you compare 
something with an integer, then the other one needs to be converted to 
an integer as well. And since my intValue() method here returns 1, they 
are seen as equal, even though compareTo, equals and even the hashcodes 
would not allow that. If I reverse the order in my list:

> def res = [new MyNumber(n:1),1] - [1]

then the result will be as expected and contains only MyNumber(n:1). 
Funny thing here is... I added a compareTo method, but no Comparable 
interface. So MyNumber is not comparable... Does it changes things if it 
does? So I add "implements Comparable<MyNumber>" to the class and now I 
get the same result, regardless of the order. Which is []. That is 
because now 1 is always equal to MyNumber(n:1). That is because now in 
both case IntegerMath is used. Does this mean any two custom numbers are 
equal, if their intValue is? So let's take MyNumber again and make a 
second class just the same, including Comparable, then call it MyNumber2

> assert new MyNumber2(n:1) == new MyNumber(n:1)
> assert new MyNumber(n:1) == new MyNumber2(n:1)

Well... if you did read, what I did write before, then this is no 
surprise... But frankly... should it be like this? And if MyNumber2 does 
not implement Comparable:

> assert new MyNumber2(n:1) != new MyNumber(n:1)
> assert new MyNumber(n:1) == new MyNumber2(n:1)

again because of the fallback logic to IntegerMath... and if both do not 
implement it:

> assert new MyNumber2(n:1) != new MyNumber(n:1)
> assert new MyNumber(n:1) != new MyNumber2(n:1)

Well... that is more the expected result, still... and not to 
forget:This is ==, not <=>. == uses <=> only for comparable cases. So:

> new MyNumber2(n:1) <=> new MyNumber(n:1)
> new MyNumber(n:1) <=> new MyNumber2(n:1)

will both throw a GroovyRuntimeException, if non does implement 
Comparable. If MyNumber2 implements it, the first compare works using 
IntegerMath and tells us they are equal. If both implement it, they are 
both equal. That is basically the same as ==, but what does 
NumberAwareComperator do for such cases?

>         try {
>             return DefaultTypeTransformation.compareTo(o1, o2);
>         } catch (ClassCastException cce) {
>             /* ignore */
>         } catch (GroovyRuntimeException gre) {
>             /* ignore */
>         }
>         // since the object does not have a valid compareTo method
>         // we compare using the hashcodes. null cases are handled by
>         // DefaultTypeTransformation.compareTo
>         // This is not exactly a mathematical valid approach, since we compare object
>         // that cannot be compared. To avoid strange side effects we do a pseudo order
>         // using hashcodes, but without equality. Since then an x and y with the same
>         // hashcodes will behave different depending on if we compare x with y or
>         // x with y, the result might be unstable as well. Setting x and y to equal
>         // may mean the removal of x or y in a sorting operation, which we don't want.
>         int x1 = o1.hashCode();
>         int x2 = o2.hashCode();
>         if (x1 > x2) return 1;
>         return -1;

compareTo is the path for <=>, so we can expect exceptions here. So now 
with MyNumber being a Comparable, and MyNumber2 not:

> println ([new MyNumber(n:1), new MyNumber2(n:1)] - [new MyNumber2(n:1)])
> println ([new MyNumber(n:1), new MyNumber2(n:1)] - [new MyNumber(n:1)])
> println ([new MyNumber2(n:1), new MyNumber(n:1)] - [new MyNumber2(n:1)])
> println ([new MyNumber2(n:1), new MyNumber(n:1)] - [new MyNumber(n:1)])

In the first two cases nothing is removed, in the third case we get an 
empty list and in the last case MyNumber is removed.
Since MyNumber2 does not implement Comparable here 
DefaultTypeTransformation.compareTo will throw an exception, whenever we 
have MyNumber2 as o1. The compare using hashcodes will never equal. 
Well. There has been one part of that minus method we have not looked 
at. This is called if the first element is not Comparable, even if all 
extend Number:

>             //n*n version
>             List<T> tmpAnswer = new LinkedList<T>(self);
>             for (Iterator<T> iter = tmpAnswer.iterator(); iter.hasNext();) {
>                 T element = iter.next();
>                 boolean elementRemoved = false;
>                 for (Iterator<?> iterator = removeMe.iterator(); iterator.hasNext()
&& !elementRemoved;) {
>                     Object elt = iterator.next();
>                     if (DefaultTypeTransformation.compareEqual(element, elt)) {
>                         iter.remove();
>                         elementRemoved = true;
>                     }
>                 }
>             }
>
>             //remove duplicates
>             //can't use treeset since the base classes are different
>             ansCollection.addAll(tmpAnswer);

so if MyNumber2 is the head element this code will be executed instead, 
since MyNumber2 not implement Comparable. 
DefaultTypeTransformation.compareEqual is the code path taken for "==" 
which I talked about already. Remember, IntegerMath is the fallback for 
Numbers.compareEqual will prevent the exception being thrown, even 
though compareTo is called internally, but only if the left hand side is 
a Comparable. So here equality checks are done instead.. for example 
using the equals method. That explains why ([new MyNumber2(n:1), new 
MyNumber(n:1)] - [new MyNumber2(n:1)]) gets the empty list... first part 
is normaly equality, second part is fallback to IntegerMath. Of course 
if the code used DefaultTypeTransformation.compareEqual(elt, element), 
then this would behave different, but the result would be the same. The 
major difference would be that MyNumber2#equals is not called. For ([new 
MyNumber2(n:1), new MyNumber(n:1)] - [new MyNumber(n:1)]) we do 
compareEquals with MyNumber2 and MyNumber first, resulting in the 
element not being removed, so the result will be [MyNumber(n:1)]

But that did not give me the exception path... Looking at the code, with 
numbers you won't get that path... or not? Well, with a subclass of 
BigDecimal/BigInteger you can do that. But they implement 
Comparable<BigDecimal>/Comparable<BigInteger>, so you are supposed to 
make a method for that only... supposed to is not equal to people 
actually doing that. Still I won't give an example of that now.

If you look at the sameType method, then it looks only at Number and 
null and the exact same class. to enter the not so n*logn path and use 
the number comperator. Point being here, the hashcode compares will not 
normally happen for Numbers or null. And in case of all elements of both 
Collections being of the same class, there should be no exception we 
have the right to catch and fall back to hashcode logic. So we could 
question here if NumberAwareComperator should be even used like it is 
for this method.

So let's have a much shorter look at the other methods...

retainAll, intersect and disjoint: uses the comparator for all elements, 
so it may use the hashcode logic. But at least they should be stable.
removeAll: same. Well, there is in theory the issue of self being a Set 
and the special Set logic. But in this case, this method is not supposed 
to be called.


Actually, there are two more methods using the Comperator behind the 
scenes.... coercedEquals (used by several equals methods, minus and 
another old friend: unique) and numberAwareCompareTo (used by 
coerecedEquals and ObjectRange).

So we would have to look at those as well... but this mail got pretty 
long already and I am out of time. So I put to discussion what I wrote 
about for now.

bye blackdrag

-- 
Jochen "blackdrag" Theodorou
blog: http://blackdragsview.blogspot.com/


Mime
View raw message