hadoop-general mailing list archives

From Marshall Schor <...@schor.com>
Subject Implementing compareTo in user-written keys where one extends the other is error prone
Date Thu, 30 Apr 2009 21:31:29 GMT
Hi.  It took me a good part of a day to figure out what was going wrong,
so I'm sharing this in hopes of learning something from the community or
getting hadoop improved to avoid this kind of error for future users.

I have 2 key classes: one holds a String, and the other one extends it
and adds a boolean.

I implemented the first key class (let's call it Super)

public class Super implements WritableComparable<Super> {
 . . .
  public int compareTo(Super o) {
    // sort on string value
    . . .
  }

I implemented the 2nd key class (let's call it Sub)

public class Sub extends Super {
 . . .
  public int compareTo(Sub o) {
    // sort on boolean value
    . . .
    // if equal, use the super:
    ... else
     return super.compareTo(o);
  }


With this setup, I used the "Sub" class as a mapper output key, and
expected the sort on the boolean value to happen first, then for equal
values there, the sort on the string values.

What actually happened was that the sort on the boolean value was
skipped completely, and only the sort on the string was done.

The reason for this is that (in the 0.19.1 release) the WritableComparator
instance that is created (using the defaults, with no custom Comparator)
knows the class is "Sub".  It takes one of the key values it created and
calls that key's compareTo method, passing it the other key.  Both of these
keys are of type Sub.  However, they are passed via this code in
WritableComparator:

 public int compare(WritableComparable a, WritableComparable b) {
    return a.compareTo(b);
  }

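Java uses the interface spec for WritableComparable that was declared,
in this case WritableComparable<Super>, so the compareTo reached through
the interface takes a Super as its argument.  The compareTo(Sub) method in
Sub has a different parameter type, which makes it an overload rather than
an override.  So the call "skips" the compareTo in Sub and just runs the
one in Super.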
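
The behaviour described above can be reproduced without Hadoop. The following is only a minimal sketch: it assumes plain java.lang.Comparable as a stand-in for WritableComparable (which extends Comparable), and the names SuperKey, SubKey, OverloadDemo and compareRaw are hypothetical, chosen for illustration.

```java
// Stand-ins for the two key classes. Comparable replaces WritableComparable
// here (an assumption) so the sketch runs without Hadoop on the classpath.
class SuperKey implements Comparable<SuperKey> {
    final String text;

    SuperKey(String text) { this.text = text; }

    // Sort on the string value.
    public int compareTo(SuperKey o) {
        return text.compareTo(o.text);
    }
}

class SubKey extends SuperKey {
    final boolean flag;

    SubKey(boolean flag, String text) {
        super(text);
        this.flag = flag;
    }

    // Overload, not override: the parameter type differs from
    // compareTo(SuperKey), so calls made through the interface never land here.
    public int compareTo(SubKey o) {
        if (flag != o.flag) {
            return Boolean.compare(flag, o.flag);
        }
        return super.compareTo(o);
    }
}

class OverloadDemo {
    // Mirrors the shape of the raw-typed call in WritableComparator.compare.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static int compareRaw(Comparable a, Comparable b) {
        return a.compareTo(b);
    }

    public static void main(String[] args) {
        SubKey a = new SubKey(true, "apple");
        SubKey b = new SubKey(false, "banana");
        // Static type is SubKey, so the compiler picks compareTo(SubKey): positive.
        System.out.println(a.compareTo(b));
        // Through the interface only compareTo(SuperKey) is reachable: negative.
        System.out.println(compareRaw(a, b));
    }
}
```

The two calls compare the same pair of objects but disagree on the sign, because the first is resolved at compile time against SubKey and the second against the interface method.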

The workaround is to change the signature of Sub's compareTo method to
match the spec in the interface, namely it has to take the Super as an
argument, and then cast it to Sub.
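
A sketch of that workaround, under the same assumptions as before: Comparable stands in for WritableComparable, and BaseKey, FlagKey, OverrideDemo and compareRaw are hypothetical names. Adding @Override is not part of the workaround itself, but it makes the compiler reject a compareTo whose signature does not match the interface, which would have flagged the original mistake.

```java
class BaseKey implements Comparable<BaseKey> {
    final String text;

    BaseKey(String text) { this.text = text; }

    // Sort on the string value.
    @Override
    public int compareTo(BaseKey o) {
        return text.compareTo(o.text);
    }
}

class FlagKey extends BaseKey {
    final boolean flag;

    FlagKey(boolean flag, String text) {
        super(text);
        this.flag = flag;
    }

    // Same parameter type as the interface method, so this is a real override.
    // Declaring the parameter as FlagKey here would fail to compile with @Override.
    @Override
    public int compareTo(BaseKey o) {
        // Assumption: keys are only ever compared with other FlagKey instances;
        // anything else raises ClassCastException.
        FlagKey other = (FlagKey) o;
        if (flag != other.flag) {
            return Boolean.compare(flag, other.flag);
        }
        return super.compareTo(o);
    }
}

class OverrideDemo {
    // Mirrors the shape of the raw-typed call in WritableComparator.compare.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static int compareRaw(Comparable a, Comparable b) {
        return a.compareTo(b);
    }

    public static void main(String[] args) {
        FlagKey a = new FlagKey(true, "apple");
        FlagKey b = new FlagKey(false, "banana");
        FlagKey c = new FlagKey(true, "banana");
        System.out.println(compareRaw(a, b)); // boolean decides: positive
        System.out.println(compareRaw(a, c)); // booleans equal, string decides: negative
    }
}
```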

This seems like a very error prone design.  Am I doing something wrong,
or can this be improved so that this kind of error is avoided?

-Marshall Schor
