avro-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (AVRO-182) hashCode and equals are not consistent with compareTo
Date Tue, 10 Nov 2009 17:27:27 GMT

    [ https://issues.apache.org/jira/browse/AVRO-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12775951#action_12775951
] 

Doug Cutting commented on AVRO-182:
-----------------------------------

> I think the equals() thing is a bug. 

It was intentional.  I think of it as a feature, but that's probably idiosyncratic.  The canonical
use of equals() is in HashMap.  With TreeMap, if you try to mix classes, you'll probably get
ClassCastException from the compareTo implementation.  But, with a HashMap, if you get a ClassCastException
from an equals() implementation it's a bug?

We could add 'if (!(that instanceof Record)) return false;' to the front of all of these (I
had it at one point and removed it).  It makes the code bigger and slower and may make it
harder to find uses of null pointers and incorrect types, but, on the other hand, it may permit
some uses that would not otherwise be permitted.  I lean towards smaller code that's finicky
about types to greater flexibility.  But if you still think it's a bug, I'll change it.

> Do you feel it beneficial to include the hashcode of the schema in the hashcode as well?

No.  It would make hashCode slower.  It would prevent false-positives for data that has the
same leaf values in the same order but in different types, but I think such cases are rare.
 Most Java hashCode implementations do not, e.g., include the hash of the class name, just
the instance data.

> it seems like extracting the logic for combining two hash codes into a function might
be nice.

That would be nice.  I'm leery of varrargs, since they'll allocate an array per call, and
hashCode should be wicked fast.  We might instead add a method like:

{code}
public int incrementHash(int hashCode, Object o, Schema s) {
  return 31*hashCode + hashCode(o, s)
}
{code}

Do you like that?  Do you think we should throw in an xor or a shift?

> You chose GenericArray to delegate to GenericData [ ... ]

I'm not sure what you mean.  GenericArray is the interface that GenericDatumReader and GenericDatumWriter
use.  So one can change one's array implementation, and, as long as one implements GenericArray,
one can use GenericDatumReader and GenericDatumWriter unchanged.  The standard implementation
of GenericArray is GenericData.Array.  But if one does not like the GenericArray interface,
then one can override methods in GenericDatumReader and GenericDatumWriter to use an arbitrary
class to represent arrays and use GenericArray at all.

GenericData has code that's shared by GenericDatumReader and GenericDatumWriter, or that's
otherwise common to generic data.  GenericData is a singleton, so that its methods may be
overidden by singleton subclasses SpecificData and ReflectData.  In this way, SpecificDatumReader
and SpecificDatumWriter can, e.g., share much of the GenericDatumReader and GenericDatumWriter
logic, with minimal code duplication.

So, back to your question, I don't see that GenericArray delegates to GenericData, but maybe
the explanations above help to better explain the structure of this stuff?

> hashCode and equals are not consistent with compareTo
> -----------------------------------------------------
>
>                 Key: AVRO-182
>                 URL: https://issues.apache.org/jira/browse/AVRO-182
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>            Reporter: Doug Cutting
>            Assignee: Doug Cutting
>             Fix For: 1.3.0
>
>         Attachments: AVRO-182.patch
>
>
> Java's specific and generic APIs implement compareTo according to the schema, where some
fields might be ignored.  To be consistent, fields that are ignored when comparing for ordering
should also be ignored when comparing for equality and for computing hashCodes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message