avro-dev mailing list archives

From "Douglas Kaminsky (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-853) Cache hash codes in Schema and Field
Date Thu, 14 Jul 2011 03:33:59 GMT

    [ https://issues.apache.org/jira/browse/AVRO-853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065028#comment-13065028 ]

Douglas Kaminsky commented on AVRO-853:
---------------------------------------

The purpose of a "quacksLike" method is to determine whether two schemas are structurally equal.
In the case where we first hit the slowdown, we had written a custom method for serializing
schemas that applied certain optimizations when it had already encountered a schema. There we
don't care whether the exact same schema appears elsewhere in our protocol, nor whether its
properties or aliases have been modified. For our purposes, structurally equivalent schemas
are the same schema.
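
A minimal sketch of that "seen before" optimization follows. It assumes the cache is keyed on a schema's structure alone. The names `Field`, `RecordShape`, `StructKey` and `SeenSchemaCache` are hypothetical. They belong to a toy model that keeps only a name, a type, the fields and a props map, and they stand in for Avro's own `Schema` class. Aliases are left out for brevity, and the sketch needs Java 16+ for records.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins for Avro's Schema/Field: just enough
// structure to show the idea. Aliases are omitted; props are kept.
record Field(String name, String type) {}

record RecordShape(String name, String type, List<Field> fields, Map<String, String> props) {
    // The structural key deliberately leaves out props.
    StructKey key() { return new StructKey(name, type, fields); }
}

// Records get component-wise equals/hashCode, so two schemas that differ
// only in props map to equal keys.
record StructKey(String name, String type, List<Field> fields) {}

class SeenSchemaCache {
    private final Map<StructKey, Integer> ids = new HashMap<>();

    /**
     * Returns the id assigned to a structurally equal schema seen earlier,
     * or -1 if this structure is new (it is then registered under the next id).
     */
    int lookupOrRegister(RecordShape schema) {
        StructKey key = schema.key();
        Integer id = ids.get(key);
        if (id != null) {
            return id; // seen before: the caller could emit a short back-reference
        }
        ids.put(key, ids.size());
        return -1; // first time: the caller would write the full schema
    }
}

class SeenSchemaDemo {
    public static void main(String[] args) {
        List<Field> fields = List.of(new Field("foo", "int"), new Field("bar", "long"));
        RecordShape a = new RecordShape("A", "record", fields, Map.of());
        RecordShape b = new RecordShape("A", "record", fields,
                Map.of("java-type-hint", "some.type.Here"));
        SeenSchemaCache cache = new SeenSchemaCache();
        System.out.println(cache.lookupOrRegister(a)); // -1: new structure
        System.out.println(cache.lookupOrRegister(b)); // 0: same structure as a
    }
}
```

The key omits props, so a schema that differs only in its properties still hits the same cache entry.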

Take, for example, the following record schema and its fields:

{code}
{
 "name" : "A",
 "type" : "record",
 "fields" : [{"name" : "foo", "type" : "int"},
             {"name" : "bar", "type" : "long"}]
}
{code}

Now suppose that at some point another thread (for the sake of argument) adds a property to
this schema. Call the original schema A and the modified version B:

{code}
{
 "name" : "A",
 "type" : "record",
 "fields" : [{"name" : "foo", "type" : "int"},
             {"name" : "bar", "type" : "long"}],
 "java-type-hint" : "some.type.Here"
}
{code}


{code}
A.equals(B) == false
A.quacksLike(B) == true
{code}

I almost want to say it's about congruence, but a true congruence predicate would probably
ignore naming, too.
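
The distinction can be sketched in Java on a toy model. The hypothetical `ToySchema` class below is not `org.apache.avro.Schema`. In it, `equals` compares both structure and properties. A hypothetical `quacksLike` compares only the name and the ordered field list. The name is still compared, which is what separates it from the congruence predicate mentioned above. Records and pattern-matching `instanceof` assume Java 16+.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical toy model of a record schema; not org.apache.avro.Schema.
final class ToySchema {
    record Field(String name, String type) {}

    final String name;
    final List<Field> fields;
    final Map<String, String> props;

    ToySchema(String name, List<Field> fields, Map<String, String> props) {
        this.name = name;
        this.fields = List.copyOf(fields);
        this.props = Map.copyOf(props);
    }

    // Structural equality: same name and same ordered (name, type) fields.
    // Props are ignored; the name is not, so this is not full congruence.
    boolean quacksLike(ToySchema that) {
        return name.equals(that.name) && fields.equals(that.fields);
    }

    // Full equality also compares props, so adding "java-type-hint" breaks it.
    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ToySchema that)) return false;
        return quacksLike(that) && props.equals(that.props);
    }

    @Override public int hashCode() { return Objects.hash(name, fields, props); }
}

class QuacksLikeDemo {
    public static void main(String[] args) {
        List<ToySchema.Field> fields = List.of(
                new ToySchema.Field("foo", "int"), new ToySchema.Field("bar", "long"));
        ToySchema a = new ToySchema("A", fields, Map.of());
        ToySchema b = new ToySchema("A", fields, Map.of("java-type-hint", "some.type.Here"));
        System.out.println(a.equals(b));     // false
        System.out.println(a.quacksLike(b)); // true
    }
}
```

Dropping the `name` check from `quacksLike` would move it closer to that congruence predicate.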

> Cache hash codes in Schema and Field
> ------------------------------------
>
>                 Key: AVRO-853
>                 URL: https://issues.apache.org/jira/browse/AVRO-853
>             Project: Avro
>          Issue Type: Improvement
>          Components: java
>    Affects Versions: 1.5.1
>            Reporter: Douglas Kaminsky
>         Attachments: AVRO-853-approach2.patch, AVRO-853.patch
>
>
> We are experiencing a serious performance degradation when trying to store/retrieve fields
> and schemas in hash-based data structures (e.g. HashMap). Since all fields and schemas are
> immutable (with the exception of RecordSchema allowing deferred setting of Fields), it makes
> sense to cache the hash code on the object instead of recalculating it every time the hashCode
> method gets called.
> (Are there other mutable Schema sub-types that I'm not thinking about?)
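
One way to cache the hash code is to drop the cached value on every mutation. That still allows the deferred `setFields` call and the property changes discussed in the comment above. The sketch below uses a toy class and is not Avro's implementation. `CachedHashSchema` is a hypothetical name, and `computeCount` exists only to make the caching observable. The class is not thread-safe, because the cached value and its flag are plain fields.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical toy record schema whose hash code is cached and invalidated on
// every mutation. Not thread-safe: hash and hashCached are plain fields.
final class CachedHashSchema {
    private final String name;
    private List<String> fields = List.of();            // e.g. "foo:int"
    private final Map<String, String> props = new HashMap<>();
    private int hash;
    private boolean hashCached;                         // false until computed
    int computeCount;                                   // demo only: recompute counter

    CachedHashSchema(String name) { this.name = name; }

    // Deferred field setting (as RecordSchema allows) must drop the cached hash.
    void setFields(List<String> newFields) {
        fields = List.copyOf(newFields);
        hashCached = false;
    }

    // Properties are mutable too, so adding one must also drop the cached hash.
    void addProp(String key, String value) {
        props.put(key, value);
        hashCached = false;
    }

    @Override public int hashCode() {
        if (!hashCached) {
            computeCount++;
            hash = Objects.hash(name, fields, props);
            hashCached = true;
        }
        return hash;
    }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CachedHashSchema that)) return false;
        return name.equals(that.name) && fields.equals(that.fields)
                && props.equals(that.props);
    }
}

class CachedHashDemo {
    public static void main(String[] args) {
        CachedHashSchema s = new CachedHashSchema("A");
        s.setFields(List.of("foo:int", "bar:long"));
        s.hashCode();
        s.hashCode();                                   // served from the cache
        s.addProp("java-type-hint", "some.type.Here");
        s.hashCode();                                   // recomputed after mutation
        System.out.println(s.computeCount);             // 2
    }
}
```

Invalidation keeps `hashCode` consistent with `equals`. However, a schema mutated while it is a key in a `HashMap` stays in the bucket chosen for its old hash and may no longer be found.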

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
