avro-user mailing list archives

From "Han, Xiaodan" <xiaodan....@baml.com>
Subject RE: 1.7.6 Slow Deserialization
Date Mon, 16 Jun 2014 14:59:54 GMT
We modified our code to create only one reader instance per schema, and that does seem to
solve the problem. A follow-up concern, however, is thread safety: is GenericDatumReader
thread-safe? What about the writer? We also create a new writer for each serialization and
haven't seen a performance issue yet. If we change how we create the reader, would you
suggest we make the same change for the writer?
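
One way to sketch the "one reader per schema" change while the thread-safety question is
still open is to keep one instance per schema per thread, so that no instance is ever shared
between threads. This is a minimal sketch under assumptions, not our actual code:
PerThreadCache is a hypothetical helper name, it uses only the JDK, and wiring it to Avro
(Schema as the key, GenericDatumReader as the value) is shown only in a comment.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical helper (not part of Avro): each thread builds at most one
 * value per key and reuses it on later calls. Values are never shared
 * across threads, so it does not rely on the cached objects being thread-safe.
 */
final class PerThreadCache<K, V> {
    private final Function<K, V> factory;

    // Every thread lazily gets its own private map of key -> cached value.
    private final ThreadLocal<Map<K, V>> cache = ThreadLocal.withInitial(HashMap::new);

    PerThreadCache(Function<K, V> factory) {
        this.factory = factory;
    }

    /** Returns this thread's cached value for the key, creating it on first use. */
    V get(K key) {
        return cache.get().computeIfAbsent(key, factory);
    }
}

// Assumed usage with Avro (requires the Avro jar on the classpath):
//
//   PerThreadCache<Schema, GenericDatumReader<GenericRecord>> readers =
//       new PerThreadCache<>(schema -> new GenericDatumReader<GenericRecord>(schema));
//   GenericRecord record = readers.get(schema).read(null, decoder);
```

The same helper could hold writers if it turns out they should be reused as well. If the
reader is confirmed to be thread-safe, a single shared map would be enough and the
ThreadLocal could be dropped.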


-----Original Message-----
From: Doug Cutting [mailto:cutting@apache.org] 
Sent: Friday, June 13, 2014 1:59 PM
To: user@avro.apache.org
Subject: Re: 1.7.6 Slow Deserialization

On Wed, Jun 11, 2014 at 1:05 PM, Han, Xiaodan <xiaodan.han@baml.com> wrote:
> org.apache.avro.specific.SpecificDatumReader.findStringClass(SpecificDatumReader.java:80)
> org.apache.avro.generic.GenericDatumReader.getStringClass(GenericDatumReader.java:394)

The results of findStringClass are cached by getStringClass, so there should be only one call
per schema used with a given GenericDatumReader instance. With 22k calls to this method in
the samples, perhaps you're creating either a new schema or a new GenericDatumReader for each
instance read. Could that be possible?


