avro-user mailing list archives

From Scott Carey <sc...@richrelevance.com>
Subject Re: avro object reuse
Date Thu, 02 Jun 2011 01:48:15 GMT
One thing we do right now that might be related is the following:

We keep Avro schema default values as JsonNode objects. While traversing
the JSON representation of an Avro schema, obtained from
ObjectMapper.readTree(), we remember the JsonNodes that are "default"
properties on fields and keep them on the Schema object.
If these nodes hold references to their parent (and so to the whole JSON
tree or, worse, to the ObjectMapper and input stream), that would be poor
use of Jackson on our part, and we would need a way to keep a detached
JsonNode or equivalent.

However, even if that is the case (which it does not seem to be -- the
jmap output has no JsonNode instances), it doesn't explain why we would be
calling ObjectMapper frequently.  We only call
ObjectMapper.readTree(JsonParser) when creating a Schema from JSON.  We
call JsonNode methods from extracted fragments for everything else.


This brings me to the following suspicion based on the data:
Somewhere, Schema objects are being created frequently via one of the
Schema.parse() or Protocol.parse() static methods.
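
If that suspicion turns out to be right, one possible mitigation is to
parse each distinct schema string once and reuse the result. The sketch
below is only an illustration of that idea, not anything that exists in
Avro: ParseOnceCache and parseCalls() are hypothetical names, and the
parser function is a stand-in for whatever wraps Schema.parse() in the
calling code. It uses only the JDK so it can be run without Avro on the
classpath.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical helper (not part of Avro): parses each distinct schema
// JSON string at most once and hands back the cached result afterwards.
final class ParseOnceCache<T> {
    private final Map<String, T> cache = new ConcurrentHashMap<>();
    private final Function<String, T> parser;
    private final AtomicInteger parseCalls = new AtomicInteger();

    // 'parser' is an assumption: in real code it would be a lambda that
    // calls Schema.parse(json) or an equivalent.
    ParseOnceCache(Function<String, T> parser) {
        this.parser = parser;
    }

    T get(String schemaJson) {
        // computeIfAbsent runs the mapping function at most once per key.
        return cache.computeIfAbsent(schemaJson, json -> {
            parseCalls.incrementAndGet();
            return parser.apply(json);
        });
    }

    // Number of times the underlying parser actually ran; a large value
    // here would point at many distinct schema strings, not repeated parses.
    int parseCalls() {
        return parseCalls.get();
    }
}
```

The counter is there mainly for diagnosis: if it stays small while the
heap still fills with Jackson objects, the repeated parsing is coming
from somewhere that bypasses the cache.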

On 6/1/11 5:48 PM, "Tatu Saloranta" <tsaloranta@gmail.com> wrote:

>On Wed, Jun 1, 2011 at 5:45 PM, Scott Carey <scott@richrelevance.com>
>wrote:
>> It would be useful to get a 'jmap -histo:live' report as well, which
>> will only have items that remain after a full GC.
>>
>> However, a high churn of short lived Jackson objects is not expected
>> here unless the user is reading Json serialized files and not Avro
>> binary. Avro Data Files only contain binary encoded Avro content.
>>
>> It would be surprising to see many Jackson objects here if reading Avro
>> Data Files, because we expect to use Jackson to parse an Avro schema
>> from json only once or twice per file.  After the schema is parsed,
>> Jackson shouldn't be used.   A hundred thousand DeserializationConfig
>> instances means that isn't the case.
>
>Right -- it indicates that something (else) is using Jackson; and
>there will typically be one instance of DeserializationConfig for each
>data-binding call (ObjectMapper.readValue()), as a read-only copy is
>made for operation.
>... or if something is reading schema that many times, that sounds
>like a problem in itself.
>
>-+ Tatu +-

