avro-dev mailing list archives

From "Daniel Halperin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-607) SpecificData.getSchema not thread-safe
Date Tue, 31 May 2016 20:22:12 GMT

    [ https://issues.apache.org/jira/browse/AVRO-607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15308554#comment-15308554 ]

Daniel Halperin commented on AVRO-607:
--------------------------------------

Thanks Ryan.

So if I'm reading this right, the only way to avoid this bug is to serialize access to SpecificData.getSchema.
Right now, we're hitting this via the following code path:

{code}
--- Thread: Thread[pool-1-thread-15,5,main] State: RUNNABLE stack: ---
  java.util.WeakHashMap.get(WeakHashMap.java:403)
  org.apache.avro.specific.SpecificData.getSchema(SpecificData.java:187)
  org.apache.avro.reflect.ReflectData.isRecord(ReflectData.java:168)
  org.apache.avro.generic.GenericData.getSchemaName(GenericData.java:612)
  org.apache.avro.specific.SpecificData.getSchemaName(SpecificData.java:265)
  org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:601)
  org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:151)
  org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:71)
  org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:143)
  org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
  org.apache.avro.reflect.ReflectDatumWriter.writeField(ReflectDatumWriter.java:175)
  org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
  org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
  org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:143)
  org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
  org.apache.beam.sdk.coders.AvroCoder.encode(AvroCoder.java:264)
{code}

We have many threads all calling AvroCoder#encode, each on its own DatumWriter instance.
Every one of those calls eventually goes through ReflectData into SpecificData.getSchema and its shared static cache.

Are there any standard ways to populate the cache to avoid this problem?
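
As a stopgap on the caller side, serializing access could look roughly like the sketch below. It assumes every code path that can reach SpecificData.getSchema is routed through one shared lock. SchemaLock and withLock are hypothetical names, not part of Avro or Beam, and the supplier stands in for a call such as the encode step in the stack trace above.

```java
import java.util.function.Supplier;

// Hypothetical helper: not part of Avro or Beam.
final class SchemaLock {
    // One process-wide monitor shared by every caller.
    private static final Object LOCK = new Object();

    private SchemaLock() {}

    // Runs the action while holding the shared lock, so that at most one
    // thread at a time can be inside code that touches the static cache.
    static <T> T withLock(Supplier<T> action) {
        synchronized (LOCK) {
            return action.get();
        }
    }
}
```

This trades throughput for safety: all encodes share one lock, and any caller that bypasses the helper can still race on the cache.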

> SpecificData.getSchema not thread-safe
> --------------------------------------
>
>                 Key: AVRO-607
>                 URL: https://issues.apache.org/jira/browse/AVRO-607
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.3.3, 1.8.1
>            Reporter: Stephen Tu
>             Fix For: 1.8.2
>
>         Attachments: AVRO-607.patch
>
>
> SpecificData.getSchema uses a WeakHashMap to cache schemas, but WeakHashMap is not thread-safe,
> and the method itself is not synchronized. Seems like this could lead to the data structure
> getting corrupted.
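
For illustration, a cache with the same weak-key behavior can be made safe for concurrent use by wrapping the WeakHashMap with Collections.synchronizedMap. This is a minimal sketch, not the attached patch and not Avro's actual code. SynchronizedWeakCache and its loader function are hypothetical stand-ins for the schema lookup.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// Hypothetical cache class: illustrates the idea only.
final class SynchronizedWeakCache<K, V> {
    // Every get/put on the wrapper is guarded by a single mutex, so the
    // underlying WeakHashMap is never touched by two threads at once.
    private final Map<K, V> cache =
        Collections.synchronizedMap(new WeakHashMap<>());
    private final Function<K, V> loader;

    SynchronizedWeakCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            // Two threads that miss at the same time may both run the
            // loader; the map itself stays consistent either way.
            value = loader.apply(key);
            cache.put(key, value);
        }
        return value;
    }
}
```

Keys are still held weakly, so an entry can disappear once its key is no longer strongly referenced elsewhere, just as with a bare WeakHashMap.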



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
