avro-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ryan Blue (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-1811) SpecificData.deepCopy() cannot be used if schema compiler generated Java objects with Strings instead of UTF8
Date Sat, 16 Apr 2016 22:52:25 GMT

    [ https://issues.apache.org/jira/browse/AVRO-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15244451#comment-15244451

Ryan Blue commented on AVRO-1811:

[~ryonday], thanks for the thorough bug report! It looks like the problem is that deepCopy
simply doesn't check whether the returned data should be a Utf8 or a String. It would be relatively
easy to fix this by adding a method for string construction to GenericData and overriding
it in SpecificData. Are you interested in contributing a patch?

> SpecificData.deepCopy() cannot be used if schema compiler generated Java objects with
Strings instead of UTF8
> -------------------------------------------------------------------------------------------------------------
>                 Key: AVRO-1811
>                 URL: https://issues.apache.org/jira/browse/AVRO-1811
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.8.0
>            Reporter: Ryon Day
>            Priority: Critical
> {panel:title=Description|titleBGColor=#3FA|bgColor=#DDD}
> When the Avro compiler creates Java objects, you have the option to have them generate
fields of type {{string}} with the Java standard {{String}} type, for wide interoperability
with existing Java applications and APIs.
> By default, however, the compiler outputs these fields in the Avro-specific {{Utf8}}
type, requiring frequent usage of the {{toString()}} method in order for default domain objects
to be used with the majority of Java libraries.
> There are two ways to get around this. The first is to annotate every {{string}} field
in a schema like so:
> {code}
>     {
>       "name": "some_string",
>       "doc": "a field that is guaranteed to compile to java.lang.String",
>       "type": [
>         "null",
>         {
>           "type": "string",
>           "avro.java.string": "String"
>         }
>       ]
>     },
> {code}
> Unfortunately, long schemas containing many string fields can be dominated by this annotation
by volume; for teams using heterogenous clients, they may to want to avoid  Java-specific
annotation in their schema files, or may not think to use it unless there exist Java exploiters
of the schema at the time the schema is proposed and written.
> The other solution to the problem is to compile the schema into Java objects  using the
{{SpecificCompiler}}'s string type selection. This option actually alters the schema carried
by the object's {{SCHEMA$}} field to have the above annotation in it, ensuring that when used
by the Java API, the String type will be used. 
> Unfortunately, this method is not interoperable with GenericRecords created by libraries
that use the _original_ schema.
> {panel}
> {panel:title=Steps To Reproduce|titleBGColor=#8DB|bgColor=#DDD}
> # Create a schema with several {{string}} fields.
> # Parse the schema using the standard Avro schema parser
> # Create Java domain objects for that schema ensuring usage of the {{java.lang.String}}
string type.
> # Create a message of some sort that ends up as a {{GenericRecord}} of the original schema
> # Attempt to use {{SpecificData.deepCopy()}} to make a {{SpecificRecord}} out of the
> There is a unit test that demonstrate this [here|https://github.com/ryonday/avroDecodingHelp/blob/master/1.8.0/src/test/java/com/ryonday/avro/test/v180/AvroDeepCopyTest.java]
> {panel}
> {panel:title=Expected Results|titleBGColor=#AD3|bgColor=#DDD}
> As the schemas are literally identical aside from string type, the conversion should
work (and does work for schema that are exactly identical).
> {panel}
> {panel:title=Actual Results|titleBGColor=#D55|bgColor=#DDD}
> {{ClassCastException}} with the message {{org.apache.avro.util.Utf8 cannot be cast to
> {panel}

This message was sent by Atlassian JIRA

View raw message