avro-dev mailing list archives

From "Ryon Day (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AVRO-1811) SpecificData.deepCopy() cannot be used if schema compiler generated Java objects with Strings instead of UTF8
Date Fri, 11 Mar 2016 18:58:38 GMT

     [ https://issues.apache.org/jira/browse/AVRO-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryon Day updated AVRO-1811:
---------------------------
    Description: 
{panel:title=Description|titleBGColor=#3FA|bgColor=#DDD}
When the Avro compiler generates Java objects, you have the option of mapping fields of type
{{string}} to the standard Java {{String}} type, for wide interoperability with existing Java
applications and APIs.

By default, however, the compiler outputs these fields as the Avro-specific {{Utf8}} type,
which requires frequent calls to {{toString()}} before domain objects generated with the
default settings can be used with the majority of Java libraries.

There are two ways to get around this. The first is to annotate every {{string}} field in
a schema like so:

{code}
    {
      "name": "some_string",
      "doc": "a field that is guaranteed to compile to java.lang.String",
      "type": [
        "null",
        {
          "type": "string",
          "avro.java.string": "String"
        }
      ]
    },
{code}

Unfortunately, long schemas containing many string fields can be dominated by this annotation
by volume. Teams with heterogeneous clients may want to avoid Java-specific annotations in
their schema files, or may not think to use them unless the schema already has Java consumers
at the time it is proposed and written.

The other solution is to compile the schema into Java objects using the {{SpecificCompiler}}'s
string type selection. This option alters the schema carried by the object's {{SCHEMA$}}
field to include the above annotation, ensuring that the {{String}} type is used by the Java
API.

Unfortunately, this method is not interoperable with {{GenericRecord}}s created by libraries
that use the _original_ schema.
{panel}

{panel:title=Steps To Reproduce|titleBGColor=#8DB|bgColor=#DDD}
# Create a schema with several {{string}} fields.
# Parse the schema using the standard Avro schema parser.
# Generate Java domain objects for that schema, selecting {{java.lang.String}} as the string type.
# Create a message of some sort that ends up as a {{GenericRecord}} of the original schema.
# Attempt to use {{SpecificData.deepCopy()}} to make a {{SpecificRecord}} out of the {{GenericRecord}}.


There is a unit test that demonstrates this [here|https://github.com/ryonday/avroDecodingHelp/blob/master/1.8.0/src/test/java/com/ryonday/avro/test/v180/AvroDeepCopyTest.java]
{panel}

{panel:title=Expected Results|titleBGColor=#AD3|bgColor=#DDD}
As the schemas are identical aside from the string type, the conversion should work
(and it does work for schemas that are exactly identical).
{panel}

{panel:title=Actual Results|titleBGColor=#D55|bgColor=#DDD}
{{ClassCastException}} with the message {{org.apache.avro.util.Utf8 cannot be cast to java.lang.String}}
{panel}
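
The failure mode can be illustrated without Avro on the classpath. The following is only a minimal sketch, not Avro's generated code: {{Utf8CastSketch}}, {{putWithCast}} and {{putWithConversion}} are hypothetical names, and {{StringBuilder}} stands in for {{org.apache.avro.util.Utf8}} on the assumption that what matters here is that both are {{CharSequence}} implementations that are not {{java.lang.String}}.

```java
// Hypothetical stand-in for a field setter in a class generated with the
// java.lang.String string type. This is NOT Avro's actual generated code.
class Utf8CastSketch {

    // Mimics a setter whose field is declared as String: the incoming
    // value is cast unconditionally, so any other CharSequence fails.
    static String putWithCast(Object value) {
        return (String) value;
    }

    // Defensive variant: accept any CharSequence and convert it explicitly.
    static String putWithConversion(Object value) {
        if (value == null) {
            return null;
        }
        if (value instanceof CharSequence) {
            return value.toString();
        }
        throw new IllegalArgumentException("not a character sequence: " + value.getClass());
    }

    public static void main(String[] args) {
        // Assumption: StringBuilder plays the role of Utf8 here, i.e. a
        // CharSequence that is not a java.lang.String.
        CharSequence copied = new StringBuilder("hello");

        System.out.println(putWithConversion(copied));
        try {
            putWithCast(copied);
        } catch (ClassCastException e) {
            System.out.println("cast failed: " + e.getClass().getSimpleName());
        }
    }
}
```

The explicit conversion is the same {{toString()}} call mentioned in the description; the sketch only shows why a value that is still a non-{{String}} {{CharSequence}} cannot be assigned to a {{String}} field by a plain cast.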


  was:
{panel:title=Description|titleBGColor=#3FA|bgColor=#DDD}
When the Avro compiler creates Java objects, you have the option to have them generate fields
of type {{string}} with the Java standard {{String}} type, for wide interoperability with
existing Java applications and APIs.

By default, however, the compiler outputs these fields in the Avro-specific {{UTF8}} type,
requiring frequent usage of the {{toString()}} method in order for default domain objects
to be used with the majority of Java libraries.

There are two ways to get around this. The first is to annotate every {{string}} field in
a schema like so:

{code}
    {
      "name": "some_string",
      "doc": "a field that is guaranteed to compile to java.lang.String",
      "type": [
        "null",
        {
          "type": "string",
          "avro.java.string": "String"
        }
      ]
    },
{code}

Unfortunately, long schemas containing many string fields can be dominated by this annotation
by volume; for teams using heterogenous clients, they may to want to put Java-specific annotation
in their schema files, or may not think to use it unless there exist Java exploiters of the
schema at the time the schema is proposed and written.

The other solution to the problem is to compile the schema into Java objects  using the {{SpecificCompiler}}'s
string type selection. This option actually alters the schema carried by the object's {{SCHEMA$}}
field to have the above annotation in it, ensuring that when used by the Java API, the String
type will be used. 

Unfortunately, this method is not interoperable with GenericRecords created by libraries that
use the _original_ schema.
{panel}

{panel:title=Steps To Reproduce|titleBGColor=#8DB|bgColor=#DDD}
# Create a schema with several {{string}} fields.
# Parse the schema using the standard Avro schema parser
# Create Java domain objects for that schema ensuring usage of the {{java.lang.String}} string
type.
# Create a message of some sort that ends up as a {{GenericRecord}} of the original schema
# Attempt to use {{SpecificData.deepCopy()}} to make a {{SpecificRecord}} out of the {{GenericRecord}}


There is a unit test that demonstrate this [here|https://github.com/ryonday/avroDecodingHelp/blob/master/1.8.0/src/test/java/com/ryonday/avro/test/v180/AvroDeepCopyTest.java]
{panel}

{panel:title=Expected Results|titleBGColor=#AD3|bgColor=#DDD}
As the schemas are literally identical aside from string type, the conversion should work
(and does work for schema that are exactly identical).
{panel}

{panel:title=Actual Results|titleBGColor=#D55|bgColor=#DDD}
{{ClassCastException}} with the message {{org.apache.avro.util.Utf8 cannot be cast to java.lang.String}}
{panel}



> SpecificData.deepCopy() cannot be used if schema compiler generated Java objects with
Strings instead of UTF8
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-1811
>                 URL: https://issues.apache.org/jira/browse/AVRO-1811
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>            Reporter: Ryon Day
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
