avro-dev mailing list archives

From "Yibing Shi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-1811) SpecificData.deepCopy() cannot be used if schema compiler generated Java objects with Strings instead of UTF8
Date Tue, 05 Jul 2016 15:15:11 GMT

    [ https://issues.apache.org/jira/browse/AVRO-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15362617#comment-15362617 ]

Yibing Shi commented on AVRO-1811:

"What's the current behavior if we try to turn a SpecificRecord that contains either representation
into a GenericRecord?"
I believe it depends on how the conversion is done.
If we first serialize the SpecificRecord (regardless of whether it uses the high- or low-level representation)
and then deserialize it into a GenericRecord, it should work fine, but the logical type fields
will use the low-level representations in the GenericRecord, unless conversion objects are explicitly
added to the GenericData model before reading the underlying data.
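
A minimal sketch of what "high-level" and "low-level" representations mean for a decimal logical type, assuming the encoding described in the Avro specification (the unscaled value stored as two's-complement big-endian bytes, with the scale kept in the schema). The class {{DecimalRepresentations}} and its methods are hypothetical names for illustration only; they are not Avro's conversion classes.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;

// Hypothetical illustration class; not part of Avro.
final class DecimalRepresentations {

    // High-level -> low-level: store only the unscaled value as two's-complement bytes.
    // The scale is assumed to live in the schema, so it is passed in separately.
    static ByteBuffer toLowLevel(BigDecimal value, int scale) {
        if (value.scale() != scale) {
            throw new IllegalArgumentException(
                "scale mismatch: expected " + scale + ", got " + value.scale());
        }
        return ByteBuffer.wrap(value.unscaledValue().toByteArray());
    }

    // Low-level -> high-level: rebuild the BigDecimal from the bytes plus the schema's scale.
    static BigDecimal toHighLevel(ByteBuffer bytes, int scale) {
        byte[] raw = new byte[bytes.remaining()];
        bytes.duplicate().get(raw); // duplicate() leaves the caller's position untouched
        return new BigDecimal(new BigInteger(raw), scale);
    }

    public static void main(String[] args) {
        BigDecimal price = new BigDecimal("12.34");
        ByteBuffer lowLevel = toLowLevel(price, 2);

        // A reader with no conversion registered hands back the raw bytes.
        Object withoutConversion = lowLevel;
        System.out.println(withoutConversion instanceof BigDecimal); // prints false

        // Applying the conversion explicitly recovers the high-level value.
        System.out.println(toHighLevel(lowLevel, 2)); // prints 12.34
    }
}
```

The point of the sketch is only that the same field can legitimately hold two different Java types, depending on whether a conversion was applied on the way in or out.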
If we use {{GenericData.deepCopy}} directly to copy a SpecificRecord with high-level logical
type representations (BigDecimal etc.), I believe it would fail, because {{GenericData.deepCopy}}
doesn't understand the high-level representations at all. A ClassCastException (CCE) would be thrown. Copying
a SpecificRecord with the low-level representation should be fine, because it behaves just the same
as before logical types were introduced.
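
The failure mode described above can be sketched without Avro on the classpath. The class {{CastBasedCopy}} below is a hypothetical, simplified stand-in for a copy step that assumes a {{bytes}} field always holds its low-level {{ByteBuffer}} form; it is not Avro's actual {{deepCopy}} code, and the behaviour shown is an illustration rather than a tested reproduction of the issue.

```java
import java.math.BigDecimal;
import java.nio.ByteBuffer;

// Hypothetical illustration class; not part of Avro.
final class CastBasedCopy {

    // Copies a "bytes" value, assuming it is always the low-level ByteBuffer form.
    static ByteBuffer copyBytes(Object value) {
        ByteBuffer source = (ByteBuffer) value; // throws ClassCastException for a BigDecimal
        byte[] raw = new byte[source.remaining()];
        source.duplicate().get(raw);
        return ByteBuffer.wrap(raw);
    }

    // Helper that reports whether the copy fails with a ClassCastException.
    static boolean copyFails(Object value) {
        try {
            copyBytes(value);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Object lowLevel = ByteBuffer.wrap(new byte[] {0x04, (byte) 0xD2});
        Object highLevel = new BigDecimal("12.34");

        System.out.println(copyFails(lowLevel));  // low-level form copies without error
        System.out.println(copyFails(highLevel)); // high-level form hits the failing cast
    }
}
```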
Actually, we may also face problems when we deepCopy a GenericRecord into a SpecificRecord. The setField
method in SpecificData needs a value that matches the field type. If the logical type field uses
the high-level representation, I expect this copy would fail, because deepCopy at this moment
only returns low-level representations.
I haven't tested these behaviours though. I will do some tests when I have time.

The patch uploaded here should be able to solve the immediate problem. Thanks for your acceptance
of it as a short-term solution. I am just afraid we may face other similar problems in other
scenarios. Maybe we should create another JIRA to track the wider "convertible" values problem?

> SpecificData.deepCopy() cannot be used if schema compiler generated Java objects with
Strings instead of UTF8
> -------------------------------------------------------------------------------------------------------------
>                 Key: AVRO-1811
>                 URL: https://issues.apache.org/jira/browse/AVRO-1811
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.8.0, 1.8.1
>            Reporter: Ryon Day
>            Assignee: Yibing Shi
>            Priority: Critical
>         Attachments: AVRO-1811.1.patch
> {panel:title=Description|titleBGColor=#3FA|bgColor=#DDD}
> When the Avro compiler creates Java objects, you have the option to have them generate
fields of type {{string}} with the Java standard {{String}} type, for wide interoperability
with existing Java applications and APIs.
> By default, however, the compiler outputs these fields in the Avro-specific {{Utf8}}
type, requiring frequent usage of the {{toString()}} method in order for default domain objects
to be used with the majority of Java libraries.
> There are two ways to get around this. The first is to annotate every {{string}} field
in a schema like so:
> {code}
>     {
>       "name": "some_string",
>       "doc": "a field that is guaranteed to compile to java.lang.String",
>       "type": [
>         "null",
>         {
>           "type": "string",
>           "avro.java.string": "String"
>         }
>       ]
>     },
> {code}
> Unfortunately, long schemas containing many string fields can be dominated by this annotation
by volume; teams using heterogeneous clients may want to avoid Java-specific
annotations in their schema files, or may not think to use them unless Java consumers
of the schema exist at the time the schema is proposed and written.
> The other solution to the problem is to compile the schema into Java objects using the
{{SpecificCompiler}}'s string type selection. This option actually alters the schema carried
by the object's {{SCHEMA$}} field to have the above annotation in it, ensuring that when used
by the Java API, the String type will be used. 
> Unfortunately, this method is not interoperable with GenericRecords created by libraries
that use the _original_ schema.
> {panel}
> {panel:title=Steps To Reproduce|titleBGColor=#8DB|bgColor=#DDD}
> # Create a schema with several {{string}} fields.
> # Parse the schema using the standard Avro schema parser
> # Create Java domain objects for that schema ensuring usage of the {{java.lang.String}}
string type.
> # Create a message of some sort that ends up as a {{GenericRecord}} of the original schema
> # Attempt to use {{SpecificData.deepCopy()}} to make a {{SpecificRecord}} out of the
> There is a unit test that demonstrates this [here|https://github.com/ryonday/avroDecodingHelp/blob/master/1.8.0/src/test/java/com/ryonday/avro/test/v180/AvroDeepCopyTest.java].
> {panel}
> {panel:title=Expected Results|titleBGColor=#AD3|bgColor=#DDD}
> As the schemas are literally identical aside from string type, the conversion should
work (and does work for schemas that are exactly identical).
> {panel}
> {panel:title=Actual Results|titleBGColor=#D55|bgColor=#DDD}
> {{ClassCastException}} with the message {{org.apache.avro.util.Utf8 cannot be cast to
> {panel}
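
The string-type mismatch reported above can be illustrated in plain Java. In this sketch, {{Utf8Like}} is a hypothetical stand-in for {{org.apache.avro.util.Utf8}} (which, like this class, implements {{CharSequence}}), and {{normalize}} is an assumed workaround shown for illustration; it is not the change made in AVRO-1811.1.patch.

```java
// Hypothetical illustration class; not part of Avro.
final class StringFieldCopy {

    // Stand-in for a non-String CharSequence such as Avro's Utf8.
    static final class Utf8Like implements CharSequence {
        private final String text;

        Utf8Like(String text) {
            this.text = text;
        }

        @Override public int length() { return text.length(); }
        @Override public char charAt(int index) { return text.charAt(index); }
        @Override public CharSequence subSequence(int start, int end) {
            return new Utf8Like(text.substring(start, end));
        }
        @Override public String toString() { return text; }
    }

    // Mirrors a generated setter that expects java.lang.String: a blind cast.
    static String unsafeRead(Object value) {
        return (String) value; // throws ClassCastException for Utf8Like
    }

    // Accepts any CharSequence and converts it to the String form the field expects.
    static String normalize(Object value) {
        if (value == null) {
            return null;
        }
        if (value instanceof CharSequence) {
            return value.toString();
        }
        throw new IllegalArgumentException("not a string value: " + value.getClass().getName());
    }

    public static void main(String[] args) {
        Object fromGenericRecord = new Utf8Like("hello");
        System.out.println(normalize(fromGenericRecord)); // prints hello
        try {
            unsafeRead(fromGenericRecord);
        } catch (ClassCastException e) {
            System.out.println("cast failed as described in the report");
        }
    }
}
```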
