avro-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Karel Fuka (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-1827) Handling correctly optional fields when converting Protobuf to Avro
Date Wed, 02 Nov 2016 14:59:58 GMT

    [ https://issues.apache.org/jira/browse/AVRO-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15629209#comment-15629209

Karel Fuka commented on AVRO-1827:


I had been working on the original patch with Jakub and because this is becoming critical
for us I am moving forward with it. [~tinxu] - thanks for your valuable feedback and you are
right, there was an issue with the original patch. Though, it is a bit more subtle not completely
related to a nested union. Let me explain:
1. If I run the following, it is all fine:
    public static enum E { A, B };
    public void test1() {
        Schema s3 = Schema.createUnion(Arrays.asList(

        int ind = ReflectData.get().resolveUnion(s3, E.A);
2. When I generate a ProtoBuf with the following definition and try to convert it to Avro,
I really experience an exception thrown by resolveUnion():
message EMessage {
  enum ET {
    A = 1;
    B = 2;
  optional ET et = 3;
Note that the schema for 'et' is equivalent to what you see above. 

The real problem lies in 'ProtobufData' class, which extends 'GenericData' but does not override
some methods (like 'getEnumSchema'), so a GenericData implementation gets called instead (and
fails). Note that for example 'getEnumSchema' comment contains a text reading _"May be overridden
for alternate enum"_. 

Therefore, I have addressed this in the latest version, which I am going to attach. Please
review and let me know if you see any more issues.


> Handling correctly optional fields when converting Protobuf to Avro
> -------------------------------------------------------------------
>                 Key: AVRO-1827
>                 URL: https://issues.apache.org/jira/browse/AVRO-1827
>             Project: Avro
>          Issue Type: Improvement
>    Affects Versions: 1.7.7, 1.8.0
>            Reporter: Jakub Kahovec
>         Attachments: AVRO-1827.patch, AVRO-1827.patch, AVRO-1827.patch
> Hello,
> as of the current implementation of converting protobuf files into avro format, protobuf
optional fields are being  given default values in the avro schema if not specified explicitly.

> So for instance when the protobuf field is defined as  
> {quote}
> optional int64 fieldInt64 = 1;
> {quote}
> in the avro schema it appears as
> {quote}
>  "name" : "fieldInt64",
>   "type" : "long",
>   "default" : 0
> {quote}
> The problem with this implementation is that we are losing information about whether
the field was present or not in the original protobuf, as when we ask for this field's value
in avro we will be given the default value. 
> What I'm proposing instead is that if the field in the protobuf is defined as optional
and has no default value then the generated avro schema type will us a union comprising the
matching type and null type with default value null. It is going to look like this:
> {quote}
>  "name" : "fieldIn64",
>   "type" : [ "null", "long" ],
>   "default" : null
> {quote}
> I'm aware that is a breaking change but I think that is the proper way how to handle
optional fields.
> I've also  created a patch which fixes the conversion
> Jakub 

This message was sent by Atlassian JIRA

View raw message