avro-dev mailing list archives

From "Sachin Goyal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-1562) Add support for types extending Maps/Collections
Date Fri, 29 Aug 2014 22:27:55 GMT

    [ https://issues.apache.org/jira/browse/AVRO-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115961#comment-14115961 ]

Sachin Goyal commented on AVRO-1562:
------------------------------------

My 2c:
Our decision to include this feature should be based on its complexity and performance penalty.
If there is a clean solution, even a complex feature should be considered.

Note that the current patch does not claim to be clean or performant, and there may be scope
to improve it further.
But it would be good to understand whether it's really too complex to support.
If it is too complex or hurts performance, then we should not fix it.

IMHO, lots of Java designs would become usable with Avro, Hadoop and Hive with this fix.
That would be a good incentive to analyze the complexity.

> Add support for types extending Maps/Collections
> ------------------------------------------------
>
>                 Key: AVRO-1562
>                 URL: https://issues.apache.org/jira/browse/AVRO-1562
>             Project: Avro
>          Issue Type: Bug
>    Affects Versions: 1.7.6
>            Reporter: Sachin Goyal
>         Attachments: custom_map_and_collections1.patch
>
>
> Consider the following code:
> {code}
> import java.io.ByteArrayOutputStream;
> import java.util.*;
> import org.apache.avro.Schema;
> import org.apache.avro.file.DataFileWriter;
> import org.apache.avro.reflect.ReflectData;
> import org.apache.avro.reflect.ReflectDatumWriter;
> public class AvroDerivingMaps
> {
>     public static void main (String [] args) throws Exception
>     {
>         MapDerivedContainer orig = new MapDerivedContainer();
>         ReflectData rdata = ReflectData.AllowNull.get();
>         Schema schema = rdata.getSchema(MapDerivedContainer.class);
>         System.out.println(schema);
>         
>         ReflectDatumWriter<MapDerivedContainer> datumWriter = new ReflectDatumWriter (MapDerivedContainer.class, rdata);
>         DataFileWriter<MapDerivedContainer> fileWriter = new DataFileWriter<MapDerivedContainer> (datumWriter);
>         ByteArrayOutputStream baos = new ByteArrayOutputStream();
>         fileWriter.create(schema, baos);
>         fileWriter.append(orig);
>         fileWriter.close();
>     }
> }
> class MapDerived extends HashMap<String, Integer>
> {
>     Integer a = 1;
>     String b = "b";
> }
> class MapDerivedContainer
> {
>     MapDerived2 map = new MapDerived2();
> }
> class MapDerived2 extends MapDerived
> {
>     String c = "c";
> }
> {code}
> \\
> \\
> It prints the generated schema and then throws the following exception:
> {code:javascript}
> {"type":"record","name":"MapDerivedContainer","namespace":"avro","fields":[{"name":"map","type":["null",{"type":"record","name":"MapDerived2","fields":[{"name":"c","type":["null","string"],"default":null},{"name":"a","type":["null","int"],"default":null},{"name":"b","type":["null","string"],"default":null}]}],"default":null}]}
> {code}
> {color:brown}
> Exception in thread "main" org.apache.avro.file.DataFileWriter$AppendWriteException:
> org.apache.avro.UnresolvedUnionException: 
> Caused by: org.apache.avro.UnresolvedUnionException: Not in union ["null",{"type":"record","name":"MapDerived2","namespace":"avro","fields":[{"name":"c","type":["null","string"],"default":null},{"name":"a","type":["null","int"],"default":null},{"name":"b","type":["null","string"],"default":null}]}]: {}
> 	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:600)
> 	at org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:151)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:71)
> 	at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:145)
> 	at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
> 	at org.apache.avro.reflect.ReflectDatumWriter.writeField(ReflectDatumWriter.java:203)
> 	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
> 	at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:145)
> 	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
> 	at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:290)
> 	... 1 more
> {color}
> \\
> \\
> It appears that ReflectData#createSchema() checks for "type instanceof ParameterizedType" and, because of this, it skips handling of the map.
> GenericData#isMap() does not apply the same check, so GenericData#resolveUnion() fails.
> The same may be true for classes extending ArrayList, Collection, Set etc.
> Also, note the schema for the class extending Map:
> {code:javascript}
> {  
>    "type":"record",
>    "name":"MapDerived2",
>    "fields":[  
>       {  
>          "name":"c",
>          "type":[  
>             "null",
>             "string"
>          ],
>          "default":null
>       },
>       {  
>          "name":"a",
>          "type":[  
>             "null",
>             "int"
>          ],
>          "default":null
>       },
>       {  
>          "name":"b",
>          "type":[  
>             "null",
>             "string"
>          ],
>          "default":null
>       }
>    ]
> }
> {code}
> This schema ignores the Map completely.
> Probably, for such a class, the schema should look like:
> {code:javascript}
> {
>    "type":"record",
>    "name":"MapDerived2",
>    "fields":[  
>       {  
>          "name":"c",
>          "type":[  
>             "null",
>             "string"
>          ],
>          "default":null
>       },
>       .... // Other fields in the class extending the Map
>       {
>          "name":"BASE_MAP",
>          "type":[
>             "null",
>             "map" ... // Normal map which the class extends (implements?)
>          ],
>          "default":null
>       }
>    ]
> }
> {code}
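
The type mismatch described in the issue can be reproduced without Avro, using only java.lang.reflect. The sketch below is illustrative only: MapTypeCheckDemo, Container and the helper methods are hypothetical names, not part of Avro or of the attached patch. It shows that a field declared as Map<String, Integer> has a ParameterizedType, while a field declared as MapDerived2 has a plain Class type, even though both values are Maps at runtime. It also shows one possible way (an assumption, not the patch's approach) to recover the key and value types by walking up the generic superclasses.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.HashMap;
import java.util.Map;

public class MapTypeCheckDemo {
    // Same shape as the classes in the issue description.
    static class MapDerived extends HashMap<String, Integer> {
        Integer a = 1;
        String b = "b";
    }

    static class MapDerived2 extends MapDerived {
        String c = "c";
    }

    // Hypothetical container with one plain map and one map-derived field.
    static class Container {
        Map<String, Integer> plain = new HashMap<>();
        MapDerived2 derived = new MapDerived2();
    }

    /** True if the field's declared generic type is a ParameterizedType. */
    static boolean declaredAsParameterized(String fieldName) throws NoSuchFieldException {
        Field f = Container.class.getDeclaredField(fieldName);
        return f.getGenericType() instanceof ParameterizedType;
    }

    /** True if the field's runtime value is a java.util.Map. */
    static boolean runtimeValueIsMap(String fieldName) throws ReflectiveOperationException {
        Field f = Container.class.getDeclaredField(fieldName);
        f.setAccessible(true);
        return f.get(new Container()) instanceof Map;
    }

    /**
     * Walks up the superclass chain until a parameterized Map superclass is
     * found and returns its type arguments, or null if there is none.
     * Simplified: does not resolve type variables or look at interfaces.
     */
    static Type[] mapTypeArguments(Class<?> c) {
        for (Class<?> k = c; k != null; k = k.getSuperclass()) {
            Type sup = k.getGenericSuperclass();
            if (sup instanceof ParameterizedType) {
                ParameterizedType p = (ParameterizedType) sup;
                Type raw = p.getRawType();
                if (raw instanceof Class && Map.class.isAssignableFrom((Class<?>) raw)) {
                    return p.getActualTypeArguments();
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        for (String name : new String[] {"plain", "derived"}) {
            System.out.println(name
                + ": parameterized=" + declaredAsParameterized(name)
                + ", runtime map=" + runtimeValueIsMap(name));
        }
        Type[] kv = mapTypeArguments(MapDerived2.class);
        System.out.println("key=" + kv[0] + ", value=" + kv[1]);
    }
}
```

For the "derived" field the declared type is not parameterized but the runtime value is a Map, which matches the disagreement between schema creation and union resolution reported above.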



--
This message was sent by Atlassian JIRA
(v6.2#6252)
