avro-dev mailing list archives

From "Sachin Goyal (JIRA)" <j...@apache.org>
Subject [jira] [Created] (AVRO-1562) Add support for types extending Maps/Collections
Date Thu, 07 Aug 2014 04:12:12 GMT
Sachin Goyal created AVRO-1562:
----------------------------------

             Summary: Add support for types extending Maps/Collections
                 Key: AVRO-1562
                 URL: https://issues.apache.org/jira/browse/AVRO-1562
             Project: Avro
          Issue Type: Bug
    Affects Versions: 1.7.6
            Reporter: Sachin Goyal


Consider the following code:
{code}
import java.io.ByteArrayOutputStream;
import java.util.*;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.reflect.ReflectData;
import org.apache.avro.reflect.ReflectDatumWriter;

public class AvroDerivingMaps
{
    public static void main (String [] args) throws Exception
    {
        MapDerivedContainer orig = new MapDerivedContainer();
        ReflectData rdata = ReflectData.AllowNull.get();
        Schema schema = rdata.getSchema(MapDerivedContainer.class);
        System.out.println(schema);
        
        ReflectDatumWriter<MapDerivedContainer> datumWriter =
            new ReflectDatumWriter<MapDerivedContainer>(MapDerivedContainer.class, rdata);
        DataFileWriter<MapDerivedContainer> fileWriter =
            new DataFileWriter<MapDerivedContainer>(datumWriter);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        fileWriter.create(schema, baos);
        fileWriter.append(orig);
        fileWriter.close();
    }
}

class MapDerived extends HashMap<String, Integer>
{
    Integer a = 1;
    String b = "b";
}

class MapDerivedContainer
{
    MapDerived2 map = new MapDerived2();
}

class MapDerived2 extends MapDerived
{
    String c = "c";
}
{code}
\\
\\
It prints the following schema and then throws the exception shown after it:
{code:javascript}
{"type":"record","name":"MapDerivedContainer","namespace":"avro","fields":[{"name":"map","type":["null",{"type":"record","name":"MapDerived2","fields":[{"name":"c","type":["null","string"],"default":null},{"name":"a","type":["null","int"],"default":null},{"name":"b","type":["null","string"],"default":null}]}],"default":null}]}
{code}
{color:brown}
Exception in thread "main" org.apache.avro.file.DataFileWriter$AppendWriteException: org.apache.avro.UnresolvedUnionException: 
Caused by: org.apache.avro.UnresolvedUnionException: Not in union ["null",{"type":"record","name":"MapDerived2","namespace":"avro","fields":[{"name":"c","type":["null","string"],"default":null},{"name":"a","type":["null","int"],"default":null},{"name":"b","type":["null","string"],"default":null}]}]: {}
	at org.apache.avro.generic.GenericData.resolveUnion(GenericData.java:600)
	at org.apache.avro.generic.GenericDatumWriter.resolveUnion(GenericDatumWriter.java:151)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:71)
	at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:145)
	at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
	at org.apache.avro.reflect.ReflectDatumWriter.writeField(ReflectDatumWriter.java:203)
	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
	at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:145)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
	at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:290)
	... 1 more
{color}

\\
\\
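It appears that ReflectData#createSchema() handles maps only when "type instanceof ParameterizedType"
is true. A field declared as MapDerived2 has a plain Class as its type, so the map handling is
skipped and a record schema is generated instead.
GenericData#isMap() does not appear to have the same restriction, so at write time the value no
longer matches the generated schema and GenericData#resolveUnion() fails.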
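
The mismatch can be reproduced with plain JDK reflection, without Avro. The sketch below is only an illustration of the type check described above, not Avro's actual code; the class and field names (ProbeMap, ProbeContainer, DeclaredTypeProbe) are hypothetical stand-ins for the classes in this report.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for MapDerived: a map subclass with its own field.
class ProbeMap extends HashMap<String, Integer> {
    Integer a = 1;
}

// Hypothetical stand-in for MapDerivedContainer.
class ProbeContainer {
    // Declared type is the plain class ProbeMap (no type arguments).
    ProbeMap derived = new ProbeMap();
    // Declared type is Map<String, Integer>, which carries type arguments.
    Map<String, Integer> plain = new HashMap<String, Integer>();
}

class DeclaredTypeProbe {
    // Returns true when the declared type of the named field is a ParameterizedType.
    static boolean isParameterized(String fieldName) {
        try {
            Field f = ProbeContainer.class.getDeclaredField(fieldName);
            Type t = f.getGenericType();
            return t instanceof ParameterizedType;
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("No such field: " + fieldName, e);
        }
    }

    // Returns true when the runtime value is a java.util.Map.
    static boolean isMapValue(Object value) {
        return value instanceof Map<?, ?>;
    }

    public static void main(String[] args) {
        ProbeContainer c = new ProbeContainer();
        System.out.println("plain   is parameterized: " + isParameterized("plain"));
        System.out.println("derived is parameterized: " + isParameterized("derived"));
        System.out.println("derived value is a Map:   " + isMapValue(c.derived));
    }
}
```

The "derived" field is not a ParameterizedType even though its value is a Map, which is the combination that a schema generator keyed on ParameterizedType would miss.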

The same may be true for classes that extend ArrayList or implement Collection, Set, etc.
Also, note the schema for the class extending Map:
{code:javascript}
{  
   "type":"record",
   "name":"MapDerived2",
   "fields":[  
      {  
         "name":"c",
         "type":[  
            "null",
            "string"
         ],
         "default":null
      },
      {  
         "name":"a",
         "type":[  
            "null",
            "int"
         ],
         "default":null
      },
      {  
         "name":"b",
         "type":[  
            "null",
            "string"
         ],
         "default":null
      }
   ]
}
{code}
This schema ignores the Map completely: only the fields declared on the subclasses appear.
For such a class, the schema should probably look like:
{code:javascript}
{
   "type":"record",
   "name":"MapDerived2",
   "fields":[  
      {  
         "name":"c",
         "type":[  
            "null",
            "string"
         ],
         "default":null
      },
      .... // Other fields in the class extending the Map
      {
         "name":"BASE_MAP",
         "type":[
            "null",
            "map" ... // Normal map which the class extends (implements?)
         ],
         "default":null
      }
   ]
}
{code}
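
To fill in the "map" branch of such a schema, the key and value types of the extended map would have to be recovered from the class hierarchy. The following is a minimal sketch of one way that might be done with JDK reflection, assuming the map is inherited through a superclass with concrete type arguments. It is not part of Avro; BaseMap, LeafMap and MapSuperclassResolver are hypothetical names, and maps inherited only through interfaces or through type variables are not handled.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for MapDerived and MapDerived2.
class BaseMap extends HashMap<String, Integer> {}
class LeafMap extends BaseMap {}

class MapSuperclassResolver {
    // Walks up the superclass chain and returns the first parameterized
    // superclass whose raw type is a Map, or null if there is none.
    static ParameterizedType findMapSuperclass(Class<?> c) {
        for (Class<?> k = c; k != null && k != Object.class; k = k.getSuperclass()) {
            Type sup = k.getGenericSuperclass();
            if (sup instanceof ParameterizedType) {
                ParameterizedType p = (ParameterizedType) sup;
                Type raw = p.getRawType();
                if (raw instanceof Class && Map.class.isAssignableFrom((Class<?>) raw)) {
                    return p;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ParameterizedType p = findMapSuperclass(LeafMap.class);
        Type[] typeArgs = p.getActualTypeArguments();
        System.out.println("key type:   " + typeArgs[0]);
        System.out.println("value type: " + typeArgs[1]);
    }
}
```

For LeafMap the walk passes BaseMap (a plain Class) and stops at HashMap<String, Integer>, whose second type argument could then be used as the map's value type.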



--
This message was sent by Atlassian JIRA
(v6.2#6252)
