avro-dev mailing list archives

From "Alexandre Normand (JIRA)" <j...@apache.org>
Subject [jira] [Created] (AVRO-1145) Can't read union of null and primitive from value written with schema as primitive
Date Fri, 31 Aug 2012 17:27:07 GMT
Alexandre Normand created AVRO-1145:

             Summary: Can't read union of null and primitive from value written with schema as primitive
                 Key: AVRO-1145
                 URL: https://issues.apache.org/jira/browse/AVRO-1145
             Project: Avro
          Issue Type: Bug
          Components: java
    Affects Versions: 1.7.0
            Reporter: Alexandre Normand

I'm using Avro's Java generic representation API, and I have a problem with our current
case of schema evolution. The scenario we're dealing with is making a primitive-type
field optional by changing its type to a {{union}} of {{null}} and that primitive type.

I'll use a simple example. Our two schema versions are:

Initial: A record with one field of type {{int}}
Second version: Same record, same field name but the type is now a union of {{null}} and {{int}}
According to the [schema resolution|http://avro.apache.org/docs/current/spec.html#Schema+Resolution]
chapter of Avro's spec, the resolution for such a case should be:

if reader's is a union, but writer's is not

The first schema in the reader's union that matches
the writer's schema is recursively resolved against 
it. If none match, an error is signalled.

My interpretation is that data serialized with the initial schema should resolve properly,
since {{int}} is one of the branches of the union in the reader's schema.
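The spec rule quoted above can be sketched as a small, dependency-free Java method. This is a simplified model, not Avro's implementation: schemas are represented only by their primitive type names, "matches" is reduced to exact name equality (the spec's notion of matching also allows numeric promotions such as {{int}} to {{long}}, which are not modeled), and {{UnionResolutionSketch}} and {{firstMatchingBranch}} are hypothetical names. Under this model the writer's {{int}} resolves to branch 1 of the reader's {{["null", "int"]}} union, which is the behavior the report expects.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the spec rule "reader's is a union, but writer's is not".
// Schemas are modeled as primitive type names only; this is not Avro's code.
public class UnionResolutionSketch {

    // Returns the index of the first branch in the reader's union that matches
    // the writer's (non-union) schema, or throws if no branch matches.
    // Simplification: a match is exact name equality; the spec's numeric
    // promotions (int -> long, float, double, etc.) are deliberately omitted.
    static int firstMatchingBranch(List<String> readerUnion, String writerType) {
        for (int i = 0; i < readerUnion.size(); i++) {
            if (readerUnion.get(i).equals(writerType)) {
                return i;
            }
        }
        // Corresponds to "If none match, an error is signalled."
        throw new IllegalArgumentException(
                "No branch of " + readerUnion + " matches writer type " + writerType);
    }

    public static void main(String[] args) {
        // Reader's schema (version 2): ["null", "int"]; writer's schema (version 1): "int"
        List<String> readerUnion = Arrays.asList("null", "int");
        int branch = firstMatchingBranch(readerUnion, "int");
        System.out.println("int resolves to reader branch " + branch
                + " (" + readerUnion.get(branch) + ")");
    }
}
```

Because the search stops at the first match, branch order in the reader's union only matters when more than one branch could match the writer's type.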

However, when I run a test that reads back a record serialized with version 1 using
version 2, I get:

*{{org.apache.avro.AvroTypeException: Attempt to process a int when a union was expected.}}*

Here's a test that shows exactly this:
import static org.junit.Assert.assertEquals;

import java.io.ByteArrayOutputStream;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonDecoder;
import org.apache.avro.io.JsonEncoder;
import org.junit.Test;

@Test
public void testReadingUnionFromValueWrittenAsPrimitive() throws Exception {
    Schema writerSchema = new Schema.Parser().parse("{\n" +
            "    \"type\":\"record\",\n" +
            "    \"name\":\"NeighborComparisons\",\n" +
            "    \"fields\": [\n" +
            "      {\"name\": \"test\",\n" +
            "      \"type\": \"int\" }]} ");

    Schema readersSchema = new Schema.Parser().parse(" {\n" +
            "    \"type\":\"record\",\n" +
            "    \"name\":\"NeighborComparisons\",\n" +
            "    \"fields\": [ {\n" +
            "      \"name\": \"test\",\n" +
            "      \"type\": [\"null\", \"int\"],\n" +
            "      \"default\": null } ]  }");

    // Writing a record using the initial schema with the
    // test field defined as an int
    GenericData.Record record = new GenericData.Record(writerSchema);
    record.put("test", Integer.valueOf(10));
    ByteArrayOutputStream output = new ByteArrayOutputStream();
    JsonEncoder jsonEncoder = EncoderFactory.get().
       jsonEncoder(writerSchema, output);
    GenericDatumWriter<GenericData.Record> writer =
            new GenericDatumWriter<GenericData.Record>(writerSchema);
    writer.write(record, jsonEncoder);
    // Flush so the buffered JSON is written to the output stream
    jsonEncoder.flush();

    // We try reading it back using the second schema
    // version where the test field is defined as a union of null and int
    JsonDecoder jsonDecoder = DecoderFactory.get().
        jsonDecoder(readersSchema, output.toString());
    GenericDatumReader<GenericData.Record> reader =
            new GenericDatumReader<GenericData.Record>(writerSchema, readersSchema);
    GenericData.Record read = reader.read(null, jsonDecoder);

    // We should be able to assert that the value is 10, but the test
    // fails while reading the record, before it gets here
    assertEquals(10, read.get("test"));
}
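For comparison, here is a minimal sketch of the same version 1 to version 2 round trip through Avro's binary encoding instead of JSON. This swaps out the JSON encoder and decoder used in the test, so it is not a fix for the reported case. It also assumes Avro 1.7.x is on the classpath, and {{BinaryUnionResolutionSketch}} and {{roundTrip}} are hypothetical names. A {{BinaryDecoder}} is created without a schema, so all writer/reader resolution happens in {{GenericDatumReader(writer, reader)}}. On this path the {{int}} value is expected to come back as 10 in the union field.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

// Hypothetical comparison sketch: same two schemas as the test, binary encoding.
public class BinaryUnionResolutionSketch {

    // Version 1: "test" is a plain int
    static final Schema WRITER = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"NeighborComparisons\","
            + "\"fields\":[{\"name\":\"test\",\"type\":\"int\"}]}");

    // Version 2: "test" is a union of null and int
    static final Schema READER = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"NeighborComparisons\","
            + "\"fields\":[{\"name\":\"test\",\"type\":[\"null\",\"int\"],\"default\":null}]}");

    static Object roundTrip(int value) throws IOException {
        // Write with version 1 of the schema
        GenericData.Record record = new GenericData.Record(WRITER);
        record.put("test", value);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericData.Record>(WRITER).write(record, encoder);
        encoder.flush();

        // Read with version 2: the binary decoder has no schema of its own,
        // so resolution is done by GenericDatumReader using both schemas
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        GenericData.Record read =
                new GenericDatumReader<GenericData.Record>(WRITER, READER).read(null, decoder);
        return read.get("test");
    }

    public static void main(String[] args) throws IOException {
        System.out.println("test = " + roundTrip(10));
    }
}
```

The JSON encoding is schema-directed on both sides, which is why the test builds its decoder from a schema. The binary decoder leaves that work entirely to the datum reader.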

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
