avro-user mailing list archives

From Alex Holmes <grep.a...@gmail.com>
Subject Re: Avro versioning and SpecificDatum's
Date Tue, 20 Sep 2011 10:44:05 GMT
Created the following ticket:

https://issues.apache.org/jira/browse/AVRO-891

Thanks,
Alex

On Tue, Sep 20, 2011 at 6:26 AM, Alex Holmes <grep.alex@gmail.com> wrote:
> Thanks, I'll add a bug.
>
> As an FYI, even without the alias (retaining the original field name),
> just removing the "id" field yields the exception.
>
> On Tue, Sep 20, 2011 at 2:22 AM, Scott Carey <scottcarey@apache.org> wrote:
>> That looks like a bug.  What happens if there is no aliasing/renaming
>> involved?  Aliasing is a newer feature than field addition, removal, and
>> promotion.
>>
>> This should be easy to reproduce, can you file a JIRA ticket?  We should
>> discuss this further there.
>>
>> Thanks!
>>
>>
>> On 9/19/11 6:14 PM, "Alex Holmes" <grep.alex@gmail.com> wrote:
>>
>>>OK, I was able to reproduce the exception.
>>>
>>>v1:
>>>{"name": "Record", "type": "record",
>>>  "fields": [
>>>    {"name": "name", "type": "string"},
>>>    {"name": "id", "type": "int"}
>>>  ]
>>>}
>>>
>>>v2:
>>>{"name": "Record", "type": "record",
>>>  "fields": [
>>>    {"name": "name_rename", "type": "string", "aliases": ["name"]}
>>>  ]
>>>}
>>>
>>>Step 1.  Write Avro file using v1 generated class
>>>Step 2.  Read Avro file using v2 generated class
>>>
>>>Exception in thread "main" org.apache.avro.AvroRuntimeException: Bad index
>>>       at Record.put(Unknown Source)
>>>       at org.apache.avro.generic.GenericData.setField(GenericData.java:463)
>>>       at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
>>>       at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
>>>       at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
>>>       at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
>>>       at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
>>>       at Read.readFromAvro(Unknown Source)
>>>       at Read.main(Unknown Source)
>>>
>>>The code to write/read the avro file didn't change from below.
>>>
>>>On Mon, Sep 19, 2011 at 9:08 PM, Alex Holmes <grep.alex@gmail.com> wrote:
>>>> I'm trying to put together a simple test case to reproduce the
>>>> exception.  While I was creating the test case, I hit this behavior
>>>> which doesn't seem right, but maybe it's my misunderstanding on how
>>>> forward/backward compatibility should work:
>>>>
>>>> Schema v1:
>>>>
>>>> {"name": "Record", "type": "record",
>>>>  "fields": [
>>>>    {"name": "name", "type": "string"},
>>>>    {"name": "id", "type": "int"}
>>>>  ]
>>>> }
>>>>
>>>> Schema v2:
>>>>
>>>> {"name": "Record", "type": "record",
>>>>  "fields": [
>>>>    {"name": "name_rename", "type": "string", "aliases": ["name"]},
>>>>    {"name": "new_field", "type": "int", "default": 0}
>>>>  ]
>>>> }
>>>>
>>>> In the 2nd version I:
>>>>
>>>> - removed field "id"
>>>> - renamed field "name" to "name_rename"
>>>> - added field "new_field"
>>>>
>>>> I write the v1 data file:
>>>>
>>>>  public static Record createRecord(String name, int id) {
>>>>    Record record = new Record();
>>>>    record.name = name;
>>>>    record.id = id;
>>>>    return record;
>>>>  }
>>>>
>>>>  public static void writeToAvro(OutputStream outputStream)
>>>>      throws IOException {
>>>>    DataFileWriter<Record> writer =
>>>>        new DataFileWriter<Record>(new SpecificDatumWriter<Record>());
>>>>    writer.create(Record.SCHEMA$, outputStream);
>>>>
>>>>    writer.append(createRecord("r1", 1));
>>>>    writer.append(createRecord("r2", 2));
>>>>
>>>>    writer.close();
>>>>    outputStream.close();
>>>>  }
>>>>
>>>> I wrote a version-agnostic Read class:
>>>>
>>>>  public static void readFromAvro(InputStream is) throws IOException {
>>>>    DataFileStream<Record> reader = new DataFileStream<Record>(
>>>>            is, new SpecificDatumReader<Record>());
>>>>    for (Record a : reader) {
>>>>      System.out.println(ToStringBuilder.reflectionToString(a));
>>>>    }
>>>>    IOUtils.cleanup(null, is);
>>>>    IOUtils.cleanup(null, reader);
>>>>  }
>>>>
>>>> Running the Read code against the v1 data file, and including the v1
>>>> code-generated classes in the classpath produced:
>>>>
>>>> Record@6a8c436b[name=r1,id=1]
>>>> Record@6baa9f99[name=r2,id=2]
>>>>
>>>> If I run the same code, but use just the v2 generated classes in the
>>>> classpath I get:
>>>>
>>>> Record@39dd3812[name_rename=r1,new_field=1]
>>>> Record@27b15692[name_rename=r2,new_field=2]
>>>>
>>>> The name_rename field seems to be good, but why would "new_field"
>>>> inherit the values of the deleted field "id"?
>>>>
>>>> Cheers,
>>>> Alex
>>>>
>>>> On Mon, Sep 19, 2011 at 12:35 PM, Doug Cutting <cutting@apache.org> wrote:
>>>>> On 09/19/2011 05:12 AM, Alex Holmes wrote:
>>>>>> I then modified my original schema by adding, deleting and renaming
>>>>>> some fields, creating version 2 of the schema.  After re-creating the
>>>>>> Java classes I attempted to read the version 1 file using the
>>>>>> DataFileStream (with a SpecificDatumReader), and this is throwing an
>>>>>> exception.
>>>>>
>>>>> This should work.  Can you provide more detail?  What is the exception?
>>>>>  A reproducible test case would be great to have.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Doug
>>>>>
>>>>
>>
>>
>>
>
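
Both symptoms in the thread (new_field picking up the old "id" values, and
"Bad index" once v2 has only one field) are what you would expect if the
written values were being copied into the v2 class by position rather than
being resolved by field name. The sketch below is a toy model of that
difference in plain Java. It does not use Avro and is not Avro's
implementation; the class SchemaResolutionSketch and the methods
readByPosition and readByName are hypothetical names made up for this
illustration. It also assumes, without confirmation from the thread, that a
SpecificDatumReader built with no schema argument ends up treating the
file's writer schema as the reader schema.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical names, no Avro dependency.
class SchemaResolutionSketch {

    /** A reader-side field: name, aliases, and an optional default (null means "no default"). */
    static final class Field {
        final String name;
        final List<String> aliases;
        final Object defaultValue;

        Field(String name, List<String> aliases, Object defaultValue) {
            this.name = name;
            this.aliases = aliases;
            this.defaultValue = defaultValue;
        }
    }

    /** Positional copy: written value i is stored into reader field i, ignoring names. */
    static Map<String, Object> readByPosition(List<Object> values, List<Field> readerFields) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (int i = 0; i < values.size(); i++) {
            if (i >= readerFields.size()) {
                // Analogous to the generated put(int, Object) rejecting an unknown index.
                throw new IndexOutOfBoundsException("Bad index: " + i);
            }
            out.put(readerFields.get(i).name, values.get(i));
        }
        return out;
    }

    /** Name-based resolution: match by name, then by alias, then fall back to the default. */
    static Map<String, Object> readByName(List<String> writerFields, List<Object> values,
                                          List<Field> readerFields) {
        Map<String, Object> written = new LinkedHashMap<>();
        for (int i = 0; i < writerFields.size(); i++) {
            written.put(writerFields.get(i), values.get(i));
        }
        Map<String, Object> out = new LinkedHashMap<>();
        for (Field f : readerFields) {
            String source = written.containsKey(f.name) ? f.name : null;
            for (String alias : f.aliases) {
                if (source == null && written.containsKey(alias)) {
                    source = alias;
                }
            }
            if (source != null) {
                out.put(f.name, written.get(source));
            } else if (f.defaultValue != null) {
                out.put(f.name, f.defaultValue);
            } else {
                throw new IllegalStateException("No value or default for field " + f.name);
            }
        }
        // Writer fields that no reader field claimed (here "id") are simply dropped.
        return out;
    }

    public static void main(String[] args) {
        List<String> v1 = Arrays.asList("name", "id");
        List<Object> row = Arrays.asList("r1", 1);
        List<Field> v2 = Arrays.asList(
                new Field("name_rename", Arrays.asList("name"), null),
                new Field("new_field", Collections.<String>emptyList(), 0));

        System.out.println(readByPosition(row, v2));   // {name_rename=r1, new_field=1}
        System.out.println(readByName(v1, row, v2));   // {name_rename=r1, new_field=0}
    }
}
```

With the one-field v2 from the second report, readByPosition throws on
index 1 while readByName returns only name_rename. If the assumption above
holds, giving the reader the v2 schema explicitly, for example
new SpecificDatumReader<Record>(Record.class), would be the first thing to
try; the thread itself does not confirm that this resolves the issue.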
