avro-user mailing list archives

From Alex Holmes <grep.a...@gmail.com>
Subject Re: Avro versioning and SpecificDatum's
Date Tue, 20 Sep 2011 10:26:49 GMT
Thanks, I'll add a bug.

As an FYI, even without the alias (retaining the original field name),
just removing the "id" field yields the exception.
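
One possible reading of both symptoms, offered as a guess rather than a confirmed diagnosis: the reader may be filling the v2 class by the writer's field positions instead of resolving fields by name. The toy model below is not Avro code. Every type and method in it (ResolutionSketch, Field, readByPosition, readByName) is hypothetical and exists only to contrast the two strategies on the v1/v2 schemas from this thread.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model only; none of these types are Avro classes. */
public class ResolutionSketch {

    /** A reader-side field: its name, optional aliases, and a default (null = none). */
    static final class Field {
        final String name;
        final Object defaultValue;
        final List<String> aliases;

        Field(String name, Object defaultValue, String... aliases) {
            this.name = name;
            this.defaultValue = defaultValue;
            this.aliases = Arrays.asList(aliases);
        }

        boolean matches(String writerName) {
            return name.equals(writerName) || aliases.contains(writerName);
        }
    }

    /** Copies values by writer position, ignoring the reader's field names. */
    static Object[] readByPosition(Object[] writerValues, List<Field> readerFields) {
        Object[] out = new Object[readerFields.size()];
        for (int i = 0; i < writerValues.length; i++) {
            if (i >= out.length) {
                throw new IllegalStateException("Bad index: " + i);
            }
            out[i] = writerValues[i];
        }
        return out;
    }

    /** Matches each reader field to a writer field by name or alias, else uses its default. */
    static Object[] readByName(List<String> writerNames, Object[] writerValues,
                               List<Field> readerFields) {
        Object[] out = new Object[readerFields.size()];
        for (int r = 0; r < readerFields.size(); r++) {
            Field field = readerFields.get(r);
            out[r] = field.defaultValue;
            for (int w = 0; w < writerNames.size(); w++) {
                if (field.matches(writerNames.get(w))) {
                    out[r] = writerValues[w];
                    break;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // v1 as written: fields "name" and "id".
        List<String> v1Names = Arrays.asList("name", "id");
        Object[] v1Row = {"r1", 1};

        // v2 from the first test: "id" removed, "name" renamed, "new_field" added.
        List<Field> v2 = Arrays.asList(
                new Field("name_rename", null, "name"),
                new Field("new_field", 0));

        // Positional copy: the old "id" value lands in "new_field".
        System.out.println(Arrays.toString(readByPosition(v1Row, v2)));
        // Name-based resolution: "id" is skipped and "new_field" gets its default.
        System.out.println(Arrays.toString(readByName(v1Names, v1Row, v2)));
    }
}
```

In this model, a v2 with only the renamed field makes readByPosition fail on index 1, which resembles the "Bad index" above, while readByName simply drops "id". If something like this is going on, making sure the reader is given the v2 schema as its expected schema may be worth checking.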

On Tue, Sep 20, 2011 at 2:22 AM, Scott Carey <scottcarey@apache.org> wrote:
> That looks like a bug.  What happens if there is no aliasing/renaming
> involved?  Aliasing is a newer feature than field addition, removal, and
> promotion.
>
> This should be easy to reproduce, can you file a JIRA ticket?  We should
> discuss this further there.
>
> Thanks!
>
>
> On 9/19/11 6:14 PM, "Alex Holmes" <grep.alex@gmail.com> wrote:
>
>>OK, I was able to reproduce the exception.
>>
>>v1:
>>{"name": "Record", "type": "record",
>>  "fields": [
>>    {"name": "name", "type": "string"},
>>    {"name": "id", "type": "int"}
>>  ]
>>}
>>
>>v2:
>>{"name": "Record", "type": "record",
>>  "fields": [
>>    {"name": "name_rename", "type": "string", "aliases": ["name"]}
>>  ]
>>}
>>
>>Step 1.  Write Avro file using v1 generated class
>>Step 2.  Read Avro file using v2 generated class
>>
>>Exception in thread "main" org.apache.avro.AvroRuntimeException: Bad index
>>       at Record.put(Unknown Source)
>>       at org.apache.avro.generic.GenericData.setField(GenericData.java:463)
>>       at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
>>       at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
>>       at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
>>       at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
>>       at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
>>       at Read.readFromAvro(Unknown Source)
>>       at Read.main(Unknown Source)
>>
>>The code to write/read the avro file didn't change from below.
>>
>>On Mon, Sep 19, 2011 at 9:08 PM, Alex Holmes <grep.alex@gmail.com> wrote:
>>> I'm trying to put together a simple test case to reproduce the
>>> exception.  While I was creating the test case, I hit this behavior
>>> which doesn't seem right, but maybe I'm misunderstanding how
>>> forward/backward compatibility should work:
>>>
>>> Schema v1:
>>>
>>> {"name": "Record", "type": "record",
>>>  "fields": [
>>>    {"name": "name", "type": "string"},
>>>    {"name": "id", "type": "int"}
>>>  ]
>>> }
>>>
>>> Schema v2:
>>>
>>> {"name": "Record", "type": "record",
>>>  "fields": [
>>>    {"name": "name_rename", "type": "string", "aliases": ["name"]},
>>>    {"name": "new_field", "type": "int", "default": 0}
>>>  ]
>>> }
>>>
>>> In the 2nd version I:
>>>
>>> - removed field "id"
>>> - renamed field "name" to "name_rename"
>>> - added field "new_field"
>>>
>>> I write the v1 data file:
>>>
>>>  public static Record createRecord(String name, int id) {
>>>    Record record = new Record();
>>>    record.name = name;
>>>    record.id = id;
>>>    return record;
>>>  }
>>>
>>>  public static void writeToAvro(OutputStream outputStream)
>>>      throws IOException {
>>>    DataFileWriter<Record> writer =
>>>        new DataFileWriter<Record>(new SpecificDatumWriter<Record>());
>>>    writer.create(Record.SCHEMA$, outputStream);
>>>
>>>    writer.append(createRecord("r1", 1));
>>>    writer.append(createRecord("r2", 2));
>>>
>>>    writer.close();
>>>    outputStream.close();
>>>  }
>>>
>>> I wrote a version-agnostic Read class:
>>>
>>>  public static void readFromAvro(InputStream is) throws IOException {
>>>    DataFileStream<Record> reader = new DataFileStream<Record>(
>>>            is, new SpecificDatumReader<Record>());
>>>    for (Record a : reader) {
>>>      System.out.println(ToStringBuilder.reflectionToString(a));
>>>    }
>>>    IOUtils.cleanup(null, is);
>>>    IOUtils.cleanup(null, reader);
>>>  }
>>>
>>> Running the Read code against the v1 data file, and including the v1
>>> code-generated classes in the classpath produced:
>>>
>>> Record@6a8c436b[name=r1,id=1]
>>> Record@6baa9f99[name=r2,id=2]
>>>
>>> If I run the same code, but use just the v2 generated classes in the
>>> classpath I get:
>>>
>>> Record@39dd3812[name_rename=r1,new_field=1]
>>> Record@27b15692[name_rename=r2,new_field=2]
>>>
>>> The name_rename field seems to be good, but why would "new_field"
>>> inherit the values of the deleted field "id"?
>>>
>>> Cheers,
>>> Alex
>>>
>>> On Mon, Sep 19, 2011 at 12:35 PM, Doug Cutting <cutting@apache.org> wrote:
>>>> On 09/19/2011 05:12 AM, Alex Holmes wrote:
>>>>> I then modified my original schema by adding, deleting and renaming
>>>>> some fields, creating version 2 of the schema.  After re-creating the
>>>>> Java classes I attempted to read the version 1 file using the
>>>>> DataFileStream (with a SpecificDatumReader), and this is throwing an
>>>>> exception.
>>>>
>>>> This should work.  Can you provide more detail?  What is the exception?
>>>>  A reproducible test case would be great to have.
>>>>
>>>> Thanks,
>>>>
>>>> Doug
>>>>
>>>
>
>
>
