avro-user mailing list archives

From Martin Kleppmann <mar...@rapportive.com>
Subject Re: Picking up default value for a union?
Date Wed, 10 Apr 2013 03:42:32 GMT
With Avro, it is generally assumed that your reader is working with
the exact same schema as the data was written with. If you want to
change your schema, e.g. add a field to a record, you still need the
exact same schema as was used for writing (the "writer's schema"), but
you can also give the decoder a second schema (the "reader's schema"),
and Avro will map data from the writer's schema into the reader's
schema for you ("schema evolution").

This requirement of having the exact same schema as the writer makes
more sense with Avro's binary encoding, because it allows Avro to omit
the field names, which makes the encoding very compact. The
requirement makes less sense if you're using the JSON encoding, where
field names are inevitably part of the JSON. I think this behaviour is
expected, but I agree that it's a bit surprising, so perhaps it's
worth discussing whether we should change it.
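
As a rough illustration of why a name-free encoding needs the writer's
schema, here is a toy sketch. It is not Avro's actual binary format:
the class NamelessEncoding and its one-byte length prefix are
assumptions made up for this example. It only shows that when values
are written in schema field order with no names, the bytes can be
interpreted correctly only by someone who knows that order.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NamelessEncoding {
    // Toy encoder (hypothetical, not Avro's format): for each field, in
    // schema order, write one length byte followed by the UTF-8 bytes.
    // Field names are never written. Assumes every value is < 128 bytes.
    static byte[] encode(List<String> fieldOrder, Map<String, String> record) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String name : fieldOrder) {
            byte[] value = record.get(name).getBytes(StandardCharsets.UTF_8);
            out.write(value.length);
            out.write(value, 0, value.length);
        }
        return out.toByteArray();
    }

    // Toy decoder: the field order passed in is the only source of names.
    static Map<String, String> decode(List<String> fieldOrder, byte[] data) {
        Map<String, String> record = new LinkedHashMap<>();
        int pos = 0;
        for (String name : fieldOrder) {
            int len = data[pos++];
            record.put(name, new String(data, pos, len, StandardCharsets.UTF_8));
            pos += len;
        }
        return record;
    }

    public static void main(String[] args) {
        Map<String, String> record = new HashMap<>();
        record.put("first", "Ada");
        record.put("last", "Lovelace");
        byte[] data = encode(List.of("first", "last"), record);

        // Same field order as the writer: values get their original names.
        System.out.println(decode(List.of("first", "last"), data));
        // Different field order: decoding succeeds but mislabels the values.
        System.out.println(decode(List.of("last", "first"), data));
    }
}
```

With JSON the names travel with the data, which is why the same
requirement feels less natural there.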

To answer your question, your input data {} looks like it was written
with a writer schema of {"name":"hey", "type":"record", "fields":[]}
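so try using that as your writer schema. Then if you specify
{"name":"hey", "type":"record",
"fields":[{"name":"a","type":["null","string"],"default":null}]} as
your reader schema, you should find that the resolving decoder fills
in the field "a" with the default null. Note that the default must be
the JSON value null, not the string "null": a union field's default
has to match the union's first branch, which here is the null type.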
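
A minimal sketch of that setup, adapted from the snippet in your
original message. The class name DefaultFillSketch and the method
readWithDefault are made up for illustration; the Avro calls are the
two-schema GenericDatumReader constructor and the same jsonDecoder
factory method you are already using. I haven't run it against your
exact Avro version, so treat it as a starting point.

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;

public class DefaultFillSketch {
    // Writer's schema: what the input {} appears to have been written with.
    static final String WRITER_JSON =
        "{\"name\":\"hey\",\"type\":\"record\",\"fields\":[]}";

    // Reader's schema: adds field "a" with a JSON null default.
    static final String READER_JSON =
        "{\"name\":\"hey\",\"type\":\"record\",\"fields\":["
        + "{\"name\":\"a\",\"type\":[\"null\",\"string\"],\"default\":null}]}";

    static GenericRecord readWithDefault(String dataLine) throws IOException {
        // Use a separate parser per schema, because a single Schema.Parser
        // will not accept two definitions of the same record name.
        Schema writer = new Schema.Parser().parse(WRITER_JSON);
        Schema reader = new Schema.Parser().parse(READER_JSON);

        // Passing both schemas makes the reader resolve writer data
        // into the reader's schema.
        GenericDatumReader<GenericRecord> datumReader =
            new GenericDatumReader<GenericRecord>(writer, reader);

        // The JSON decoder parses the input according to the writer's schema.
        Decoder decoder = DecoderFactory.get().jsonDecoder(writer, dataLine);
        return datumReader.read(null, decoder);
    }

    public static void main(String[] args) throws IOException {
        GenericRecord record = readWithDefault("{}");
        // Field "a" is absent from the input, so it should carry the default.
        System.out.println(record.get("a"));
    }
}
```

The only change from your code is that the decoder is built from the
writer's schema and the datum reader is given both schemas, rather
than using the evolved schema for everything.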


On 9 April 2013 02:44, Jonathan Coveney <jcoveney@gmail.com> wrote:
> Stepping through the code, it looks like the code only uses defaults for
> writing, not for reading. IE at read time it assumes that the defaults were
> already filled in. It seems like if the reader evolved the schema to include
> new fields, it would be desirable for the defaults to get filled in if not
> present? But stepping through, on reading the defaults are completely
> ignored.
> 2013/4/9 Jonathan Coveney <jcoveney@gmail.com>
>> Please note: {"name":"hey", "type":"record",
>> "fields":[{"name":"a","type":["null","string"],"default":"null"}]} also
>> doesn't work
>> 2013/4/9 Jonathan Coveney <jcoveney@gmail.com>
>>> I have the following schema: {"name":"hey", "type":"record",
>>> "fields":[{"name":"a","type":["null","string"],"default":null}]}
>>> I am trying to deserialize the following against this schema using Java
>>> and the GenericDatumReader: {}
>>> I get the following error:
>>> Caused by: org.apache.avro.AvroTypeException: Expected start-union. Got
>>>     at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:697)
>>>     at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:441)
>>>     at
>>> org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
>>>     at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>>>     at
>>> org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
>>>     at
>>> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
>>>     at
>>> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
>>>     at
>>> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
>>>     at
>>> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
>>>     at com.spotify.hadoop.JsonTester.main(JsonTester.java:40)
>>> I'm not seeing any immediate issues online around this...is this
>>> expected? I'm reading it in as such:
>>> Schema avroSchema = new Schema.Parser().parse(schemaLine);
>>> GenericDatumReader<Object> reader = new
>>> GenericDatumReader<Object>(avroSchema);
>>> Object datum = reader.read(null,
>>> DecoderFactory.get().jsonDecoder(avroSchema, dataLine));
>>> I'm going to see what's up and why it isn't picking up the default, but
>>> imagined you guys might know what's up?
>>> Thanks,
>>> Jon
