avro-dev mailing list archives

From Ryan Blue <b...@cloudera.com>
Subject Re: Avro schema doesn't honor backward compatibilty
Date Tue, 02 Feb 2016 17:19:51 GMT
Hi Raghvendra,

It looks like the problem is that you're using the new schema in place 
of the schema that the data was written with.  You should run setSchema 
on your SpecificDatumReader to set the schema the data was written with.

What's happening is that the schema you're using, the new one, has the 
new field so Avro assumes it is present and tries to read it. By setting 
the schema that the data was actually written with, the datum reader 
will know that it isn't present and will use your default instead. When 
you read data encoded with the new schema, you need to use it as the 
written schema instead so the datum reader knows that the field should 
be read.
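
A minimal sketch of the idea above, assuming a trimmed-down pair of schemas (one string field, plus the new agentType field) and using GenericDatumReader so that it runs without the generated MyPayLoad class. The class name WriterSchemaExample and its helper methods are hypothetical. The union here lists "string" first because the Avro specification requires a union field's default to match the first branch of the union; that ordering is an assumption of this sketch, not something taken from the schema in the question. With the generated class, the same effect should come from calling setSchema with the old schema on the SpecificDatumReader.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class WriterSchemaExample {
    // Hypothetical, trimmed-down stand-in for the schema the old data was written with.
    static final Schema OLD_SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"MyPayLoad\",\"fields\":["
        + "{\"name\":\"filed1\",\"type\":\"string\"}]}");

    // Hypothetical stand-in for the new schema; "string" comes first in the
    // union so that the string default "APP_AGENT" is valid (an assumption).
    static final Schema NEW_SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"MyPayLoad\",\"fields\":["
        + "{\"name\":\"filed1\",\"type\":\"string\"},"
        + "{\"name\":\"agentType\",\"type\":[\"string\",\"null\"],"
        + "\"default\":\"APP_AGENT\"}]}");

    // Produce bytes the way an old writer would: no agentType in the payload.
    static byte[] encodeWithOldSchema(String filed1) throws IOException {
        GenericRecord record = new GenericData.Record(OLD_SCHEMA);
        record.put("filed1", filed1);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(OLD_SCHEMA).write(record, encoder);
        encoder.flush();
        return out.toByteArray();
    }

    // Decode with an explicit writer schema; the reader (expected) schema is
    // always the new one, so missing fields are filled from their defaults.
    static GenericRecord decode(byte[] payload, Schema writerSchema) throws IOException {
        GenericDatumReader<GenericRecord> reader =
            new GenericDatumReader<>(writerSchema, NEW_SCHEMA);
        Decoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
        return reader.read(null, decoder);
    }

    public static void main(String[] args) throws IOException {
        byte[] oldPayload = encodeWithOldSchema("hello");
        GenericRecord record = decode(oldPayload, OLD_SCHEMA);
        System.out.println(record.get("filed1") + " " + record.get("agentType"));
    }
}
```

Passing NEW_SCHEMA as the writer schema for the same old bytes would make the reader look for agentType in the payload, which is the situation that produces the EOFException in the question.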

Does that make sense?

rb

On 02/01/2016 12:31 PM, Raghvendra Singh wrote:
> <http://stackoverflow.com/questions/34733604/avro-schema-doesnt-honor-backward-compatibilty#>
>
> I have this avro schema
>
> {
>   "namespace": "xx.xxxx.xxxxx.xxxxx",
>   "type": "record",
>   "name": "MyPayLoad",
>   "fields": [
>       {"name": "filed1",  "type": "string"},
>       {"name": "filed2",     "type": "long"},
>       {"name": "filed3",  "type": "boolean"},
>       {
>            "name" : "metrics",
>            "type":
>            {
>               "type" : "array",
>               "items":
>               {
>                   "name": "MyRecord",
>                   "type": "record",
>                   "fields" :
>                       [
>                         {"name": "min", "type": "long"},
>                         {"name": "max", "type": "long"},
>                         {"name": "sum", "type": "long"},
>                         {"name": "count", "type": "long"}
>                       ]
>               }
>            }
>       }
>    ]}
>
> Here is the code which we use to parse the data
>
> public static final MyPayLoad parseBinaryPayload(byte[] payload) {
>          DatumReader<MyPayLoad> payloadReader =
>                  new SpecificDatumReader<>(MyPayLoad.class);
>          Decoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
>          MyPayLoad myPayLoad = null;
>          try {
>              myPayLoad = payloadReader.read(null, decoder);
>          } catch (IOException e) {
>              logger.log(Level.SEVERE, e.getMessage(), e);
>          }
>
>          return myPayLoad;
>      }
>
> Now I want to add one more field to the schema, so the schema looks like
> below
>
>   {
>   "namespace": "xx.xxxx.xxxxx.xxxxx",
>   "type": "record",
>   "name": "MyPayLoad",
>   "fields": [
>       {"name": "filed1",  "type": "string"},
>       {"name": "filed2",     "type": "long"},
>       {"name": "filed3",  "type": "boolean"},
>       {
>            "name" : "metrics",
>            "type":
>            {
>               "type" : "array",
>               "items":
>               {
>                   "name": "MyRecord",
>                   "type": "record",
>                   "fields" :
>                       [
>                         {"name": "min", "type": "long"},
>                         {"name": "max", "type": "long"},
>                         {"name": "sum", "type": "long"},
>                         {"name": "count", "type": "long"}
>                       ]
>               }
>            }
>       },
>       {"name": "agentType",  "type": ["null", "string"], "default": "APP_AGENT"}
>    ]}
>
> Note the added field and that its default is defined. The problem is that
> if we receive data which was written using the older schema, I get this
> error
>
> java.io.EOFException: null
>      at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473)
> ~[avro-1.7.4.jar:1.7.4]
>      at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128)
> ~[avro-1.7.4.jar:1.7.4]
>      at org.apache.avro.io.BinaryDecoder.readIndex(BinaryDecoder.java:423)
> ~[avro-1.7.4.jar:1.7.4]
>      at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
> ~[avro-1.7.4.jar:1.7.4]
>      at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
> ~[avro-1.7.4.jar:1.7.4]
>      at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
> ~[avro-1.7.4.jar:1.7.4]
>      at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
> ~[avro-1.7.4.jar:1.7.4]
>      at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
> ~[avro-1.7.4.jar:1.7.4]
>      at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
> ~[avro-1.7.4.jar:1.7.4]
>      at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
> ~[avro-1.7.4.jar:1.7.4]
>      at com.appdynamics.blitz.shared.util.XXXXXXXXXXXXX.parseBinaryPayload(BlitzAvroSharedUtil.java:38)
> ~[blitz-shared.jar:na]
>
> What I understood from this document
> <https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html>
> is that this should have been backward compatible, but somehow that doesn't
> seem to be the case. Any idea what I am doing wrong?
>


-- 
Ryan Blue
Software Engineer
Cloudera, Inc.
