avro-user mailing list archives

From Scott Carey <scottca...@apache.org>
Subject Re: Can mismatched schemas cause the Avro code to crash
Date Mon, 29 Aug 2011 16:30:06 GMT

On 8/28/11 11:29 AM, "W.P. McNeill" <billmcn@gmail.com> wrote:

>I'm debugging a nasty problem that occurs down in the Avro 1.4.1 code.
>Sometimes when I read my serialized data into a generic datum object I
>crash deep inside the Avro code. The call stack shows that the parser has
>been walking down my data structure until it gets to a string node that
>it tries to read using BinaryDecoder.readString. This function retrieves
>an invalid string length (e.g. a negative number) and the process
>subsequently crashes with an Array Index Out Of Bounds exception.
>The exact origin of this bug is mysterious to me, but at a high level it
>appears the problem is that I wrote the data with one schema and
>mistakenly read it back in using a different schema. How exactly this
>happened is also mysterious, but it appears that my mechanism for supporting
>projection schemas didn't behave as it should have. The two schemas in
>question are mostly the same; in fact, one is a subset of the other.
>1. In general is it possible for a schema-to-data mismatch to cause a
>crash down in the Avro code of the sort that I described?

Yes. If the schema indicates that a string should be read but the data is
actually something else, an exception may be thrown.
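
To illustrate how a negative string length can come about: Avro's binary encoding carries no type tags, and a string is stored as a zig-zag varint length followed by its UTF-8 bytes. If the writer actually stored a long at that position, a reader expecting a string will take that long as the length. The sketch below is not Avro's code; the class and method names are hypothetical, and it only re-implements the zig-zag varint encoding described in the Avro specification.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

// Hypothetical illustration; not part of the Avro library.
public class SchemaMismatchSketch {

    // Avro-style long: zig-zag encode, then emit 7 bits per byte, low bits first.
    static void writeLong(long n, ByteArrayOutputStream out) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Inverse of writeLong: collect 7-bit groups, then undo the zig-zag step.
    static long readLong(ByteArrayInputStream in) {
        long z = 0;
        int shift = 0;
        int b;
        do {
            b = in.read();
            if (b < 0) {
                throw new IllegalStateException("unexpected end of data");
            }
            z |= ((long) (b & 0x7F)) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);
    }

    // The writer's schema says "long"; the reader's schema says "string".
    // The reader's first step is to read a length, so it decodes the
    // writer's long value and treats it as the number of bytes to fetch.
    public static long lengthSeenByStringReader(long writtenValue) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(writtenValue, out);
        return readLong(new ByteArrayInputStream(out.toByteArray()));
    }

    public static void main(String[] args) {
        // A long field holding -3 is seen by a string reader as length -3.
        System.out.println(lengthSeenByStringReader(-3));
    }
}
```

Any negative or oversized value the writer stored there becomes an invalid length on the reading side, which matches the symptom described above.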

>2. If the answer to question (1) is "yes", is the only way you'd expect
>the crash to happen is if writing was done with the superset schema and
>reading done with the subset schema?

This can happen whenever data is read with a schema that does not match the
one it was written with, or when the raw binary data is corrupt.

>3. Writing with a superset schema and reading with a subset schema will
>always work because this is just projection, correct?

As long as the reader is configured to resolve the schemas properly, it
should work.  There have been a couple of bugs in this use case over time,
however.  What is the type of the last field of your written data?
For example, 1.5.1 fixed an issue in projected schemas in a very specific
situation: https://issues.apache.org/jira/browse/AVRO-793
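
As a rough sketch of why resolution needs the writer's schema: a projecting reader still has to walk the bytes in the writer's field order and skip the fields it does not want. In Avro's Java API this is what the two-schema constructor `GenericDatumReader(writerSchema, readerSchema)` is for. The code below does not use Avro; the record layout (`id`, `name`, `score`) and all names are assumptions made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not part of the Avro library.
public class ProjectionSketch {

    // Zig-zag varint long, as in the Avro binary encoding.
    static void writeLong(long n, ByteArrayOutputStream out) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    static long readLong(ByteArrayInputStream in) {
        long z = 0;
        int shift = 0;
        int b;
        do {
            b = in.read();
            if (b < 0) {
                throw new IllegalStateException("unexpected end of data");
            }
            z |= ((long) (b & 0x7F)) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);
    }

    // String: length as a long, followed by the UTF-8 bytes.
    static void writeString(String s, ByteArrayOutputStream out) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(utf8.length, out);
        out.write(utf8, 0, utf8.length);
    }

    static void skipString(ByteArrayInputStream in) {
        long len = readLong(in);
        if (len < 0 || in.skip(len) != len) {
            throw new IllegalStateException("invalid string length: " + len);
        }
    }

    // Writer (superset) layout: id (long), name (string), score (long).
    public static byte[] encode(long id, String name, long score) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(id, out);
        writeString(name, out);
        writeLong(score, out);
        return out.toByteArray();
    }

    // Resolved read: follow the writer's layout and skip the unwanted field.
    public static long[] readResolved(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        long id = readLong(in);
        skipString(in); // name is absent from the reader's (subset) schema
        long score = readLong(in);
        return new long[] {id, score};
    }

    // Unresolved read: apply the subset layout directly to the bytes.
    // The name's length prefix is misread as the score.
    public static long[] readUnresolved(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        long id = readLong(in);
        long score = readLong(in);
        return new long[] {id, score};
    }

    public static void main(String[] args) {
        byte[] data = encode(7, "abc", 42);
        System.out.println(readResolved(data)[1]);   // score as written
        System.out.println(readUnresolved(data)[1]); // length of "abc" instead
    }
}
```

In this toy case the unresolved read silently returns a wrong value; with other field types in the same position it would instead produce an invalid length and fail while decoding.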

