avro-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Schema validation of a field's default values
Date Mon, 05 Nov 2012 18:46:27 GMT

I'd welcome improvements to default value validation in Avro.  For
performance, I think this should be an explicit, separate operation from
parsing schemas.  But we might invoke it on schemas at various points,
e.g., when creating a file.  If you are able, please contribute your
implementation by filing an issue in Avro's Jira.



On Sat, Nov 3, 2012 at 9:48 AM, Mark Hayes <mark@greybird.com> wrote:

> On Mon, Oct 29, 2012 at 12:32 PM, Doug Cutting <cutting@apache.org> wrote:
>> No, I don't know of a default value validator that's been implemented
>> yet.  It would be great to have one.
>>
>> I think this would recursively walk a schema.  Whenever a non-null
>> default value is found it could call ResolvingGrammarDecoder#encode().
>> That's what interprets JSON default values.  (Perhaps this logic
>> should be moved, though.)
> Thanks for the reply, Doug.
>
> I did find ResolvingGrammarDecoder.encode (I saw that it is called by the
> builders) and was using it as you described, but I ran into limitations:
>
> + When the field type is an array, map or record, values of the
>   wrong JSON type (not array or object) are translated to an empty array,
>   map or record.  For example, specifying a default of 0, null or "" results
>   in an empty array, map or record.
> + For all numeric Avro types (int, long, float and double) the default
>   value may be of any JSON numeric type, and the JSON value is coerced
>   to the Avro type even if part of the value is lost or truncated.  For
>   example, a long default value that exceeds 32 bits is truncated if the
>   field is of type int.
> + The byte array length is not validated for a fixed type.
> + For nested fields and certain types (e.g., enums) a cryptic error
>   is often output that does not contain the name of the offending field.
>
> These deficiencies can mask errors made by the user when defining
> a default value.  This is important to our application.
>
> To compensate for these deficiencies we implemented our own checking that
> is more strict than Avro's.  To do this, we serialize the default value
> using our own JSON serializer in a special mode where default values are
> applied.  Any errors during serialization indicate that the default value
> is invalid.
>
> Something similar might be done in Avro itself, for example, if the JSON
> encoder were made to operate in a special mode where default values are
> applied.
> --mark
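
The stricter checks discussed in this thread (a recursive walk over the schema, rejecting wrong JSON types, out-of-range ints, wrong-length fixed values, and reporting the offending field by name) could be sketched roughly as below. This is a minimal illustration in Python operating on the parsed schema JSON, not Avro's code and not the implementation Mark describes. The names validate_defaults, check_default and DefaultValueError are hypothetical, as is the example schema. It assumes a union default must match the first branch of the union, and it does not resolve references to named types defined elsewhere in the schema.

```python
import json

INT_RANGE = (-2**31, 2**31 - 1)
LONG_RANGE = (-2**63, 2**63 - 1)


class DefaultValueError(ValueError):
    """A default value does not match its field's schema (hypothetical name)."""


def _is_int(value, bounds):
    # bool is a subclass of int in Python, so reject it explicitly.
    return (isinstance(value, int) and not isinstance(value, bool)
            and bounds[0] <= value <= bounds[1])


def _is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _is_byte_string(value):
    # bytes/fixed defaults are JSON strings whose code points are 0-255.
    return isinstance(value, str) and all(ord(ch) <= 255 for ch in value)


def check_default(schema, value, path):
    """Strictly check one JSON default value against one schema node."""
    if isinstance(schema, list):
        # Assumption: a union default must match the union's first branch.
        return check_default(schema[0], value, path)
    kind = schema if isinstance(schema, str) else schema["type"]
    if not isinstance(kind, str):  # e.g. {"type": {"type": "array", ...}}
        return check_default(kind, value, path)
    checks = {
        "null": lambda: value is None,
        "boolean": lambda: isinstance(value, bool),
        "int": lambda: _is_int(value, INT_RANGE),
        "long": lambda: _is_int(value, LONG_RANGE),
        "float": lambda: _is_number(value),
        "double": lambda: _is_number(value),
        "string": lambda: isinstance(value, str),
        "bytes": lambda: _is_byte_string(value),
        "fixed": lambda: _is_byte_string(value) and len(value) == schema["size"],
        "enum": lambda: isinstance(value, str) and value in schema["symbols"],
        "array": lambda: isinstance(value, list),
        "map": lambda: isinstance(value, dict),
        "record": lambda: isinstance(value, dict),
    }
    if kind not in checks:
        raise DefaultValueError(f"{path}: unsupported or unresolved type {kind!r}")
    if not checks[kind]():
        raise DefaultValueError(f"{path}: {value!r} is not a valid {kind} default")
    # Container types: check the nested values too, extending the path.
    if kind == "array":
        for i, item in enumerate(value):
            check_default(schema["items"], item, f"{path}[{i}]")
    elif kind == "map":
        for key, item in value.items():
            check_default(schema["values"], item, f"{path}[{key!r}]")
    elif kind == "record":
        for field in schema["fields"]:
            name = field["name"]
            if name in value:
                check_default(field["type"], value[name], f"{path}.{name}")
            elif "default" in field:
                check_default(field["type"], field["default"], f"{path}.{name}")
            else:
                raise DefaultValueError(f"{path}: missing value for field {name!r}")


def validate_defaults(schema, path=""):
    """Walk a parsed schema and return one message per invalid default."""
    errors = []
    if isinstance(schema, list):
        for branch in schema:
            errors += validate_defaults(branch, path)
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "record":
            for field in schema["fields"]:
                field_path = f"{path}.{field['name']}" if path else field["name"]
                if "default" in field:
                    try:
                        check_default(field["type"], field["default"], field_path)
                    except DefaultValueError as exc:
                        errors.append(str(exc))
                errors += validate_defaults(field["type"], field_path)
        elif kind == "array":
            errors += validate_defaults(schema["items"], path)
        elif kind == "map":
            errors += validate_defaults(schema["values"], path)
    return errors


# Hypothetical schema with one bad default per limitation listed above.
SCHEMA = json.loads("""
{"type": "record", "name": "User", "fields": [
  {"name": "age", "type": "int", "default": 4294967301},
  {"name": "tags", "type": {"type": "array", "items": "string"}, "default": 0},
  {"name": "id", "type": {"type": "fixed", "name": "Id", "size": 4}, "default": "abc"},
  {"name": "address", "type": {"type": "record", "name": "Address", "fields": [
    {"name": "kind", "default": "OTHER",
     "type": {"type": "enum", "name": "Kind", "symbols": ["HOME", "WORK"]}}
  ]}}
]}
""")

for message in validate_defaults(SCHEMA):
    print(message)
```

Keeping validate_defaults as a standalone function, separate from schema parsing, matches the suggestion at the top of this message that validation be an explicit operation a caller invokes when it chooses to, for example before creating a file.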
