avro-dev mailing list archives

From "John Karp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AVRO-1470) Perl API boolean type misencoded
Date Mon, 17 Mar 2014 16:07:47 GMT

    [ https://issues.apache.org/jira/browse/AVRO-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937960#comment-13937960 ]

John Karp commented on AVRO-1470:

Since everything is evaluable to true or false in Perl, including undef, I think that might
be problematic. If you had a union of [boolean, null], there's no value you can specify to
get the null branch. If you have a union of [boolean, string], any string specified will be
encoded as a true boolean, except for the empty string which would be false. And if someone
is encoding a populated hash or array as a boolean type, there's a good chance it's a mistake,
and it would be more useful to produce an error than to encode the whole thing as a true value.
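
To make the ambiguity concrete, here is a minimal sketch. `encode_by_truthiness` is a hypothetical helper written for illustration, not code from the Avro Perl API; it assumes "is a boolean" is taken to mean "can be evaluated as true or false":

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Avro: every Perl value can be tested
# for truth, so a truthiness-based check accepts everything as a boolean.
sub encode_by_truthiness {
    my ($value) = @_;
    return $value ? "\x01" : "\x00";
}

my @samples = (
    [ 'undef',          undef ],
    [ 'empty string',   '' ],
    [ 'string "0"',     '0' ],
    [ 'string "false"', 'false' ],
    [ 'populated hash', { a => 1 } ],
    [ 'empty array',    [] ],
);

for my $sample (@samples) {
    my ($label, $value) = @$sample;
    printf "%-16s => %s\n", $label, unpack('H*', encode_by_truthiness($value));
}
```

Here undef, '' and '0' come out as 00, while the string 'false', the hash reference and even the empty array reference come out as 01. In a union, the boolean branch would therefore claim every value before the null or string branch is considered.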

Another approach would be to use '1' and the empty string as true and false, since they're
the closest thing in Perl to canonical boolean values ('' eq !!0 and '1' eq !!1). However,
deserializing false as the empty string might be confusing to some users.
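
A short sketch of what those canonical values look like in practice, using only core Perl (the variable names are illustrative):

```perl
use strict;
use warnings;

# Perl's built-in "true" and "false" results, as produced by double negation.
my $true  = !!1;
my $false = !!0;

# Compared as strings, the canonical true is '1' and the canonical false is ''.
print "true is '1'\n" if $true  eq '1';
print "false is ''\n" if $false eq '';

# What a user would see after decoding a false boolean under this scheme.
printf "decoded false prints as [%s], length %d\n", $false, length $false;
```

The last line prints an empty pair of brackets and a length of 0, which is the part that might confuse users who expect to see a 0.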

Another approach that avoids all the above problems would be to require a special class for
representing booleans. The downsides are that it would be another dependency, and you couldn't
pass the result of a boolean test directly into the serializer without wrapping it first.
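
As a rough sketch of what such a class could look like: `My::Boolean` and `encode_boolean` below are hypothetical names invented for this illustration, not an existing module or the Avro Perl API.

```perl
use strict;
use warnings;

{
    # Hypothetical wrapper class; name and methods are illustrative only.
    package My::Boolean;

    sub new {
        my ($class, $value) = @_;
        my $flag = $value ? 1 : 0;
        return bless \$flag, $class;
    }
    sub true    { $_[0]->new(1) }
    sub false   { $_[0]->new(0) }
    sub is_true { ${ $_[0] } ? 1 : 0 }
}

use Scalar::Util qw(blessed);

# Hypothetical encoder: anything that is not a My::Boolean is an error,
# so strings, hashes and undef can no longer be mistaken for booleans.
sub encode_boolean {
    my ($data) = @_;
    die "expected a My::Boolean object\n"
        unless blessed($data) && $data->isa('My::Boolean');
    return $data->is_true ? "\x01" : "\x00";
}

print unpack('H*', encode_boolean(My::Boolean->true)), "\n";
# The result of a boolean test has to be wrapped before it can be encoded.
print unpack('H*', encode_boolean(My::Boolean->new(3 > 5))), "\n";
```

The second call shows the wrapping cost mentioned above: `3 > 5` cannot be handed to the encoder directly.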

> Perl API boolean type misencoded
> --------------------------------
>                 Key: AVRO-1470
>                 URL: https://issues.apache.org/jira/browse/AVRO-1470
>             Project: Avro
>          Issue Type: Bug
>          Components: perl
>            Reporter: John Karp
>            Assignee: John Karp
>         Attachments: AVRO-1470.patch
> h1. Boolean Serialization
> The boolean serialization code in BinaryEncoder.pm is:
> {noformat}
> $data ? \0x1 : \0x0
> {noformat}
> intending that anything false to Perl, such as 0, '0', '', () and undef, is encoded as
> zero, and everything else is encoded as one. However, this code doesn't work, as these unit
> tests indicate:
> {noformat}
> primitive_ok boolean => 0, "\x0";
> primitive_ok boolean => 1, "\x1";
> {noformat}
> which print:
> {noformat}
> #   Failed test 'primitive boolean encoded correctly'
> #   at t/02_bin_encode.t line 40.
> #          got: '30'
> #     expected: '00'
> #   Failed test 'primitive boolean encoded correctly'
> #   at t/02_bin_encode.t line 40.
> #          got: '31'
> #     expected: '01'
> {noformat}
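
The '30' and '31' in that output are the ASCII codes of the digits "0" and "1". That is consistent with `\0x1` being a reference to the number 1 rather than to a one-byte string. Below is a minimal sketch of the effect, assuming the encoder ends up stringifying the referenced value (the surrounding BinaryEncoder.pm code is not shown in this message, so that part is an assumption):

```perl
use strict;
use warnings;

# \0x1 is a reference to the numeric literal 0x1 (the number 1),
# not a reference to a string containing the byte 0x01.
my $buggy_true  = \0x1;
my $buggy_false = \0x0;

# Stringifying the referenced numbers gives the ASCII digits "0" and "1".
print unpack('H*', "${$buggy_false}"), "\n";   # 30
print unpack('H*', "${$buggy_true}"),  "\n";   # 31

# A reference to a one-byte string gives the raw bytes the tests expect.
my $fixed_true  = \"\x01";
my $fixed_false = \"\x00";
print unpack('H*', ${$fixed_false}), "\n";     # 00
print unpack('H*', ${$fixed_true}),  "\n";     # 01
```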
> h1. Booleans in Unions
> Inconsistent with the above serialization, the code used in Schema.pm to determine which
> union branch to use attempts to check for boolean-ness with:
> {noformat}
> m{yes|no|y|n|t|f|true|false}i
> {noformat}
> meaning only those particular strings are considered booleans; however, they will all
> get encoded as '0' by BinaryEncoder.pm.
> I say 'attempts' because it's actually matching this regex against the data type name
> $type, which in this context will always be 'boolean', instead of the value $data.
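
A small sketch of that mismatch, using only the regex quoted above; the variable names and sample values are illustrative, and the real Schema.pm code is not reproduced here:

```perl
use strict;
use warnings;

my $looks_boolean = qr{yes|no|y|n|t|f|true|false}i;

# Matching the type name: 'boolean' contains an 'n', so the unanchored
# pattern always matches, whatever the data is.
my $type = 'boolean';
print "type matches\n" if $type =~ $looks_boolean;

# Matching the value instead still needs anchors, otherwise any string
# containing y, n, t or f is accepted.
for my $data ('true', 'nothing', '12345') {
    my $loose  = $data =~ $looks_boolean                        ? 'yes' : 'no';
    my $strict = $data =~ m{\A(?:yes|no|y|n|t|f|true|false)\z}i ? 'yes' : 'no';
    print "$data: unanchored=$loose anchored=$strict\n";
}
```

With these samples, 'nothing' passes the unanchored check but fails the anchored one, so switching from $type to $data would not be enough to make the check meaningful.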
> h1. Suggested Fix
> Perl has no boolean type, so there's no ideal solution for the inconsistency. But we
> could keep it simple, and have only the numbers 0 and 1 accepted as boolean values.
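
One way the suggested fix could look, as a sketch only: `is_boolean` and `encode_boolean_strict` are hypothetical names, not the attached patch or the Avro Perl API. Because Perl does not reliably distinguish the number 1 from the string '1', this sketch accepts both forms.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical union-branch test: a value counts as a boolean only if it
# is a defined, non-reference scalar that is exactly 0 or 1.
sub is_boolean {
    my ($data) = @_;
    return (defined($data) && !ref($data) && $data =~ /\A[01]\z/) ? 1 : 0;
}

# Hypothetical strict encoder built on the same test, so the serializer and
# the union matching agree on what a boolean is.
sub encode_boolean_strict {
    my ($data) = @_;
    croak "boolean must be 0 or 1" unless is_boolean($data);
    return $data ? "\x01" : "\x00";
}

print unpack('H*', encode_boolean_strict(0)), "\n";   # 00
print unpack('H*', encode_boolean_strict(1)), "\n";   # 01
```

Under this rule undef no longer matches the boolean branch of [boolean, null], and most strings fall through to the string branch of [boolean, string]; the strings '0' and '1' would still be taken as booleans.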

This message was sent by Atlassian JIRA
