avro-dev mailing list archives

From Pedro Larroy <pedro.larroy.li...@gmail.com>
Subject Re: unsigned types
Date Wed, 11 Dec 2013 17:37:24 GMT
I like the idea of using the "fixed" type, but this is not only about the
number range; it is also a matter of type correctness on decoding.
In C and C++ it is common to use unsigned types for data that is never
negative and that might indeed use the full range. One would then like
this type to be decoded properly in other languages. I'm more interested in
getting the data in the correct type in Python (int) than in getting a "bytes"
object that I have to convert to an integer manually.
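
To make the manual conversion concrete, here is a minimal sketch of what carrying a uint64 through an 8-byte "fixed" involves on each side. The helper names to_fixed8 and from_fixed8 are hypothetical and not part of the Avro C++ API, and big-endian byte order is an assumption that both ends would have to agree on outside the schema:

```cpp
#include <array>
#include <cstdint>

// Hypothetical helpers, not part of the Avro C++ API.
// Assumption: the 8 bytes of the "fixed" value hold the integer big-endian.

// Pack an unsigned 64-bit value into 8 bytes, most significant byte first.
std::array<uint8_t, 8> to_fixed8(uint64_t x)
{
    std::array<uint8_t, 8> out{};
    for (int i = 7; i >= 0; --i) {
        out[i] = static_cast<uint8_t>(x & 0xFF);
        x >>= 8;
    }
    return out;
}

// Rebuild the unsigned 64-bit value from 8 big-endian bytes.
uint64_t from_fixed8(const std::array<uint8_t, 8> &in)
{
    uint64_t x = 0;
    for (uint8_t b : in) {
        x = (x << 8) | b;
    }
    return x;
}
```

Every reader in every language needs its own equivalent of from_fixed8, plus the knowledge that the field is really an integer.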


In my case, in C++ I'm using the following hack to encode uint64 types:

template<>
struct codec_traits<uint64> {
    static void encode(Encoder &e, uint64 x)
    {
        avro::encode(e, static_cast<int64>(x));
    }

    static void decode(Decoder &d, uint64& x)
    {
        int64 r = 0;
        avro::decode(d, r);
        x = static_cast<uint64>(r);
    }
};

Again, I think it's just not practical to encode 0xFFFFFFFFFFFFFFFFULL in
C++ and then read -1 in other languages. You need to build wrapper
functions around it and remember that some fields are actually unsigned,
which defeats the advantages of having a schema.
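
A small sketch of that reinterpretation, outside of Avro. The function names as_wire_long and from_wire_long are hypothetical; they only mirror the two casts in the hack above:

```cpp
#include <cstdint>

// Hypothetical names; these mirror the casts in the codec_traits hack.
// Converting an out-of-range unsigned value to int64_t is
// implementation-defined before C++20; mainstream compilers wrap
// using two's complement, which is what this sketch assumes.

// What the writer puts on the wire: the same 64 bits, read as signed.
int64_t as_wire_long(uint64_t x)
{
    return static_cast<int64_t>(x);
}

// What a reader must remember to do to get the original value back.
uint64_t from_wire_long(int64_t r)
{
    return static_cast<uint64_t>(r);
}
```

Under that assumption, as_wire_long(0xFFFFFFFFFFFFFFFFULL) yields -1, which is what a reader that only knows the schema type "long" will see.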

Of the languages Avro supports, C, C++ and C# have unsigned types. Python
has arbitrary-precision integers, so it's not an issue there.

If you think adding unsigned types is not a good idea, how would you solve
the problem I described in a manner that is convenient to read from other
languages? A bunch of bytes doesn't have the same semantics as an unsigned
integer. I think it would be good for Avro to be a generic serialization
format, not one limited by JVM implementation details.

Thanks.

Pedro.





On Wed, Dec 11, 2013 at 6:16 PM, Martin Kleppmann <martin@rapportive.com> wrote:

> Personally, I think it's a good design decision that Avro doesn't support
> unsigned types.
>
> Whether you use signed or unsigned only makes a difference if you expect to
> have numbers between 2^63 and 2^64-1 (if you have numbers between 2^31 and
> 2^32-1 you can use the Avro 'long' type instead of the 'int' type). And if
> your numbers are indeed between 2^63 and 2^64-1, you're better off using a
> 'fixed' type, which will only use 8 bytes, rather than a 'long' which would
> use 10 bytes for such a large number, due to the variable-length encoding.
>
> Another problem with unsigned types can be seen in Protocol Buffers (which
> supports both signed and unsigned): if you do accidentally put -1 in a
> field with an unsigned type, the resulting encoding is ten bytes long — a
> surprising and unnecessary gotcha. (
> https://developers.google.com/protocol-buffers/docs/encoding#types)
>
> Interested to hear other opinions on the matter!
>
> Martin
>
>
> On 11 December 2013 12:38, Pedro Larroy <pedro.larroy.lists@gmail.com> wrote:
>
> > Hi
> >
> > Is there any reason except the java centric focus of avro that it
> > shouldn't support unsigned types? We use them extensively and I'm
> > thinking for us* it would be useful to have them as we use mostly
> > C++ <-> python communication with avro.
> >
> > Would this be accepted in the official avro distribution?
> >
> > Pedro.
> >
> >
> > *us: Here, a Nokia business.
> >
>
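
On the size argument in the quoted message: Avro writes "long" values with variable-length zig-zag coding, so the number of bytes depends on the magnitude of the signed value. The following is a rough sketch of that size calculation, not code taken from any Avro implementation; the function names zigzag, varint_size and avro_long_size are hypothetical:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helpers sketching zig-zag plus base-128 varint sizing.

// Map signed values to unsigned so that small magnitudes stay small:
// 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
uint64_t zigzag(int64_t n)
{
    const uint64_t sign_mask = (n < 0) ? ~uint64_t{0} : uint64_t{0};
    return (static_cast<uint64_t>(n) << 1) ^ sign_mask;
}

// Number of bytes a base-128 varint needs: 7 payload bits per byte.
std::size_t varint_size(uint64_t v)
{
    std::size_t bytes = 1;
    while (v >= 0x80) {
        v >>= 7;
        ++bytes;
    }
    return bytes;
}

// Encoded size of a signed 64-bit value under this scheme.
std::size_t avro_long_size(int64_t n)
{
    return varint_size(zigzag(n));
}
```

Under this scheme the largest and smallest signed values take 10 bytes, while -1 takes a single byte. With the cast hack, 0xFFFFFFFFFFFFFFFFULL therefore costs 1 byte on the wire and 2^63 costs 10, whereas an 8-byte "fixed" always costs 8.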
