avro-user mailing list archives

From "Meyer, Dennis" <dennis.me...@adtech.com>
Subject Re: BigInt / longlong
Date Thu, 29 Mar 2012 07:20:50 GMT

That's not the best idea, since a string encoding wastes a lot of space:
every decimal digit costs a full byte (1 byte in ASCII or UTF-8, and UTF-8
needs 2-3 bytes for non-ASCII characters). Especially as Avro uses the MSB
of each byte as a continuation flag to compress smaller ints, a string does
not seem well suited for mass data.
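
To put rough numbers on the size argument, here is a minimal sketch of the zig-zag varint encoding that the Avro specification describes for `long`, next to the length-prefixed UTF-8 encoding of `string`. The function names are made up for illustration and are not part of any Avro library.

```python
def zigzag(n: int) -> int:
    """Map a signed 64-bit value to an unsigned one (0, -1, 1, -2 -> 0, 1, 2, 3)."""
    return ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF


def encode_long(n: int) -> bytes:
    """Zig-zag, then base-128 varint: 7 payload bits per byte, MSB means 'more follows'."""
    z = zigzag(n)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # set the MSB: another byte follows
        z >>= 7
    out.append(z)  # last byte has the MSB clear
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Length of the UTF-8 data as a long, followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


if __name__ == "__main__":
    # Compare the encoded size of a few values as long vs. as decimal string.
    for value in (1000, 2**31, 2**63 - 1):
        print(value, len(encode_long(value)), len(encode_string(str(value))))
```

Under this scheme any `long` takes at most 10 bytes, while the largest 64-bit unsigned value written as a decimal string takes 21 bytes (20 digits plus a one-byte length).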

I'll see whether a 64-bit unsigned -> 64-bit signed conversion or using the
mantissa of a double works better for us.
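
A small sketch of the two options, assuming the values arrive as plain Python integers in the range 0 to 2**64 - 1. The helper names are hypothetical; `struct` is only used to reinterpret the same 8 bytes as signed or unsigned.

```python
import struct


def u64_to_i64(u: int) -> int:
    """Reinterpret the 64 bits of an unsigned value as a signed long."""
    return struct.unpack("<q", struct.pack("<Q", u))[0]


def i64_to_u64(s: int) -> int:
    """Reinterpret the 64 bits of a signed long as an unsigned value."""
    return struct.unpack("<Q", struct.pack("<q", s))[0]


def survives_double(u: int) -> bool:
    """True if the integer comes back unchanged after a round trip through a double."""
    return int(float(u)) == u


if __name__ == "__main__":
    big = 2**64 - 1
    signed = u64_to_i64(big)          # what would be stored in an Avro long
    print(signed, i64_to_u64(signed))  # the round trip restores the original
    print(survives_double(2**53), survives_double(2**53 + 1))
```

The signed conversion is lossless over the whole 64-bit range, whereas a double only represents every integer exactly up to 2**53. In Java, `Long.parseUnsignedLong` and `Long.toUnsignedString` (Java 8 and later) cover the text side of the same reinterpretation.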


On 29.03.12 01:38, "Miki Tebeka" <miki.tebeka@gmail.com> wrote:

>I would encode to string. Should be simple enough, just means you need
>a pass over the data after reading it.
>On Wed, Mar 28, 2012 at 11:43 AM, Scott Carey <scottcarey@apache.org> wrote:
>> On 3/28/12 11:01 AM, "Meyer, Dennis" <dennis.meyer@adtech.com> wrote:
>> Hi,
>> What type corresponds to a Java BigInteger or a C long long? Or is there
>> any other type in Avro that maps to a 64-bit unsigned int?
>> I unfortunately could only find smaller types in the docs:
>> Primitive Types
>> The set of primitive type names is:
>> string: unicode character sequence
>> bytes: sequence of 8-bit bytes
>> int: 32-bit signed integer
>> long: 64-bit signed integer
>> float: single precision (32-bit) IEEE 754 floating-point number
>> double: double precision (64-bit) IEEE 754 floating-point number
>> boolean: a binary value
>> null: no value
>> Anyway, the encoding section mentions some 64-bit unsigned value. Can I
>> use it somehow through a type?
>> An unsigned value fits in a signed one.  They are both 64 bits.  Each
>> language that supports a long unsigned type has its own way to convert
>> one to the other without loss of data.
>> A workaround might be to use the 52 significant bits of a double, but that
>> seems like a hack and of course loses some more number space compared to
>> [...]. I'd like to get around any other self-encoding hacks as I'd like to
>> also use Hadoop/Pig/Hive on top of Avro, so I would like to keep
>> functionality on numbers if possible.
>> Java does not have an unsigned 64 bit type.  Hadoop/Pig/Hive all only
>> support signed 64 bit integer quantities.
>> Luckily, multiplication and addition on two's complement signed values are
>> identical to the operations on unsigned ints, so for many operations there
>> is no loss in fidelity as long as you pass the raw bits on to something that
>> interprets the number as an unsigned quantity.
>> That is, if you take the raw bits of a set of unsigned 64 bit numbers,
>> treat those bits as if they are signed 64 bit quantities, then do
>> addition, subtraction, and multiplication on them, then treat the raw
>> result as an unsigned 64 bit value, it is as if you did the whole thing
>> unsigned.
>> http://en.wikipedia.org/wiki/Two%27s_complement
>> Avro only has signed 32 and 64 bit integer quantities because they can be
>> mapped to unsigned ones in most cases without a problem and many (if not
>> most) languages do not support unsigned integers.
>> If you want various precision quantities you can use an Avro Fixed type and
>> map it to any type you choose.  For example you can use a 16 byte fixed to
>> map to 128 bit unsigned ints.
>> Thanks,
>> Dennis
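
The two suggestions in the quoted reply can be sketched as follows. This is an illustration only: Python integers are unbounded, so the wrap-around of a 64-bit signed type is simulated with a mask, and the helper names and the big-endian byte order for the 16-byte fixed are assumptions (Avro treats a fixed as opaque bytes).

```python
MASK64 = (1 << 64) - 1


def as_signed(bits: int) -> int:
    """Interpret the low 64 bits as a two's complement signed value."""
    bits &= MASK64
    return bits - (1 << 64) if bits >> 63 else bits


def wrapping_add(a: int, b: int) -> int:
    """Signed 64-bit addition with wrap-around, as a Java long would behave."""
    return as_signed(a + b)


def wrapping_mul(a: int, b: int) -> int:
    """Signed 64-bit multiplication with wrap-around."""
    return as_signed(a * b)


def via_signed(op, u1: int, u2: int) -> int:
    """Run op on the signed views of two unsigned values, then read the result as unsigned."""
    return op(as_signed(u1), as_signed(u2)) & MASK64


def to_fixed16(value: int) -> bytes:
    """Pack a 128-bit unsigned int into 16 bytes (big-endian chosen here)."""
    return value.to_bytes(16, "big")


def from_fixed16(raw: bytes) -> int:
    """Unpack 16 big-endian bytes back into an unsigned int."""
    return int.from_bytes(raw, "big")


if __name__ == "__main__":
    u1, u2 = 2**64 - 5, 10
    print(via_signed(wrapping_add, u1, u2) == (u1 + u2) & MASK64)
    print(via_signed(wrapping_mul, u1, u2) == (u1 * u2) & MASK64)
    print(from_fixed16(to_fixed16(2**100 + 7)) == 2**100 + 7)
```

Operations that depend on ordering do not carry over the same way: `as_signed(2**63)` is negative and therefore compares as smaller than `as_signed(1)`, so comparisons, division, and sorting need unsigned-aware handling on the consuming side.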
