avro-user mailing list archives

From "Meyer, Dennis" <dennis.me...@adtech.com>
Subject Re: BigInt / longlong
Date Thu, 29 Mar 2012 07:20:50 GMT
Hi,

That's not the best idea, as string encoding wastes a lot of space
(e.g. 1 byte per character in ASCII, 2-3 bytes in UTF-8). Especially since
Avro uses the MSB of each byte to compress smaller ints, a string does not
seem well suited for mass data.
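
To put rough numbers on the space argument, here is a minimal Java sketch, not taken from the Avro library. It assumes Avro's documented binary encoding: a long is zig-zag encoded and then written with 7 payload bits per byte, and a string is its UTF-8 byte length (written as a long) followed by the bytes. The class and method names are made up for illustration, and Long.toUnsignedString needs Java 8 or later.

```java
import java.nio.charset.StandardCharsets;

final class EncodedSize {
    // Bytes needed for an Avro long: zig-zag encode, then 7 payload bits per byte.
    static int avroLongSize(long n) {
        long z = (n << 1) ^ (n >> 63); // zig-zag: small magnitudes become small codes
        int bytes = 1;
        while ((z & ~0x7FL) != 0) {    // more than 7 bits left, so another byte follows
            z >>>= 7;
            bytes++;
        }
        return bytes;
    }

    // Bytes needed for an Avro string: length prefix (an Avro long) plus UTF-8 bytes.
    static int avroStringSize(String s) {
        int len = s.getBytes(StandardCharsets.UTF_8).length;
        return avroLongSize(len) + len;
    }

    public static void main(String[] args) {
        // Each sample is the raw 64-bit pattern of an unsigned value.
        long[] samples = {42L, 1L << 40, Long.MAX_VALUE, -1L};
        for (long bits : samples) {
            String decimal = Long.toUnsignedString(bits);
            System.out.println(decimal + ": as long " + avroLongSize(bits)
                    + " bytes, as string " + avroStringSize(decimal) + " bytes");
        }
    }
}
```

Under these assumptions a long never needs more than 10 bytes, while the largest uint64 written as a decimal string needs 21.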

I'll see whether a 64-bit unsigned -> 64-bit signed conversion or using the
mantissa of a double works better for us.
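
A minimal Java sketch of the two options, for comparison. It assumes the unsigned value is held in a BigInteger on the Java side; the class and method names are hypothetical. The signed conversion keeps all 64 bits, while the round trip through a double is only exact for some values.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class Uint64Conversion {
    static final BigInteger TWO_64 = BigInteger.ONE.shiftLeft(64);

    // Unsigned value in [0, 2^64) -> the signed long with the same 64 bits.
    static long toSignedBits(BigInteger unsigned) {
        if (unsigned.signum() < 0 || unsigned.compareTo(TWO_64) >= 0) {
            throw new IllegalArgumentException("out of uint64 range: " + unsigned);
        }
        return unsigned.longValue(); // keeps the low-order 64 bits
    }

    // Signed long (e.g. read from an Avro long field) -> the unsigned value it encodes.
    static BigInteger toUnsigned(long bits) {
        BigInteger v = BigInteger.valueOf(bits);
        return bits >= 0 ? v : v.add(TWO_64);
    }

    // True if the unsigned value survives a round trip through a double unchanged.
    static boolean fitsInDouble(long bits) {
        BigInteger u = toUnsigned(bits);
        double d = u.doubleValue(); // rounds to the nearest representable double
        return new BigDecimal(d).toBigIntegerExact().equals(u);
    }

    public static void main(String[] args) {
        BigInteger max = TWO_64.subtract(BigInteger.ONE);
        long bits = toSignedBits(max);
        System.out.println(max + " stored as signed long " + bits);
        System.out.println("restored: " + toUnsigned(bits));
        System.out.println("exact as double: " + fitsInDouble(bits));
    }
}
```

Every integer up to 2^53 is exact in a double; above that only some values are, so the double route cannot cover the full uint64 range.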

Thanks,
Dennis




On 29.03.12 01:38, "Miki Tebeka" <miki.tebeka@gmail.com> wrote:

>I would encode to string. Should be simple enough, just means you need
>a pass on the data after reading it.
>
>On Wed, Mar 28, 2012 at 11:43 AM, Scott Carey <scottcarey@apache.org>
>wrote:
>> On 3/28/12 11:01 AM, "Meyer, Dennis" <dennis.meyer@adtech.com> wrote:
>>
>> Hi,
>>
>> What type corresponds to a Java BigInt or C long long? Or is there any
>> other type in Avro that maps to a 64-bit unsigned int?
>>
>> I unfortunately could only find smaller types in the docs:
>>
>> Primitive Types
>>
>> The set of primitive type names is:
>>
>> string: unicode character sequence
>> bytes: sequence of 8-bit bytes
>> int: 32-bit signed integer
>> long: 64-bit signed integer
>> float: single precision (32-bit) IEEE 754 floating-point number
>> double: double precision (64-bit) IEEE 754 floating-point number
>> boolean: a binary value
>> null: no value
>>
>>
>> Anyway, the encoding section mentions some 64-bit unsigned values. Can I
>> use them somehow through a type?
>>
>>
>> An unsigned value fits in a signed one.  They are both 64 bits.  Each
>> language that supports a long unsigned type has its own way to convert
>> from one to the other without loss of data.
>>
>> A workaround might be to use the 52 significant bits of a double, but that
>> seems like a hack and of course loses more number space compared to
>> uint64. I'd like to avoid any other self-encoding hacks, as I'd also like
>> to use Hadoop/Pig/Hive on top of Avro, so I would like to keep numeric
>> functionality if possible.
>>
>>
>> Java does not have an unsigned 64 bit type.  Hadoop/Pig/Hive all only
>> have signed 64 bit integer quantities.
>>
>> Luckily, multiplication and addition on two's complement signed values
>> are identical to the operations on unsigned ints, so for many operations
>> there is no loss in fidelity as long as you pass the raw bits on to
>> something that interprets the number as an unsigned quantity.
>>
>> That is, if you take the raw bits of a set of unsigned 64 bit numbers,
>> treat those bits as if they were signed 64 bit quantities, do
>> addition, subtraction, and multiplication on them, and then treat the raw
>> bit result as an unsigned 64 bit value, it is as if you did the whole
>> thing unsigned.
>>
>> http://en.wikipedia.org/wiki/Two%27s_complement
>>
>> Avro only has signed 32 and 64 bit integer quantities because they can
>> be mapped to unsigned ones in most cases without a problem and many
>> (actually, most) languages do not support unsigned integers.
>>
>> If you want various precision quantities you can use an Avro Fixed type
>> to map to any type you choose.  For example you can use a 16 byte fixed
>> to map to 128 bit unsigned ints.
>>
>>
>> Thanks,
>> Dennis
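
The two's complement point quoted above can be checked with a small Java sketch. It is only an illustration with hypothetical names: BigInteger stands in for true unsigned arithmetic modulo 2^64, and the Long.*Unsigned helpers assume Java 8 or later. It also shows the operations the quoted text does not list, division and comparison, where the signed and unsigned results differ.

```java
import java.math.BigInteger;

final class UnsignedArithmetic {
    static final BigInteger MASK_64 =
            BigInteger.ONE.shiftLeft(64).subtract(BigInteger.ONE);

    // Interpret the 64 raw bits of a long as an unsigned value.
    static BigInteger unsigned(long bits) {
        return new BigInteger(Long.toUnsignedString(bits));
    }

    // True if signed +, -, * on the raw bits match unsigned arithmetic mod 2^64.
    static boolean sameBits(long a, long b) {
        BigInteger ua = unsigned(a);
        BigInteger ub = unsigned(b);
        return unsigned(a + b).equals(ua.add(ub).and(MASK_64))
                && unsigned(a - b).equals(ua.subtract(ub).and(MASK_64))
                && unsigned(a * b).equals(ua.multiply(ub).and(MASK_64));
    }

    public static void main(String[] args) {
        long big = -1L; // bit pattern of 2^64 - 1
        System.out.println("add/sub/mul agree: " + sameBits(big, 12345L));

        // Division and ordering depend on the sign interpretation.
        System.out.println("signed   -1 / 2 = " + (big / 2));
        System.out.println("unsigned -1 / 2 = " + Long.divideUnsigned(big, 2));
        System.out.println("signed compare:   " + Long.compare(big, 1L));
        System.out.println("unsigned compare: " + Long.compareUnsigned(big, 1L));
    }
}
```

So sums and products computed in Pig or Hive on the signed column keep the right bits, but sorting, comparisons, and division on that column would treat values of 2^63 and above as negative.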

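For the Fixed suggestion at the end of the quoted reply, here is a minimal Java sketch of packing a 128-bit unsigned value into 16 bytes and reading it back. It does not use the Avro library. The schema name UInt128 and the big-endian byte order are assumptions for illustration; a fixed type only guarantees the byte count, so the byte order is a convention readers and writers have to agree on.

```java
import java.math.BigInteger;

// Assumed schema for the field (the name is hypothetical):
//   {"type": "fixed", "name": "UInt128", "size": 16}
final class Uint128Fixed {
    static final int SIZE = 16;

    // Unsigned value in [0, 2^128) -> 16 bytes, big-endian, zero-padded on the left.
    static byte[] toFixed(BigInteger value) {
        if (value.signum() < 0 || value.bitLength() > SIZE * 8) {
            throw new IllegalArgumentException("out of uint128 range: " + value);
        }
        byte[] raw = value.toByteArray(); // may carry one extra leading sign byte
        byte[] out = new byte[SIZE];
        int n = Math.min(raw.length, SIZE);
        System.arraycopy(raw, raw.length - n, out, SIZE - n, n);
        return out;
    }

    // 16 big-endian bytes -> the unsigned value they encode.
    static BigInteger fromFixed(byte[] fixed) {
        if (fixed.length != SIZE) {
            throw new IllegalArgumentException("expected " + SIZE + " bytes");
        }
        return new BigInteger(1, fixed); // signum 1: treat the bytes as a magnitude
    }

    public static void main(String[] args) {
        BigInteger max = BigInteger.ONE.shiftLeft(128).subtract(BigInteger.ONE);
        byte[] bytes = toFixed(max);
        System.out.println(bytes.length + " bytes, round trip ok: "
                + fromFixed(bytes).equals(max));
    }
}
```

The same approach with SIZE set to 8 would give an unsigned 64-bit fixed, at the cost of losing the variable-length encoding that long gets.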
