avro-dev mailing list archives

From "Scott Carey (JIRA)" <j...@apache.org>
Subject [jira] Commented: (AVRO-49) Efficient positive varints
Date Tue, 23 Jun 2009 20:41:07 GMT

    [ https://issues.apache.org/jira/browse/AVRO-49?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723297#action_12723297 ]

Scott Carey commented on AVRO-49:
---------------------------------

On average, this will waste one bit per int if all numbers are positive.  Performance should
be slightly lower as well due to the extra bit shift (perhaps imperceptibly so). 
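
To make the size cost concrete, here is a minimal Python sketch. It assumes the usual zig-zag
mapping for 64-bit values and a varint with 7 payload bits per byte; the helper names `zigzag`
and `varint_len` are made up for illustration and are not Avro API.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def zigzag(n: int) -> int:
    """Map a signed 64-bit integer to an unsigned one: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return ((n << 1) ^ (n >> 63)) & MASK64

def varint_len(u: int) -> int:
    """Bytes needed to store the unsigned value u with 7 payload bits per byte."""
    length = 1
    while u >= 0x80:
        u >>= 7
        length += 1
    return length

# A positive value loses one bit of range to the sign, so 64..127 need a second byte.
plain_bytes = varint_len(100)           # stored as an unsigned varint
zigzag_bytes = varint_len(zigzag(100))  # stored zig-zag encoded (100 becomes 200)
```

Only values that sit just below a 7-bit boundary pay the extra byte, which is why the cost
works out to roughly one bit per int rather than one byte.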

I do find that "id" types are extremely common, although a third common type is one that is
positive for almost all values, and occasionally contains small negative numbers (most often,
-1).

However, perhaps the most common case will be the length fields in all "string" and "bytes"
fields, which by definition are never negative, but are encoded as if they could be.

Although not optimal, a client can 'pack' its positive number data into the current format
if space is a huge concern:  on write, if the number is even, shift it right by one bit; if it
is odd, store the corresponding negative value instead.  On read, double the non-negative values,
and for negative values double them, flip the sign and subtract one.
The length fields of strings and bytes can even do this -- although it's somewhat of a waste
to rotate the int one way in the client and then back again in the low level encode.
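
One way to read the packing trick above, as a Python sketch: the client applies the inverse of
the zig-zag mapping before handing the value to the encoder, so the encoder's own zig-zag step
restores the original bits. The names `pack_positive` and `unpack_positive` are hypothetical,
and the odd case needs an off-by-one adjustment to round-trip.

```python
def zigzag(n: int) -> int:
    """The mapping the low level encoder is assumed to apply to 64-bit values."""
    return ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF

def pack_positive(p: int) -> int:
    """Client side, on write: pick the signed value whose zig-zag form is p."""
    if p < 0:
        raise ValueError("only non-negative values can be packed")
    if p % 2 == 0:
        return p >> 1           # even: plain right shift
    return -((p + 1) >> 1)      # odd: becomes a negative number

def unpack_positive(v: int) -> int:
    """Client side, on read: undo pack_positive."""
    if v >= 0:
        return 2 * v            # non-negative: double
    return -2 * v - 1           # negative: double, flip the sign, subtract one

packed = pack_positive(101)     # an odd value maps to a negative one
```

With this in place, 100 is handed to the encoder as 50 and reaches the varint stage as 100
again, so it fits in one byte instead of two.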

Perhaps the best thing is to do the reverse and pack unsigned internally, and have the "int"
and "long" types pack the sign bit?
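
A rough sketch of that layering in Python, purely as an assumption about how it could look (the
function names are invented and this is not what the spec currently does): the lowest level
writes unsigned varints, "int" and "long" add the zig-zag step on top, and length prefixes skip
it.

```python
def write_unsigned(u: int) -> bytes:
    """Lowest level: unsigned varint, 7 payload bits per byte, low bits first."""
    if u < 0:
        raise ValueError("unsigned varint cannot hold a negative value")
    out = bytearray()
    while u >= 0x80:
        out.append((u & 0x7F) | 0x80)   # set the continuation bit
        u >>= 7
    out.append(u)
    return bytes(out)

def write_long(n: int) -> bytes:
    """Signed "int"/"long": fold the sign bit in, then reuse the unsigned writer."""
    return write_unsigned(((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF)

def write_string(s: str) -> bytes:
    """Hypothetical string layout: the length prefix goes straight to the unsigned writer."""
    data = s.encode("utf-8")
    return write_unsigned(len(data)) + data

encoded = write_string("a" * 100)       # 1 length byte + 100 data bytes
```

The signed types cost the same as today, while a length of 64..127 takes one prefix byte
instead of two.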


The other variable-length type I can think of is a numeric type of arbitrary, exact precision
(think BigDecimal).

It is possible to pack something like that into the bytes type though.
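
As an illustration only, here is one possible layout in Python: keep the scale as a separate
integer and store the unscaled value as big-endian two's complement bytes. The layout and the
function names are assumptions, not anything defined by Avro, and the sketch handles finite
values only.

```python
from decimal import Decimal
from typing import Tuple

def decimal_to_bytes(d: Decimal) -> Tuple[int, bytes]:
    """Split a finite Decimal into (scale, two's complement unscaled bytes)."""
    sign, digits, exponent = d.as_tuple()
    unscaled = int("".join(str(x) for x in digits))
    if sign:
        unscaled = -unscaled
    length = (unscaled.bit_length() + 8) // 8   # leaves room for the sign bit
    return -exponent, unscaled.to_bytes(length, "big", signed=True)

def bytes_to_decimal(scale: int, raw: bytes) -> Decimal:
    """Rebuild the Decimal exactly, without rounding to the context precision."""
    unscaled = int.from_bytes(raw, "big", signed=True)
    sign = 1 if unscaled < 0 else 0
    digits = tuple(int(c) for c in str(abs(unscaled)))
    return Decimal((sign, digits, -scale))

scale, raw = decimal_to_bytes(Decimal("123.45"))   # unscaled value 12345, scale 2
```

The byte string could then go into a "bytes" field, with the scale carried next to it as an
"int".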

> Efficient positive varints
> --------------------------
>
>                 Key: AVRO-49
>                 URL: https://issues.apache.org/jira/browse/AVRO-49
>             Project: Avro
>          Issue Type: Improvement
>          Components: spec
>            Reporter: Ismael Juma
>
> Hi,
> Avro looks like an interesting project, so I was looking at the specification. I noticed
> that varints use zig-zag encoding which is a good signed representation. However, id-like
> values are very common and they are often always positive. I think it would be nice to support
> them as efficiently as possible too (like Protocol Buffers, for example). Would something
> like this be considered at all?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

