cassandra-commits mailing list archives

From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
Date Tue, 16 Jun 2015 11:42:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587908#comment-14587908 ]

Benedict commented on CASSANDRA-9499:
-------------------------------------

bq. I assumed that knowing a given integer can't be negative could lead to more efficient encoding

Well, we could have two different encodings, but with the current scheme this would mostly
help values in the range of [128..250). I'm not sure if that's worth confusing everything
for.

However, if we change the encoding, we can bias it towards positive values, since they're more
common. I'm somewhat inclined to use a hybrid extending-bits scheme. A starting suggestion:

* first byte: 2 bits of length; followed by, if either length bit is set, 1 sign bit;
followed by, if both length bits are set, 2 more bits of length; the remaining 3-6 bits are
value bits
* all remaining bytes contain value bits only
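The bullet points leave a few details open, so here is a minimal Java sketch of one possible reading, not a definitive implementation. Assumptions (none of them stated in the proposal): negative values store the one's complement of the value after the sign bit (which matches the 2-byte range of -8192..-1 in the table below), the extra bytes are big-endian, and the 2 extra length bits E give a total length of 4 + E bytes. The class `HybridVInt` and its methods are hypothetical, not Cassandra's API, and the sketch only handles `int` so that it never needs more than five bytes.

```java
// Hypothetical sketch of the hybrid extending-bits varint proposed above.
// Assumptions (not specified in the proposal): negatives store the one's
// complement of the value, extra bytes are big-endian, and the 2 extra length
// bits E give a total length of 4 + E bytes. Handles int only (max 5 bytes).
final class HybridVInt {
    private HybridVInt() {}

    // One's complement for negatives: -1 -> 0, -8192 -> 8191.
    private static long magnitude(int v) {
        return v < 0 ? ~(long) v : v;
    }

    static int encodedSize(int v) {
        if (v >= 0 && v < 64) return 1;   // length code 00, 6 value bits, no sign bit
        long mag = magnitude(v);
        if (mag < (1L << 13)) return 2;   // 5 + 8 value bits
        if (mag < (1L << 21)) return 3;   // 5 + 16 value bits
        if (mag < (1L << 27)) return 4;   // 3 + 24 value bits (length code 11, E = 0)
        return 5;                         // 3 + 32 value bits (E = 1): any int fits
    }

    static byte[] encode(int v) {
        int size = encodedSize(v);
        byte[] out = new byte[size];
        if (size == 1) {
            out[0] = (byte) v;            // top two bits are already 00
            return out;
        }
        long mag = magnitude(v);
        int sign = v < 0 ? 1 : 0;
        // Low-order magnitude bits go into the trailing bytes, big-endian.
        for (int i = size - 1; i >= 1; i--) {
            out[i] = (byte) mag;
            mag >>>= 8;
        }
        // The remaining high bits (at most 5) go into the first byte.
        int first;
        if (size <= 3) {
            first = (size - 1) << 6 | sign << 5 | (int) mag;
        } else {
            first = 3 << 6 | sign << 5 | (size - 4) << 3 | (int) mag;
        }
        out[0] = (byte) first;
        return out;
    }

    static int decode(byte[] in, int offset) {
        int first = in[offset] & 0xFF;
        int lengthCode = first >>> 6;
        if (lengthCode == 0) return first;            // 0..63
        boolean negative = (first & 0x20) != 0;
        int size;
        long mag;
        if (lengthCode < 3) {
            size = lengthCode + 1;
            mag = first & 0x1F;
        } else {
            size = 4 + ((first >>> 3) & 0x03);
            mag = first & 0x07;
        }
        for (int i = 1; i < size; i++) {
            mag = (mag << 8) | (in[offset + i] & 0xFF);
        }
        return (int) (negative ? ~mag : mag);
    }

    public static void main(String[] args) {
        int[] samples = {0, 63, 64, 8191, 8192, -1, -8192, -8193,
                         Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int v : samples) {
            byte[] b = encode(v);
            System.out.println(v + " -> " + b.length + " byte(s), round-trips: "
                    + (decode(b, 0) == v));
        }
    }
}
```

Under this reading 63 takes one byte while 64 and -1 each take two, and any `int` fits in at most five bytes. Using the one's complement rather than a plain sign-magnitude pair avoids a redundant "negative zero" and lets `Integer.MIN_VALUE` encode without overflow.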

This would lead to the following encoding sizes (in bytes):
||value range||suggested scheme size||existing scheme size||
|0..63|1|1|
|64..8K|2|mostly 3; 64..127=1, 128..255=2|
|-8192..-1|2|mostly 3; -112..-1=1, -256..-113=2|
|8K..2M|3|mostly 4|

etc.

So we lose out slightly for values in the ranges 64..127 and -112..-1; everything else stays the
same size or gets smaller. If we wanted to bias further towards positive values, we could
include the sign bit only in encodings of 3 or more bytes, so that negative numbers cannot be
encoded in fewer than 3 bytes.
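To sanity-check the size table and the trade-off, the sketch below computes sizes for both schemes and prints every run of values where the suggested scheme is larger. It assumes that the existing scheme is the Hadoop `WritableUtils`-style vint (one byte for -112..127, otherwise a length-marker byte followed by the big-endian magnitude, with negatives stored as their one's complement), and it reads the suggested scheme as one byte for 0..63 and otherwise a sign bit plus 13, 21, 27 or 35 magnitude bits in 2, 3, 4 or 5 bytes (`int` range only). `VIntSizeComparison` and its methods are hypothetical names.

```java
// Hypothetical size comparison; neither method is Cassandra's actual API.
final class VIntSizeComparison {
    // Assumed existing scheme (Hadoop WritableUtils-style vint): -112..127 in one
    // byte, otherwise a marker byte plus the big-endian magnitude bytes.
    static int existingSize(long v) {
        if (v >= -112 && v <= 127) return 1;
        if (v < 0) v = ~v;                                  // one's complement of negatives
        int dataBits = Long.SIZE - Long.numberOfLeadingZeros(v);
        return (dataBits + 7) / 8 + 1;                      // marker byte + data bytes
    }

    // Suggested scheme, int range only: 0..63 in one byte; otherwise a sign bit
    // plus 13, 21, 27 or 35 magnitude bits in 2, 3, 4 or 5 bytes.
    static int suggestedSize(int v) {
        if (v >= 0 && v < 64) return 1;
        long mag = v < 0 ? ~(long) v : v;
        if (mag < (1L << 13)) return 2;
        if (mag < (1L << 21)) return 3;
        if (mag < (1L << 27)) return 4;
        return 5;
    }

    public static void main(String[] args) {
        // Print each maximal run of values where the suggested scheme is larger.
        boolean inRun = false;
        int runStart = 0;
        for (int v = -100_000; v <= 100_000; v++) {
            boolean larger = suggestedSize(v) > existingSize(v);
            if (larger && !inRun) {
                runStart = v;
                inRun = true;
            } else if (!larger && inRun) {
                System.out.println("suggested is larger for " + runStart + ".." + (v - 1));
                inRun = false;
            }
        }
        if (inRun) {
            System.out.println("suggested is larger for " + runStart + "..100000");
        }
    }
}
```

Under these assumptions it prints exactly two runs, -112..-1 and 64..127: the values that fit in a single byte under the existing scheme but not under the suggested one. Every other value in -100000..100000 takes the same number of bytes or fewer.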

> Introduce writeVInt method to DataOutputStreamPlus
> --------------------------------------------------
>
>                 Key: CASSANDRA-9499
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Ariel Weisberg
>            Priority: Minor
>             Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, both for fixing CASSANDRA-9498
> and for efficiently encoding timestamp/deletion deltas. It should be possible to make an
> especially efficient implementation against BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
