cassandra-commits mailing list archives

From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-9499) Introduce writeVInt method to DataOutputStreamPlus
Date Wed, 17 Jun 2015 08:48:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589470#comment-14589470 ]

Benedict edited comment on CASSANDRA-9499 at 6/17/15 8:47 AM:
--------------------------------------------------------------

I'm confused as to why we need 10 bytes? Pretty much by definition a continuation bit encoding
needs 9 bytes to represent 8 bytes. Let's not use Google's implementation. It looks pretty
rubbish. 

The reason they use 10 bytes is they cannot be bothered to realise the last byte does not
need a continuation bit. They also use a *terrible* implementation for deciding how long it
needs to be.
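
For reference, the 9-versus-10 byte count can be checked with simple arithmetic. This is only an illustrative sketch: the class and method names are hypothetical and nothing here is taken from either implementation.

```java
public class VarintSizes {
    // Payload bits that fit in `bytes` bytes when every byte, including the
    // last one, gives up one bit to a continuation flag.
    static int payloadBitsFlagOnEveryByte(int bytes) {
        return bytes * 7;
    }

    // Payload bits when the final byte of the longest permitted encoding keeps
    // all 8 bits, because no further byte can follow it.
    static int payloadBitsFullLastByte(int bytes) {
        return (bytes - 1) * 7 + 8;
    }

    public static void main(String[] args) {
        System.out.println(payloadBitsFlagOnEveryByte(9));  // 63: one bit short of a long
        System.out.println(payloadBitsFlagOnEveryByte(10)); // 70: hence 10 bytes
        System.out.println(payloadBitsFullLastByte(9));     // 64: 9 bytes are enough
    }
}
```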

Here's a simple change which makes it 9 bytes, and easily optimised: the continuation bits
are all shifted to the first byte, which effectively encodes the length in run-length encoding,
i.e. the number of contiguous top order bits that are set to 1:
{{length = Integer.numberOfLeadingZeros(firstByte ^ 0xffff)}}

This way we read the first byte, and if there are any more to read, we read a long, and quickly
truncate.
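
A minimal sketch of the scheme described above, under several assumptions of mine: values are treated as unsigned, bytes are laid out big-endian, and the class and method names are hypothetical. Read literally on a Java int, the XOR expression in the comment does not appear to yield the leading-one count, so the sketch assumes the intent is {{Integer.numberOfLeadingZeros(~firstByte & 0xff) - 24}}. For clarity it also decodes byte by byte rather than reading a long and truncating.

```java
public class LeadingOnesVInt {
    // Extra bytes announced by the first byte: the count of its leading 1 bits (0..8).
    static int extraBytes(byte firstByte) {
        return Integer.numberOfLeadingZeros(~firstByte & 0xff) - 24;
    }

    // Extra bytes needed for an unsigned value: n extra bytes carry 7 * (n + 1)
    // payload bits for n < 8, and a full 64 bits for n == 8.
    static int extraBytesFor(long value) {
        int significantBits = 64 - Long.numberOfLeadingZeros(value | 1);
        return Math.min(8, (significantBits - 1) / 7);
    }

    static byte[] encode(long value) {
        int extra = extraBytesFor(value);
        byte[] out = new byte[extra + 1];
        // Big-endian payload: lowest 8 bits go into the last byte.
        for (int i = extra; i >= 0; i--) {
            out[i] = (byte) value;
            value >>>= 8;
        }
        // Set the top `extra` bits of the first byte as the length prefix.
        out[0] |= (byte) ~(0xff >> extra);
        return out;
    }

    static long decode(byte[] in) {
        int extra = extraBytes(in[0]);
        // Mask off the length prefix to keep only the payload bits of the first byte.
        long value = in[0] & (0xff >> extra);
        for (int i = 1; i <= extra; i++) {
            value = (value << 8) | (in[i] & 0xff);
        }
        return value;
    }
}
```

With this layout a value that needs all 64 bits gets a first byte of 0xff followed by 8 payload bytes, so the longest encoding is 9 bytes.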


was (Author: benedict):
I'm confused as to why we need 10 bytes? Pretty much by definition a continuation bit encoding
needs 9 bytes to represent 8 bytes. Let's not use Google's implementation. It looks pretty
rubbish. 

The reason they use 10 bytes is they cannot be bothered to realise the last byte does not
need a continuation bit. They also use a *terrible* implementation for deciding how long it
needs to be.

Here's a simple change which makes it 9 bytes, and easily optimised: the continuation bits
are all shifted to the first byte, which effectively encodes the length in run-length encoding,
i.e. the number of contiguous top order bits that are set to 1:
{{length = Integer.numberOfLeadingZeros(firstByte ^ (byte) -1)}}

This way we read the first byte, and if there are any more to read, we read a long, and quickly
truncate.

> Introduce writeVInt method to DataOutputStreamPlus
> --------------------------------------------------
>
>                 Key: CASSANDRA-9499
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9499
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Ariel Weisberg
>            Priority: Minor
>             Fix For: 3.0 beta 1
>
>
> CASSANDRA-8099 really could do with a writeVInt method, both for fixing CASSANDRA-9498
> and for efficiently encoding timestamp/deletion deltas. It should be possible to make an
> especially efficient implementation against BufferedDataOutputStreamPlus.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
