cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-10232) Small optimizations in index entry serialization
Date Mon, 31 Aug 2015 16:35:45 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14723648#comment-14723648
] 

Sylvain Lebresne commented on CASSANDRA-10232:
----------------------------------------------

Pushed [a branch|https://github.com/pcmanus/cassandra/commits/10232] that implements the following
changes:
# The 1st commit uses vint encoding for index entries. This should give quite a few benefits
in itself, since we store things like each index entry width as an 8 byte long while it will
almost always be around 64k. Likewise, we use a 4 byte int for the number of entries, which is
likely often smallish.
# The 2nd commit gets rid of the offset in each index entry. Indeed, keeping both the offset (from
the row start) and the width of each entry is a bit redundant: we can recompute one from the
other. And since the width will yield a better vint encoding, the patch removes the offset.
We do need to add, for each indexed partition, the size of the "partition header", but that's
a single (small) value for each partition, and since we don't index unless we have at least
2 blocks, this will always be a net win.
# The 3rd commit is not a serialization improvement but just a minor cleanup that avoids re-creating
serializer objects every time we need them.
# The 4th and last commit is a small improvement over the 1st one: it uses 64k as a base to
delta-encode each entry width (since by definition each entry will be just slightly bigger
than 64k). The patch actually hard-codes 64k even though users can theoretically change the
index size, but that's because I didn't see a trivial way to save the actual index size used
alongside each index file, and I want to keep the patch on this ticket simple enough to write/review
so it can make it in 3.0. Happy to skip that commit if someone has an allergic reaction
to the hard-coded number however.
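
To make the size arguments above concrete, here is a rough sketch (not code from the branch) that compares the four layouts for the offset/width part of the index entries of one partition. It makes several assumptions: it uses a generic LEB128-style varint with zigzag for signed deltas, which is not Cassandra's actual vint format; it assumes the offset was previously stored as an 8 byte long like the width; and all function names and sample numbers are made up for illustration.

```python
BASE_WIDTH = 64 * 1024  # hard-coded base for delta-encoding, as in the 4th commit


def uvarint_size(n: int) -> int:
    """Bytes used by a LEB128-style unsigned varint (7 payload bits per byte).

    Assumption: a generic varint for illustration, not Cassandra's vint format.
    """
    if n < 0:
        raise ValueError("unsigned varint cannot encode a negative value")
    size = 1
    while n >= 0x80:
        n >>= 7
        size += 1
    return size


def zigzag(n: int) -> int:
    """Map a signed 64-bit value to unsigned so small magnitudes stay small."""
    return (n << 1) ^ (n >> 63)


def offsets_from_widths(header_size: int, widths: list[int]) -> list[int]:
    """Recompute each entry's offset from the partition header size and widths."""
    offsets = []
    position = header_size
    for width in widths:
        offsets.append(position)
        position += width
    return offsets


def layout_sizes(header_size: int, widths: list[int]) -> dict[str, int]:
    """Bytes spent on offsets/widths under each layout (other fields ignored)."""
    offsets = offsets_from_widths(header_size, widths)
    return {
        # Before: offset and width each stored as an 8 byte long (assumed).
        "fixed": 16 * len(widths),
        # 1st commit: same fields, varint-encoded.
        "vint": sum(uvarint_size(o) + uvarint_size(w)
                    for o, w in zip(offsets, widths)),
        # 2nd commit: drop offsets, store the header size once per partition.
        "widths_only": uvarint_size(header_size)
                       + sum(uvarint_size(w) for w in widths),
        # 4th commit: store each width as a signed delta from 64k.
        "delta": uvarint_size(header_size)
                 + sum(uvarint_size(zigzag(w - BASE_WIDTH)) for w in widths),
    }


if __name__ == "__main__":
    # Hypothetical partition: 40 byte header, three full blocks and a short tail.
    print(layout_sizes(40, [65600, 65700, 65550, 1200]))
```

Note that the last block of a partition can be much smaller than 64k, so the delta can be negative; the sketch handles that with zigzag, which is a choice of this example rather than something stated about the patch.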

All those changes should be simple and quick to review so hopefully we can get those in 3.0
quickly. At the very very least, the 1st commit is trivial and there is no reason not to include
it imo.

> Small optimizations in index entry serialization
> ------------------------------------------------
>
>                 Key: CASSANDRA-10232
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10232
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 3.0.0 rc1
>
>
> While we should improve the data structure we use for our on-disk index in future versions,
it occurred to me that we had a few _very_ low-hanging-fruit optimizations (as in, for 3.0)
we could do for the serialization of our current entries, like using vint encodings.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
