cassandra-commits mailing list archives

From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-9894) Serialize the header only once per message
Date Wed, 29 Jul 2015 22:28:04 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646840#comment-14646840
] 

Benedict edited comment on CASSANDRA-9894 at 7/29/15 10:27 PM:
---------------------------------------------------------------

I've pushed an initial version [here|https://github.com/belliottsmith/cassandra/tree/9894].
This is based on my patch for CASSANDRA-9471.

I tried starting from Sylvain's patch, and then starting from scratch, and ultimately I didn't
like where either led. So this attacks the problem a little differently: it uses the column
filter sent to the replica to help encode the response, knowing that the response columns
must be a subset. With a normal number of columns this translates to a presence bitmap (otherwise
it is a sequence of ints either adding or removing from the set, but these codepaths should
rarely be taken). If the columns are identical, a single 0 byte is sent for all the columns.
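
To illustrate the presence-bitmap idea, here is a minimal Java sketch. It is not the code from the branch: the names `ColumnSubsetSketch`, `encodePresence`, `decodePresence` and `isIdentical` are hypothetical, columns are modelled as distinct plain strings, and the 64-column limit is an assumption made so the bitmap fits in one long. The fallback for larger column counts and the actual single-0-byte wire shortcut are not modelled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: encode a response's columns relative to the requested columns.
public final class ColumnSubsetSketch {
    private ColumnSubsetSketch() {}

    /** Bit i of the result is set when requested.get(i) appears in the response. */
    public static long encodePresence(List<String> requested, List<String> response) {
        if (requested.size() > 64)
            throw new IllegalArgumentException("sketch handles at most 64 columns");
        long bitmap = 0L;
        int matched = 0;
        for (int i = 0; i < requested.size(); i++) {
            if (response.contains(requested.get(i))) {
                bitmap |= 1L << i;
                matched++;
            }
        }
        // Assumes distinct column names; any unmatched response column breaks the subset rule.
        if (matched != response.size())
            throw new IllegalArgumentException("response columns are not a subset of the request");
        return bitmap;
    }

    /** Rebuilds the response columns from the requested columns and the bitmap. */
    public static List<String> decodePresence(List<String> requested, long bitmap) {
        List<String> present = new ArrayList<>();
        for (int i = 0; i < requested.size(); i++) {
            if ((bitmap & (1L << i)) != 0)
                present.add(requested.get(i));
        }
        return present;
    }

    /** True when every requested column is present, the case where a shortcut could apply. */
    public static boolean isIdentical(List<String> requested, long bitmap) {
        int n = requested.size();
        long allPresent = n == 64 ? -1L : (1L << n) - 1;
        return bitmap == allPresent;
    }
}
```

For a request of columns a, b, c, d and a response containing a and c, `encodePresence` returns `0b0101`, and `decodePresence` recovers a and c given the same request.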

This permits us to save work when serializing even single partitions, and also permits us
to encode per-partition encoding stats, so that our timestamps can most likely be more efficiently
encoded. It also touches far less code.
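
As a rough illustration of why per-partition stats can help with timestamps, the sketch below stores each timestamp as a delta from the smallest timestamp in the partition. The comment does not say which stats are tracked, so using a minimum timestamp as the base is an assumption, and `TimestampDeltaSketch` and its methods are hypothetical names, not part of the patch.

```java
// Hypothetical sketch: delta-encode timestamps against a per-partition minimum.
public final class TimestampDeltaSketch {
    private TimestampDeltaSketch() {}

    /** Smallest timestamp in the partition; used as the base for all deltas. */
    public static long minTimestamp(long[] timestamps) {
        if (timestamps.length == 0)
            throw new IllegalArgumentException("need at least one timestamp");
        long min = timestamps[0];
        for (long t : timestamps)
            min = Math.min(min, t);
        return min;
    }

    /** Replaces each timestamp with its (non-negative) distance from the base. */
    public static long[] toDeltas(long[] timestamps, long base) {
        long[] deltas = new long[timestamps.length];
        for (int i = 0; i < timestamps.length; i++)
            deltas[i] = timestamps[i] - base;
        return deltas;
    }

    /** Inverse of toDeltas: adds the base back to each delta. */
    public static long[] fromDeltas(long[] deltas, long base) {
        long[] timestamps = new long[deltas.length];
        for (int i = 0; i < deltas.length; i++)
            timestamps[i] = deltas[i] + base;
        return timestamps;
    }
}
```

Timestamps written close together within one partition then become small numbers, which a variable-length integer encoding can store in fewer bytes than the full 64-bit values.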

I am not 100% certain I haven't broken things, as dtests are a little tricky to read right
now, but nothing jumps out at me. I still need to introduce some unit tests, and also want
to invert the bitmap to make it more efficiently vint encoded. But the patch is generally
ready for a first round of review, as it will change only minimally.
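
The effect of inverting the bitmap can be sketched as follows. This assumes a generic varint that carries 7 payload bits per byte, which may not match Cassandra's own vint format, and `InvertedBitmapSketch`, `varintSize` and `invert` are hypothetical names used only for illustration.

```java
// Hypothetical sketch: compare varint sizes of a presence bitmap and its inverse.
public final class InvertedBitmapSketch {
    private InvertedBitmapSketch() {}

    /** Bytes needed by a generic unsigned varint with 7 payload bits per byte. */
    public static int varintSize(long value) {
        int bytes = 1;
        while ((value >>>= 7) != 0)
            bytes++;
        return bytes;
    }

    /** Flips a presence bitmap over columnCount columns so set bits mark MISSING columns. */
    public static long invert(long presence, int columnCount) {
        if (columnCount < 0 || columnCount > 64)
            throw new IllegalArgumentException("sketch handles 0 to 64 columns");
        long mask = columnCount == 64 ? -1L : (1L << columnCount) - 1;
        return ~presence & mask;
    }
}
```

With 20 requested columns and only column 3 missing from the response, the presence bitmap needs 3 bytes under this varint scheme while the inverted bitmap (value 8) needs 1. When every column is present the inverted bitmap is simply 0.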


was (Author: benedict):
I've pushed an initial version [here|https://github.com/belliottsmith/cassandra/tree/9894].
This is based on my patch for CASSANDRA-9471.

I tried starting from Sylvain's patch, and then starting from scratch, and ultimately I didn't
like where either led. So this attacks the problem a little differently: it uses the column
filter sent to the coordinator for a query to encode the response, knowing that the columns
must be a subset. With a normal number of columns this translates to a bitmap of presence
in the response for each column in the request (otherwise it is a sequence of vint encoded
ints, but these codepaths should rarely be taken), and if the columns are identical (what
we should expect), a single 0 byte is sent for all the columns.

This permits us to save work when serializing even single partitions, and also permits us
to encode per-partition encoding stats, so that our timestamps can most likely be more efficiently
encoded. It also touches far less code.

I am not 100% certain I haven't broken things, as dtests are a little tricky to read right
now, but nothing jumps out at me. I still need to introduce some unit tests, and also want
to invert the bitmap to make it more efficiently vint encoded. But the patch is generally
ready for a first round of review, as it will change only minimally.

> Serialize the header only once per message
> ------------------------------------------
>
>                 Key: CASSANDRA-9894
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9894
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Sylvain Lebresne
>            Assignee: Benedict
>             Fix For: 3.0 beta 1
>
>
> One last improvement I'd like to make on the serialization side: we currently serialize
> the {{SerializationHeader}} for each partition. That header contains, in particular, the
> serialized columns, and for range queries, serializing it for every partition is wasteful (note
> that this is only a problem for the messaging protocol, as for sstables we only write the header
> once per sstable).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
