cassandra-commits mailing list archives

From "Ariel Weisberg (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-4139) Add varint encoding to Messaging service
Date Wed, 10 Dec 2014 20:20:12 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241644#comment-14241644 ]

Ariel Weisberg edited comment on CASSANDRA-4139 at 12/10/14 8:19 PM:
---------------------------------------------------------------------

Is bandwidth a constraint for WAN replication? In practice, is compression on by default
for messaging? What are people doing in the wild?

I could imagine varint encoding being a win for Cells where the names and values are integers
and queries are bulk loading or selecting ranges. At the storage level it seems like the kind
of thing that could beat general-purpose compression if you know what data type you are dealing
with and have a lot of zero-padded values.

I have heard talk about using a column store and a run-length encoding approach for storage,
which makes it seem like varint encoding wouldn't be the tool of choice for storage either.

The code changes don't look bad. It's mostly swapping types for streams and changing how
serialized size is calculated so that it accounts for variable-length encoded integers. It
could save bandwidth, but it could also be slower since you spend more cycles calculating
serialized size and encoding/decoding integers. If you end up using compression in
bandwidth-sensitive scenarios, you may not win much.
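
To make the size/CPU tradeoff concrete, here is a minimal sketch of a generic unsigned
base-128 varint in Java. It is not taken from the attached patches; the class and method
names (VarintSketch, encodedSize, writeUnsignedVarint, readUnsignedVarint) are hypothetical,
and the patch may well use a different wire format. The point is that the serialized size is
no longer a constant per field and has to be computed from the value.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration only; not the encoding used by the CASSANDRA-4139 patches.
public class VarintSketch {
    // Bytes needed to encode value as an unsigned base-128 varint (7 payload bits per byte).
    static int encodedSize(long value) {
        int size = 1;
        while ((value & ~0x7FL) != 0) {
            value >>>= 7;   // unsigned shift so negative longs terminate
            size++;
        }
        return size;
    }

    // Writes the low 7 bits first; the high bit of each byte marks "more bytes follow".
    static void writeUnsignedVarint(long value, ByteArrayOutputStream out) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Decodes a single varint starting at the beginning of buf.
    static long readUnsignedVarint(byte[] buf) {
        long result = 0;
        int shift = 0;
        for (byte b : buf) {
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;      // high bit clear: last byte of this varint
            }
            shift += 7;
        }
        return result;
    }
}
```

With this scheme a small value such as 300 takes 2 bytes instead of a fixed 8 for a long, but
a value with all 64 bits in use takes 10, and every read or write pays for a loop instead of
a fixed-width copy.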

Not varint encoding the data going in/out of the database means you only save a proportionally
meaningful amount of space when the operations going in/out are small. The flip side is that
you can't do that many small ops anyway, so you aren't bandwidth constrained.
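
The proportionality argument can be sketched with a little arithmetic. The Java snippet
below is only an illustration: SavingsSketch and fractionSaved are hypothetical names, and
the byte counts used with it are made-up inputs, not measurements from Cassandra.

```java
// Hypothetical illustration only; inputs are assumptions, not measured message sizes.
public class SavingsSketch {
    // Fraction of total message bytes saved when only the header integers are
    // varint encoded and the payload (the data itself) is left untouched.
    static double fractionSaved(int fixedHeaderBytes, int varintHeaderBytes, int payloadBytes) {
        int before = fixedHeaderBytes + payloadBytes;
        int after = varintHeaderBytes + payloadBytes;
        return (before - after) / (double) before;
    }
}
```

Assuming a 40 byte header that shrinks to 10 bytes, a 10 byte payload gives a saving of
30/50, while a 10000 byte payload gives 30/10040, which is well under one percent.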


was (Author: aweisberg):
Is bandwidth a constraint for WAN replication? In practice is the default for messaging to
have compression on? What are people doing in the wild?

I could imagine varint encoding being a win for Cells where the names and values are integers
and queries are bulk loading or selecting ranges. At the storage level it seems like the kind
of thing that could beat general purpose compression if you know what data type you are dealing
with and have a lot of 0 padded values.

I have heard talk about using a column store and run length encoding approach for storage
which makes it seem like varint encoding would be the tool of choice for storage either.

The code changes don't look bad. It's mostly swapping types for streams and changes to calculating
serialized size so that it is aware of the impact of variable length encoded integers. It
could save bandwidth, but it could also be slower since you spend more cycles calculating
serialized size and encoding/decoding integers. If you end up using compression in bandwidth
sensitive scenarios you may not win much.

Not varint encoding the data going in/out of the database means you only save real space proportionally
when you have small operations going in/out. The flip side is that you can't do that many
small ops anyways so you aren't bandwidth constrained.

> Add varint encoding to Messaging service
> ----------------------------------------
>
>                 Key: CASSANDRA-4139
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4139
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Vijay
>            Assignee: Ariel Weisberg
>             Fix For: 3.0
>
>         Attachments: 0001-CASSANDRA-4139-v1.patch, 0001-CASSANDRA-4139-v2.patch, 0001-CASSANDRA-4139-v4.patch,
>                      0002-add-bytes-written-metric.patch, 4139-Test.rtf, ASF.LICENSE.NOT.GRANTED--0001-CASSANDRA-4139-v3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
