cassandra-commits mailing list archives

From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8895) Compressed sstables should only compress if the win is above a certain threshold, and should use a variable block size
Date Thu, 16 Jul 2015 15:17:09 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14629872#comment-14629872 ]

Benedict commented on CASSANDRA-8895:
-------------------------------------

The basic idea is to compress using the smallest buffer size we can, so that we waste no
IOPs (or cycles) answering reads. This is especially important for small partitions.

To give a rough outline of how I think this should work: the idea would be to buffer up to,
say, 1Mb for any sstable we write (instead of the current 64Kb) - at least for the first Mb,
perhaps iteratively for the first 10Mb, or perhaps for 1Mb out of every 50Mb. These details
aren't very important and can be tweaked later.

We introduce a configuration parameter that specifies how much compression must achieve to
be worth pursuing at all, and a ratio of improvement needed to warrant a larger block size
(say, we require 15% better compression to warrant doubling the block size).
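As a concrete reading of those two knobs (the parameter names and 15% values below are illustrative, not actual Cassandra configuration):

```java
// Hypothetical sketch of the two thresholds described above.
// Parameter names are made up for illustration; ratios are compressed/raw, lower is better.
public class CompressionThresholds {
    static final double MIN_COMPRESSION_GAIN = 0.15; // must save at least 15% to compress at all
    static final double GAIN_PER_DOUBLING = 0.15;    // need 15% better ratio to justify a 2x chunk

    // Is compression worth pursuing at all for this data?
    static boolean worthCompressing(long rawBytes, long compressedBytes) {
        return compressedBytes <= rawBytes * (1 - MIN_COMPRESSION_GAIN);
    }

    // Does doubling the block size earn its keep?
    static boolean worthDoubling(double ratioSmallChunk, double ratioLargeChunk) {
        return ratioLargeChunk <= ratioSmallChunk * (1 - GAIN_PER_DOUBLING);
    }

    public static void main(String[] args) {
        System.out.println(worthCompressing(65536, 50000)); // saves ~24%: true
        System.out.println(worthDoubling(0.60, 0.50));      // 0.50 <= 0.51: true
    }
}
```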

Once we have this and our larger-than-normal buffer, we essentially perform binary search
to find our optimal chunk size.
* We start with _no_ compression, and 64Kb chunk size with compression. 
* If 64Kb is not above our minimum compression ratio, we use no compression. 
* If it is, we try the mid-point _of the logarithmic scale_ (i.e. if 4Kb is our minimum chunk
size, we have possible sizes of 4, 8, 16, 32, 64, so our midpoint would be 8 or 16)
** If that is a better choice based on our parameters, we then try the mid-point of that and
the minimum (i.e. 4 or 8), etc
** If not, we try the midpoint above (i.e. 16 or 32), etc
* We then use this chunk size for the contents of the buffer _and all subsequent writes_
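The search over the logarithmic scale above could be sketched like so (a minimal illustration, assuming made-up threshold constants and a callback that measures the compression ratio for a candidate size):

```java
import java.util.function.IntToDoubleFunction;

// Hypothetical sketch of the binary search over the logarithmic scale of chunk
// sizes. Constants and names are illustrative, not actual Cassandra code.
public class ChunkSizeSearch {
    static final double MIN_RATIO = 0.85;      // compressed/raw must be <= 85% to compress at all
    static final double STEP_UP_RATIO = 0.85;  // doubling must improve the ratio by 15%

    // candidate sizes form the logarithmic scale from the comment: 4, 8, 16, 32, 64 (Kb)
    static final int[] SIZES_KB = {4, 8, 16, 32, 64};

    /**
     * ratioAt maps an index into SIZES_KB to the measured compressed/raw ratio
     * (lower is better). Returns -1 if even the largest chunk fails the minimum
     * ratio, meaning we should use no compression.
     */
    static int pickChunkSize(IntToDoubleFunction ratioAt) {
        int largest = SIZES_KB.length - 1;
        if (ratioAt.applyAsDouble(largest) > MIN_RATIO)
            return -1; // 64Kb doesn't compress well enough: skip compression entirely
        int lo = 0, hi = largest;
        while (lo < hi) {
            int mid = (lo + hi) / 2; // mid-point of the logarithmic scale
            double small = ratioAt.applyAsDouble(mid);
            double large = ratioAt.applyAsDouble(mid + 1);
            if (large <= small * STEP_UP_RATIO)
                lo = mid + 1;  // larger chunks still pay off: search the upper half
            else
                hi = mid;      // diminishing returns: search the lower half
        }
        return SIZES_KB[lo];
    }

    public static void main(String[] args) {
        // toy ratios: compression improves strongly up to 16Kb, then flattens
        double[] ratios = {0.80, 0.65, 0.50, 0.48, 0.47};
        System.out.println("chosen chunk size: " + pickChunkSize(i -> ratios[i]) + "Kb");
    }
}
```

With the toy ratios above, doubling from 8Kb to 16Kb is worth 15%, but 16Kb to 32Kb is not, so the search settles on 16Kb.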

We should most likely short-circuit this if, on compaction, we estimate a single partition to
be larger than 64Kb; or we should set our lower bound on chunk size to the estimated partition
size.
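That short-circuit might look like the following (a hypothetical helper; the name and rounding policy are assumptions):

```java
// Hypothetical sketch: clamp the lower bound of the chunk-size search to the
// estimated partition size, per the short-circuit described above.
public class PartitionLowerBound {
    // candidate chunk sizes run 4Kb..64Kb on a power-of-two scale
    static int lowerBoundKB(long estimatedPartitionBytes) {
        long kb = Math.max(4, estimatedPartitionBytes / 1024);
        if (kb >= 64)
            return 64; // partitions already span a full chunk: skip the search entirely
        int size = 4;
        while (size < kb)  // round up to the next power of two on the scale
            size *= 2;
        return size;
    }

    public static void main(String[] args) {
        System.out.println(lowerBoundKB(5 * 1024));   // 8
        System.out.println(lowerBoundKB(200 * 1024)); // 64
    }
}
```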

There are a lot of implementation decisions to be made, such as whether the no-compression
route pretends to be a compressed file (with the compressed reader simply understanding that)
or we make the decision earlier (both have their unpleasant aspects), and whether we make a
single decision up-front or continually reassess it.

> Compressed sstables should only compress if the win is above a certain threshold, and should use a variable block size
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8895
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8895
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Paulo Motta
>              Labels: performance
>             Fix For: 3.x
>
>
> On performing a flush to disk, we should assess if the data we're flushing will actually
> be substantively compressed, and how large the page should be to get optimal compression
> ratio versus read latency. Decompressing 64Kb chunks is wasteful when reading small records.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
