cassandra-commits mailing list archives

From "Oleg Anastasyev (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-7994) Commit logs on the fly compression
Date Wed, 24 Sep 2014 18:41:34 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146252#comment-14146252 ]

Oleg Anastasyev edited comment on CASSANDRA-7994 at 9/24/14 6:40 PM:
---------------------------------------------------------------------

Well, functionally they are alike. Some differences from the current code in
https://github.com/blambov/cassandra/compare/compressed-cl are:
1. 6809 builds the whole compressed CL segment buffer on the Java heap and wastes heap memory
by keeping uncompressed mutation bytes there. 7994 keeps the segment off-heap, allocating only
the 64k to-compress buffer on heap.
2. I believe 6809 creates commit log segment files of non-uniform size (because it interprets
the segment size as uncompressed bytes), which adds to filesystem fragmentation. 7994 creates
segment files of uniform size, as configured in DD; the setting limits the compressed file size.
3. 6809 complicates CommitLogReplayer further. 7994 refactors the mutation reading logic into
(Compressed)CommitLogReaders.
4. 7994 uses XXHash32 (from the lz4 lib) for checksumming, which is 5x faster.
5. 6809 adds support for several compression algorithms; 7994 uses lz4 only. Does allowing
any class name to be configured as the commit log compressor make practical sense?

And, obviously, 7994 is implemented on 2.0, while 6809 is planned for 3.0.
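
To illustrate point 1, here is a minimal sketch of the off-heap idea. It is not the code from the 7994 patch: the class `OffHeapSegmentSketch`, its methods and the `BlockSink` interface are hypothetical names made up for this example. It assumes the segment lives in a direct (off-heap) ByteBuffer and that a single reusable 64k array is the only on-heap buffer, handed block by block to whatever does the compression.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; not the classes used in the 7994 patch.
final class OffHeapSegmentSketch {
    static final int BLOCK_SIZE = 64 * 1024;

    // Uncompressed mutation bytes accumulate off-heap.
    private final ByteBuffer segment;
    // The only on-heap buffer: one reusable 64k staging array.
    private final byte[] staging = new byte[BLOCK_SIZE];

    // Receives each staged block, e.g. to compress and write it.
    interface BlockSink {
        void accept(byte[] block, int length);
    }

    OffHeapSegmentSketch(int segmentBytes) {
        this.segment = ByteBuffer.allocateDirect(segmentBytes);
    }

    // Appends one serialized mutation; returns false if it does not fit.
    boolean append(byte[] mutation) {
        if (mutation.length > segment.remaining()) {
            return false;
        }
        segment.put(mutation);
        return true;
    }

    // Copies the segment to the sink in blocks of at most 64k, then resets it.
    // Returns the number of blocks handed to the sink.
    int drainTo(BlockSink sink) {
        segment.flip();
        int blocks = 0;
        while (segment.hasRemaining()) {
            int len = Math.min(BLOCK_SIZE, segment.remaining());
            segment.get(staging, 0, len);
            sink.accept(staging, len);
            blocks++;
        }
        segment.clear();
        return blocks;
    }
}
```

With this layout the heap cost stays at 64k regardless of the configured segment size, which is the property point 1 is about.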



was (Author: m0nstermind):
Well, functionally they are alike. Some differences are:
1. 6809 creates the whole compressed CL segment buffer on heap and does not reuse it. If someone
has a large segment size, this would create unnecessary heap stress. 7994 keeps it off-heap,
allocating only the 64k to-compress buffer on heap.
2. 6809 complicates CommitLogReplayer further. 7994 refactors the mutation reading logic into
(Compressed)CommitLogReaders.
3. 7994 uses XXHash32 (from the lz4 lib) for checksumming, which is 5x faster.
4. 6809 adds support for several compression algorithms; 7994 uses lz4 only. Does allowing
any class name to be configured as the commit log compressor make practical sense?

And, obviously, 7994 is implemented on 2.0, while 6809 is planned for 3.0.


> Commit logs on the fly compression 
> -----------------------------------
>
>                 Key: CASSANDRA-7994
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7994
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Oleg Anastasyev
>         Attachments: CompressedCommitLogs-7994.txt
>
>
> This patch employs the lz4 algorithm to compress commit logs. This could be useful to conserve disk space when archiving commit logs for a long time, or to conserve IOPS in use cases with frequent, large mutations updating the same record.
> Compression is performed on 64k blocks, for better cross-mutation compression. The CRC is computed on each 64k block, unlike the original code, which computes it on each individual mutation.
> On one of our real production clusters this saved 2/3 of the space consumed by commit logs. Replay is 20-30% slower for the same number of mutations.
> While doing this, I also refactored the commit log reading code into a CommitLogReader class, which I believe makes the code cleaner.
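
The block scheme in the description can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch itself: it uses the JDK's `Deflater` and `CRC32` as stand-ins for lz4 and XXHash32 so that it runs without extra libraries, the frame layout (raw length, compressed length, checksum, payload) is invented for the example, and the checksum is taken over the uncompressed block, which the description does not specify. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch: Deflater/CRC32 stand in for lz4/XXHash32.
final class BlockCompressionSketch {
    static final int BLOCK_SIZE = 64 * 1024;
    // Assumed frame header: rawLen (int), compressedLen (int), checksum (long).
    static final int HEADER_SIZE = 16;

    // Splits the log into 64k blocks; compresses and checksums each block.
    static byte[] writeBlocks(byte[] log) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] scratch = new byte[BLOCK_SIZE];
        for (int off = 0; off < log.length; off += BLOCK_SIZE) {
            int rawLen = Math.min(BLOCK_SIZE, log.length - off);

            // One checksum per block, not per mutation.
            CRC32 crc = new CRC32();
            crc.update(log, off, rawLen);

            Deflater deflater = new Deflater(Deflater.BEST_SPEED);
            deflater.setInput(log, off, rawLen);
            deflater.finish();
            ByteArrayOutputStream compressed = new ByteArrayOutputStream();
            while (!deflater.finished()) {
                int n = deflater.deflate(scratch);
                compressed.write(scratch, 0, n);
            }
            deflater.end();

            byte[] header = ByteBuffer.allocate(HEADER_SIZE)
                    .putInt(rawLen)
                    .putInt(compressed.size())
                    .putLong(crc.getValue())
                    .array();
            out.write(header, 0, header.length);
            byte[] payload = compressed.toByteArray();
            out.write(payload, 0, payload.length);
        }
        return out.toByteArray();
    }

    // Reads the frames back, verifying each block's checksum after inflating.
    static byte[] readBlocks(byte[] framed) {
        ByteBuffer in = ByteBuffer.wrap(framed);
        ByteArrayOutputStream log = new ByteArrayOutputStream();
        while (in.remaining() >= HEADER_SIZE) {
            int rawLen = in.getInt();
            int compressedLen = in.getInt();
            long expectedCrc = in.getLong();
            byte[] compressed = new byte[compressedLen];
            in.get(compressed);

            byte[] raw = new byte[rawLen];
            int n = 0;
            Inflater inflater = new Inflater();
            try {
                inflater.setInput(compressed);
                while (n < rawLen && !inflater.finished()) {
                    int r = inflater.inflate(raw, n, rawLen - n);
                    if (r == 0 && inflater.needsInput()) {
                        break; // truncated payload
                    }
                    n += r;
                }
            } catch (DataFormatException e) {
                throw new IllegalStateException("corrupt compressed block", e);
            } finally {
                inflater.end();
            }

            CRC32 crc = new CRC32();
            crc.update(raw, 0, n);
            if (n != rawLen || crc.getValue() != expectedCrc) {
                throw new IllegalStateException("block checksum mismatch");
            }
            log.write(raw, 0, n);
        }
        return log.toByteArray();
    }
}
```

Compressing whole blocks lets the compressor find repetition across neighbouring mutations. The trade-off is that a failed checksum invalidates the whole 64k block rather than a single mutation.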



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
