cassandra-commits mailing list archives

From "Terje Marthinussen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-47) SSTable compression
Date Wed, 27 Jul 2011 04:23:10 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071504#comment-13071504 ]

Terje Marthinussen commented on CASSANDRA-47:
---------------------------------------------

Instead of an on/off switch, we could use sstable size to decide when to compress.

In the Cassandra build we run, we have compression implemented at the supercolumn level.

It turned out to be very good for our performance not to compress data in memtables (which
we would normally do with compression on supercolumns) or while flushing memtables, as
both of these slowed down the write path.

Under heavy write activity, the sstables resulting from memtable flushes often get pretty
small (maybe 20MB on average in our case), so compression does not make much difference to
disk consumption there, but the performance penalty does.

All the compression/decompression when compacting the smallest sstables also makes a noticeable
difference when trying to keep up on the compaction side.

Instead, we went for compression that only happens when a source sstable in a compaction
is larger than 4GB.
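
To make the rule concrete, here is a minimal sketch of such a size check. It is not
taken from our code or from the attached patches; the class and method names are
hypothetical, and only the 4GB figure comes from the description above.

```java
import java.util.List;

// Hypothetical sketch of a size-based compression switch; not Cassandra's API.
final class SizeThresholdPolicy {
    // 4GB, the threshold mentioned in the comment above.
    static final long DEFAULT_THRESHOLD_BYTES = 4L * 1024 * 1024 * 1024;

    private final long thresholdBytes;

    SizeThresholdPolicy(long thresholdBytes) {
        this.thresholdBytes = thresholdBytes;
    }

    // Compress the compaction output only if at least one source sstable
    // is larger than the threshold. A memtable flush has no source sstables,
    // so an empty list never triggers compression.
    boolean shouldCompress(List<Long> sourceSizesBytes) {
        for (long size : sourceSizesBytes) {
            if (size > thresholdBytes) {
                return true;
            }
        }
        return false;
    }
}
```

With this shape, flushes and compactions of small sstables skip compression entirely,
which is the behaviour that kept our write path fast.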

I would recommend considering similar functionality here.

I started off with ning for our compression, but I now run the built-in Java deflate to get
even better compression. Since we only compress the largest sstables, and do no other compression
in the write path or on compaction of small sstables, the very slow compression of deflate
does not bother us that much.

The read side is of course still slower with inflate, but it is still more than fast enough
to not be a problem. 
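
For reference, a minimal sketch of a deflate/inflate round trip using the JDK's
java.util.zip classes. This is an illustration, not our actual code; the class name,
the buffer size and the choice of Deflater.BEST_COMPRESSION are assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper showing the built-in Java deflate/inflate round trip.
final class DeflateRoundTrip {
    // Compress a whole buffer. BEST_COMPRESSION is an assumed setting that
    // trades speed for ratio.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    // Decompress a buffer produced by compress().
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // No progress and nothing left to read means the input is unusable.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or dictionary-dependent input");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt deflate stream", e);
        } finally {
            inflater.end();
        }
    }
}
```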

OS caching will also be better thanks to the better compression so we can regain some of the
performance lost vs. ning/snappy there.

We could also consider making this very tunable, with deflate for very large sstables, ning/snappy
for smaller ones and no compression for the smallest, but I am not sure it is worth it.
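
A rough sketch of what such a tunable, tiered choice could look like. Everything here
is hypothetical: the names, the enum and the two cut-off parameters are illustrations
only, and I have not measured which cut-offs would make sense.

```java
// Hypothetical sketch of a tiered codec choice by sstable size; not Cassandra's API.
final class TieredCodecChooser {
    // FAST stands for a fast codec such as ning or snappy.
    enum Codec { NONE, FAST, DEFLATE }

    private final long fastMinBytes;    // below this size: no compression
    private final long deflateMinBytes; // at or above this size: deflate

    TieredCodecChooser(long fastMinBytes, long deflateMinBytes) {
        if (fastMinBytes < 0 || deflateMinBytes < fastMinBytes) {
            throw new IllegalArgumentException("need 0 <= fastMinBytes <= deflateMinBytes");
        }
        this.fastMinBytes = fastMinBytes;
        this.deflateMinBytes = deflateMinBytes;
    }

    // Pick the codec for an sstable of the given size, largest tier first.
    Codec choose(long sstableBytes) {
        if (sstableBytes >= deflateMinBytes) {
            return Codec.DEFLATE;
        }
        if (sstableBytes >= fastMinBytes) {
            return Codec.FAST;
        }
        return Codec.NONE;
    }
}
```

The extra tier is what I am unsure about: it adds two settings to tune for a gain
that may be small compared with the simple single-threshold approach.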

By the way, how much difference did you see between ning and snappy? When I tested, there was
not all that much difference, and I felt ning was easier to bundle, so to me it seemed like the
better alternative.

> SSTable compression
> -------------------
>
>                 Key: CASSANDRA-47
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-47
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>              Labels: compression
>             Fix For: 1.0
>
>         Attachments: CASSANDRA-47-v2.patch, CASSANDRA-47-v3-rebased.patch, CASSANDRA-47-v3.patch,
>                      CASSANDRA-47-v4.patch, CASSANDRA-47.patch, snappy-java-1.0.3-rc4.jar
>
>
> We should be able to do SSTable compression which would trade CPU for I/O (almost always
> a good trade).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
