cassandra-commits mailing list archives

From "Terje Marthinussen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-47) SSTable compression
Date Fri, 01 Apr 2011 12:14:07 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014531#comment-13014531 ]

Terje Marthinussen commented on CASSANDRA-47:
---------------------------------------------

This is maybe not so interesting for a "proper" solution, but I am adding it for reference.

I needed space for more data, so I recently threw together a quick compression hack
for supercolumns.

I considered compressing the index blocks as Jonathan suggested, but I could not make
up my mind on how safe that would be with respect to other code accessing the data, and I was
short on time, so I looked for something more isolated.

The final decision was to simply compress the serialized columns in a supercolumn (plus a bit
of caching to avoid recompressing every time the serialized size is requested).
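
As an illustration of the idea described above (not the actual patch), here is a minimal
sketch of compressing a serialized block once and caching the result so that repeated size
queries do not recompress. The class name CompressedColumnBlock and its methods are
hypothetical and are not part of Cassandra; only the java.util.zip calls are real.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical holder for the serialized subcolumns of one supercolumn.
// This is NOT Cassandra's SuperColumn class; all names are illustrative.
final class CompressedColumnBlock {
    private final byte[] serialized;  // uncompressed serialized subcolumns
    private byte[] compressedCache;   // filled lazily, reused afterwards

    CompressedColumnBlock(byte[] serialized) {
        this.serialized = serialized.clone();
    }

    // Compress on first use and cache the bytes for later calls.
    synchronized byte[] compressed() {
        if (compressedCache == null) {
            Deflater deflater = new Deflater(Deflater.BEST_SPEED);
            try {
                deflater.setInput(serialized);
                deflater.finish();
                ByteArrayOutputStream out = new ByteArrayOutputStream(serialized.length);
                byte[] buf = new byte[1024];
                while (!deflater.finished()) {
                    int n = deflater.deflate(buf);
                    out.write(buf, 0, n);
                }
                compressedCache = out.toByteArray();
            } finally {
                deflater.end();
            }
        }
        return compressedCache;
    }

    // Size as it would be written; served from the cache after the first call.
    int serializedSize() {
        return compressed().length;
    }

    // Inverse operation for reading a block back; the caller supplies the original length.
    static byte[] decompress(byte[] data, int originalLength) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(data);
            byte[] out = new byte[originalLength];
            int off = 0;
            while (off < originalLength && !inflater.finished()) {
                int n = inflater.inflate(out, off, originalLength - off);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                off += n;
            }
            return out;
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed block", e);
        } finally {
            inflater.end();
        }
    }
}
```

The sketch assumes the block is immutable once constructed; a mutable supercolumn would need
to invalidate the cache whenever a subcolumn is added or changed.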

The data I have has supercolumns with typically 50-60 subcolumns, mostly small strings or
numbers.
In total, the subcolumns make up 600-1200 bytes of data when serialized.

There are usually a fair number of supercolumns per row.

My test data was 447 keys. I tested with the ning lzf jars and the default java.util.zip.
This is not necessarily a good test, but I think json2sstable is somewhat useful for measuring
the relative impact of different implementations, although it is not useful for determining
real performance in any way.

In addition, I made a simple dictionary of column names (only applied to supercolumns), as
the column names were not compressed very well when looking at just a single supercolumn at
a time.
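
One way such a dictionary could look, as a sketch under assumptions: each distinct column
name is assigned a small integer id the first time it is seen, so a serialized subcolumn can
carry the id instead of the full name. The class ColumnNameDictionary is hypothetical; the
comment above does not say how the dictionary was actually built or stored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical shared dictionary of column names (illustrative, not from the patch).
final class ColumnNameDictionary {
    private final Map<String, Integer> idsByName = new HashMap<>();
    private final List<String> namesById = new ArrayList<>();

    // Return the id for a name, assigning the next free id on first sight.
    synchronized int idFor(String name) {
        Integer id = idsByName.get(name);
        if (id == null) {
            id = namesById.size();
            idsByName.put(name, id);
            namesById.add(name);
        }
        return id;
    }

    // Reverse lookup, needed when the columns are read back.
    synchronized String nameFor(int id) {
        return namesById.get(id);
    }

    synchronized int size() {
        return namesById.size();
    }
}
```

The dictionary itself would have to be persisted alongside the data for the ids to remain
readable; the sketch leaves that part out.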

The results of both the digest and compression:
Standard Cassandra, json2sstable:
real	0m55.148s
user	1m50.023s
sys	0m2.856s
sstable: 190MB

ning.com:
real	1m8.315s
user	2m18.361s
sys	0m4.600s
sstable: 108MB

java.util.zip:
real	1m35.899s
user	2m49.691s
sys	0m2.940s
sstable: 90MB

As a reference, the whole sstable files compress as follows:
ning.com (command line)
real	0m1.803s
user	0m1.536s
sys	0m0.396s
sstable: 80MB

gzip (command line)
real	0m6.175s
user	0m6.076s
sys	0m0.084s
sstable: 48MB
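
To put the figures quoted above in relative terms, the following small sketch computes the
remaining size as a percentage of the 190MB baseline and the extra wall-clock time relative
to the 55.148s baseline. The class and method names are made up for illustration; the input
numbers are the ones listed above.

```java
import java.util.Locale;

// Illustrative helper for comparing the measurements listed above.
final class SizeRatios {
    // Percentage of the baseline size that remains after compression.
    static double percentOfBaseline(double compressedMb, double baselineMb) {
        return 100.0 * compressedMb / baselineMb;
    }

    // Extra wall-clock time relative to the baseline, in percent.
    static double overheadPercent(double seconds, double baselineSeconds) {
        return 100.0 * (seconds - baselineSeconds) / baselineSeconds;
    }

    public static void main(String[] args) {
        double baseMb = 190, baseSec = 55.148;
        System.out.println(String.format(Locale.ROOT, "lzf: %.1f%% size, +%.1f%% time",
                percentOfBaseline(108, baseMb), overheadPercent(68.315, baseSec)));
        System.out.println(String.format(Locale.ROOT, "zip: %.1f%% size, +%.1f%% time",
                percentOfBaseline(90, baseMb), overheadPercent(95.899, baseSec)));
    }
}
```

With these inputs the per-supercolumn hack leaves roughly 57% (lzf) and 47% (java.util.zip)
of the original size for about 24% and 74% more wall-clock time, while whole-file compression
reaches roughly 42% (lzf) and 25% (gzip).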


I doubt this implementation has much merit for inclusion in a release; I just added the numbers
for reference.
Of course, if requested, I could see whether I can make the patch available somewhere.

> SSTable compression
> -------------------
>
>                 Key: CASSANDRA-47
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-47
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Priority: Minor
>             Fix For: 0.8
>
>
> We should be able to do SSTable compression which would trade CPU for I/O (almost always
> a good trade).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
