cassandra-commits mailing list archives

From "Pavel Yaskevich (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-47) SSTable compression
Date Sat, 09 Jul 2011 13:53:17 GMT


Pavel Yaskevich commented on CASSANDRA-47:

bq. This seems like an unrealistically good compression ratio. If I gzip a real world SSTable
that has redundant data that should be ripe for compression I only see 641M-->217M. What's
the gzip compression ratio with the SSTables that workload generates?

You can easily test it yourself: for example, run ./bin/stress -S 1024 -n 1000000 -C 250 -V, wait
for compactions to finish, and check the on-disk size of the resulting files (using ls -lahs). I
see 3.8GB compressed into 781MB in my tests. internal_op_rate with the current trunk code
is around 450-500, but with the current patch it is about 2800-3000 on a Quad-Core AMD Opteron(tm)
Processor 2374 HE 4229730MHz on each core, 2GB mem (Rackspace instance). A cardinality of 250
is 5 times bigger than the default, plus average-size values via the -V option.
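To reproduce the gzip comparison from the quoted question without a live cluster, here is a minimal sketch using a synthetic file of redundant lines as a stand-in for an SSTable data file (real SSTables compress less uniformly, and the file path and contents here are illustrative, not Cassandra's):

```shell
# Sketch: measure a gzip compression ratio the way the quoted question does,
# on a synthetic highly redundant file (hypothetical stand-in for an SSTable).
TMP=$(mktemp)
# 1000 copies of the same line -> redundant data, ripe for compression
for i in $(seq 1 1000); do echo "row-key column-name column-value"; done > "$TMP"
ORIG=$(wc -c < "$TMP")
gzip -c "$TMP" > "$TMP.gz"
COMP=$(wc -c < "$TMP.gz")
echo "original=$ORIG bytes, compressed=$COMP bytes"
rm -f "$TMP" "$TMP.gz"
```

On a real cluster you would instead point ls -lahs (or wc -c) at the *-Data.db files under the data directory before and after applying the patch.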

> SSTable compression
> -------------------
>                 Key: CASSANDRA-47
>                 URL:
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>              Labels: compression
>             Fix For: 1.0
>         Attachments: CASSANDRA-47.patch, snappy-java-1.0.3-rc4.jar
> We should be able to do SSTable compression which would trade CPU for I/O (almost always
a good trade).

This message is automatically generated by JIRA.
For more information on JIRA, see:

