cassandra-commits mailing list archives

From "Chris Burroughs (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-47) SSTable compression
Date Sat, 09 Jul 2011 13:41:17 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062379#comment-13062379 ]

Chris Burroughs commented on CASSANDRA-47:
------------------------------------------

bq. Using a 64KB buffer, a 1.7GB file could be compressed into 110MB (data added using ./bin/stress
-n 1000000 -S 1024 -V, where the -V option generates average-size values and cardinality varying
from 50 (default) to 250).

This seems like an unrealistically good compression ratio.  If I gzip a real-world SSTable
with redundant data that should be ripe for compression, I only see 641M --> 217M.  What's
the gzip compression ratio for the SSTables that the stress.java workload generates?
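
For reference, here is a quick sketch (not from any patch on this ticket) of how one could
measure that gzip ratio from Java without writing the compressed output to disk; the SSTable
file name below is only a placeholder, pass the real data file path as the first argument:

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRatio
{
    public static void main(String[] args) throws IOException
    {
        // Placeholder SSTable data file name; pass the real path as the first argument.
        File data = new File(args.length > 0 ? args[0] : "Keyspace1-Standard1-f-1-Data.db");
        long rawBytes = data.length();

        // Count compressed output bytes instead of writing them anywhere.
        final long[] compressedBytes = { 0 };
        OutputStream counter = new OutputStream()
        {
            public void write(int b) { compressedBytes[0]++; }
            public void write(byte[] b, int off, int len) { compressedBytes[0] += len; }
        };

        InputStream in = new BufferedInputStream(new FileInputStream(data));
        GZIPOutputStream gzip = new GZIPOutputStream(counter);
        byte[] buffer = new byte[64 * 1024];
        int n;
        while ((n = in.read(buffer)) != -1)
            gzip.write(buffer, 0, n);
        gzip.close();
        in.close();

        System.out.println(String.format("%,d -> %,d bytes (%.1f%% of original)",
                                         rawBytes, compressedBytes[0],
                                         100.0 * compressedBytes[0] / rawBytes));
    }
}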

Stu, could you post your custom YCSB workload from CASSANDRA-674 for comparison?

> SSTable compression
> -------------------
>
>                 Key: CASSANDRA-47
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-47
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>              Labels: compression
>             Fix For: 1.0
>
>         Attachments: CASSANDRA-47.patch, snappy-java-1.0.3-rc4.jar
>
>
> We should be able to do SSTable compression which would trade CPU for I/O (almost always
> a good trade).
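
For anyone skimming the ticket: a rough sketch of what block-level compression with the
attached snappy-java jar could look like. The 64KB block size echoes the buffer mentioned
above, but the length-prefixed block layout here is purely illustrative and is not what the
actual patch does:

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

import org.xerial.snappy.Snappy;

public class SnappyBlockCompressor
{
    // 64KB blocks, mirroring the buffer size mentioned above; a real
    // implementation would also need an index so blocks can be located
    // for random reads.
    private static final int BLOCK_SIZE = 64 * 1024;

    public static void compress(File input, File output) throws IOException
    {
        InputStream in = new BufferedInputStream(new FileInputStream(input));
        DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(output)));
        byte[] block = new byte[BLOCK_SIZE];
        int n;
        while ((n = in.read(block)) != -1)
        {
            byte[] chunk = (n == block.length) ? block : Arrays.copyOf(block, n);
            byte[] compressed = Snappy.compress(chunk);
            // Length-prefix each block so it can be read back and uncompressed
            // independently of its neighbours.
            out.writeInt(compressed.length);
            out.write(compressed);
        }
        out.close();
        in.close();
    }
}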

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
