cassandra-commits mailing list archives

From "Kazuki Ohta (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CASSANDRA-47) SSTable compression
Date Tue, 11 May 2010 19:37:43 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12866295#action_12866295 ]

Kazuki Ohta commented on CASSANDRA-47:
--------------------------------------

Just a comment. SSTable compression is very useful for storing large web pages. By using an
order-preserving hash, we can keep the web pages of the same domain together, possibly in the same SSTable.
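
To illustrate the locality idea, here is a minimal sketch. It assumes row keys are built
from the URL with the host name components reversed, which is one common way to make pages
of a domain sort next to each other under an order-preserving scheme. The row_key helper is
hypothetical and is not part of Cassandra.

```python
from urllib.parse import urlsplit


def row_key(url: str) -> str:
    """Hypothetical key scheme: reversed host name followed by the path."""
    parts = urlsplit(url)
    # "www.example.com" -> "com.example.www", so one domain shares a key prefix.
    host = ".".join(reversed(parts.hostname.split(".")))
    return host + parts.path


urls = [
    "http://example.com/b",
    "http://apache.org/x",
    "http://example.com/a",
]
# With key order preserved, sorted keys place example.com pages side by side.
keys = sorted(row_key(u) for u in urls)
```

Under this assumption, keys that sort together would tend to land in the same SSTable, which
is what gives a compressor shared template text to work with.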

For such data, the vcdiff algorithm (Bentley-McIlroy '99 scheme) can effectively compress
long common strings. Many web pages today are built from the same templates,
so this algorithm can eliminate the template part and keep only the content part.
I've blogged about this algorithm.

- http://kzk9.net/b/2010/02/vcdiff-data-compression-using-long-common-strings/
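
The long-common-strings idea can be sketched as follows. This is a simplified illustration
and not the vcdiff wire format or any Cassandra code: it indexes block-aligned chunks in a
dictionary keyed by the chunk bytes themselves, whereas the Bentley-McIlroy scheme uses
Karp-Rabin fingerprints, and it only extends matches forward. The names compress,
decompress, and the block parameter are assumptions made for this example.

```python
def compress(data: bytes, block: int = 8):
    """Return a list of ops: ('lit', bytes) or ('copy', source_offset, length)."""
    table = {}         # block contents -> earliest block-aligned offset
    ops = []
    lit = bytearray()  # literal bytes not yet emitted
    indexed = 0        # next block-aligned offset to index
    i, n = 0, len(data)
    while i < n:
        # Index every block-aligned chunk that ends at or before position i.
        while indexed + block <= i:
            table.setdefault(data[indexed:indexed + block], indexed)
            indexed += block
        src = table.get(data[i:i + block]) if i + block <= n else None
        if src is None:
            lit.append(data[i])
            i += 1
            continue
        # Extend the match forward as far as the bytes keep agreeing.
        length = block
        while i + length < n and data[src + length] == data[i + length]:
            length += 1
        if lit:
            ops.append(("lit", bytes(lit)))
            lit.clear()
        ops.append(("copy", src, length))
        i += length
    if lit:
        ops.append(("lit", bytes(lit)))
    return ops


def decompress(ops) -> bytes:
    out = bytearray()
    for op in ops:
        if op[0] == "lit":
            out += op[1]
        else:
            _, src, length = op
            # Copy byte by byte so a source range overlapping the output works.
            for k in range(length):
                out.append(out[src + k])
    return bytes(out)


# Two made-up pages that share a template; only the content differs.
header = b"<html><head><title>Example Site</title></head><body><div id='main'>"
footer = b"</div><div id='footer'>Copyright Example</div></body></html>"
pages = header + b"first page" + footer + header + b"second page" + footer
ops = compress(pages)
```

In this sketch the second page's header and footer are mostly replaced by copy operations
that point back into the first page, so the literal bytes that remain are mainly the content.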

I think this would open up huge opportunities for Cassandra. It works well even within
a single block. If compression becomes pluggable, I would like to implement this algorithm.


> SSTable compression
> -------------------
>
>                 Key: CASSANDRA-47
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-47
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Eric Evans
>            Priority: Minor
>             Fix For: 0.8
>
>
> We should be able to do SSTable compression which would trade CPU for I/O (almost always
> a good trade).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

