cassandra-commits mailing list archives

From "Pavel Yaskevich (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (CASSANDRA-47) SSTable compression
Date Tue, 19 Jul 2011 15:20:59 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067781#comment-13067781 ]

Pavel Yaskevich edited comment on CASSANDRA-47 at 7/19/11 3:20 PM:
-------------------------------------------------------------------

bq. A small detail though is that I would store the chunk offsets instead of the chunk sizes,
the reason being that offsets are more resilient to corruption (typically, with chunk sizes,
if the first entry is corrupted you're screwed; with offsets, you only have one or two chunks
that are unreadable).

+1 if we go with a separate file. In that case, I'm thinking I'll use the same strategy as I
did in v1: store the chunk size at the beginning of each chunk and re-read it on access instead
of keeping it in memory (this lowers memory usage for larger files).

bq. After all, CompressedDataFile is just a BRAF with a fixed buffer size, and a mechanism
to translate a pre-compression file position to a compressed file position (roughly). So I'm
pretty sure it should be possible to have CompressedDataFile extend BRAF with minimal
refactoring (of BRAF, that is). It would also lift, for free, the limitation of not having
read-write compressed files (not that we use them, but ...).
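
For illustration only: once the uncompressed chunk length is fixed, that translation reduces
to integer division and modulus. A minimal sketch (hypothetical names; the 64 KB chunk length
is an assumption, not taken from the patch):

{code:java}
public class PositionTranslationSketch
{
    // Assumed fixed uncompressed chunk length; 64 KB is illustrative only.
    static final int CHUNK_LENGTH = 64 * 1024;

    // Disk position of the compressed chunk holding the given
    // pre-compression position, using an offset-based chunk index.
    static long compressedChunkOffset(long uncompressedPosition, long[] chunkOffsets)
    {
        int chunkIndex = (int) (uncompressedPosition / CHUNK_LENGTH);
        return chunkOffsets[chunkIndex];
    }

    // Where that position falls inside the decompressed chunk.
    static int offsetWithinChunk(long uncompressedPosition)
    {
        return (int) (uncompressedPosition % CHUNK_LENGTH);
    }
}
{code}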

To extend BRAF we would need to split it into Input/Output classes, which implies refactoring
the skip-cache functionality and other parts of that class. I'd rather create a separate issue
to do that after compression is committed instead of putting all our eggs in one basket.
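
A rough skeleton of what such a split could look like (purely illustrative, hypothetical
names, not a design commitment):

{code:java}
import java.io.IOException;

// Read half: what a compressed reader like CompressedDataFile would extend.
abstract class RandomAccessInputSketch
{
    abstract void seek(long position) throws IOException;
    abstract int read(byte[] buffer, int offset, int length) throws IOException;
}

// Write half: where the skip-cache and sync logic would be refactored to.
abstract class RandomAccessOutputSketch
{
    abstract void write(byte[] buffer, int offset, int length) throws IOException;
    abstract void sync() throws IOException;
}
{code}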

+1 on everything else.

> SSTable compression
> -------------------
>
>                 Key: CASSANDRA-47
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-47
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>              Labels: compression
>             Fix For: 1.0
>
>         Attachments: CASSANDRA-47-v2.patch, CASSANDRA-47.patch, snappy-java-1.0.3-rc4.jar
>
>
> We should be able to do SSTable compression, which would trade CPU for I/O (almost always
> a good trade).

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
