cassandra-commits mailing list archives

From "Pavel Yaskevich (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-47) SSTable compression
Date Wed, 27 Jul 2011 22:05:11 GMT


Pavel Yaskevich updated CASSANDRA-47:

    Attachment: CASSANDRA-47-v4-fixes.patch

bq. We need to do something for this ticket. Right now, if someone calls the public flush or
sync methods of CSW mistakenly, it will corrupt data. So we should at least have the public
flush and sync throw an UnsupportedOperationException. I'm not saying it will be particularly
clean, but I'll take "slightly ugly and safe" over "cleaner but dangerous" anytime. I'm fine
with leaving a "cleaner" refactoring of this to a separate task though.

CSW sync() and flush() both throw UnsupportedOperationException now.
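A minimal sketch of the idea, assuming a simplified writer hierarchy: SketchWriter and SketchCompressedWriter are hypothetical stand-ins for SW and CSW, not Cassandra's classes. The compressed subclass rejects the public flush() and sync() calls, while close() still writes out the remaining buffer through a protected internal path.

```java
/** Hypothetical stand-in for a buffered sequential writer (not Cassandra's SequentialWriter). */
class SketchWriter {
    protected final StringBuilder sink = new StringBuilder();   // stands in for the data file
    protected final StringBuilder buffer = new StringBuilder(); // pending, not-yet-written data

    void write(String s) {
        buffer.append(s);
    }

    /** Public flush: writes whatever happens to be buffered, wherever we are. */
    public void flush() {
        flushInternal();
    }

    /** Public sync: a real writer would also fsync; here it only flushes. */
    public void sync() {
        flushInternal();
    }

    /** Internal path shared by flush(), sync() and close(). */
    protected void flushInternal() {
        if (buffer.length() > 0)
            flushData();
        buffer.setLength(0);
    }

    /** Writes the buffer to the sink as-is. */
    protected void flushData() {
        sink.append(buffer);
    }

    public void close() {
        flushInternal();
    }
}

/** Hypothetical compressed writer: the buffer is written out as one "chunk". */
class SketchCompressedWriter extends SketchWriter {
    @Override
    public void flush() {
        throw new UnsupportedOperationException("flush() would write a partial chunk mid-stream");
    }

    @Override
    public void sync() {
        throw new UnsupportedOperationException("sync() would write a partial chunk mid-stream");
    }

    @Override
    protected void flushData() {
        // Brackets mark one chunk; a real implementation would compress the buffer here.
        sink.append('[').append(buffer).append(']');
    }
}
```

Throwing from the public methods is the "slightly ugly and safe" option from the review: callers that mistakenly flush get a loud failure instead of silently corrupted chunk boundaries.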

bq. resetAndTruncate is still a problem. ...

Fixed by using information from the metadata file (to avoid keeping information about chunks in

bq. In CSW, when resetBuffer is called, current is supposed to either be "aligned" on a chunk
boundary or we're closing the file. So it seems there is no need to realign bufferOffset,
and thus no need to override resetBuffer.


bq. The truncateAndClose of CSW doesn't seem to truncate anything. It also doesn't honor skipIOCache
correctly since it doesn't call the truncateAndClose of SW. But actually, I think that if
the only backward seek we do is through resetAndTruncate, then there is no need to truncate
on close (neither for SW nor CSW). So we should probably get rid of that function and move
the relevant parts into close().

The truncateAndClose method has been removed; the relevant parts were moved to close().

bq. Let's use readUTF/writeUTF to read/write the algorithm name in the metadata file. That's
what we use for strings usually (and using a StringBuilder to read a string is a tad over
the top).
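For illustration, a small round trip through java.io.DataOutputStream.writeUTF and DataInputStream.readUTF, which prefix the string with a two-byte length so the reader needs no manual StringBuilder loop. The AlgorithmNameCodec class is hypothetical; only the JDK calls are real.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical helpers for the compressor-name field of a compression metadata file. */
class AlgorithmNameCodec {
    /** Encodes the name as a 2-byte length followed by modified UTF-8 bytes. */
    static byte[] write(String algorithm) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(algorithm);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads back exactly one string written by write(). */
    static String read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, write("SnappyCompressor") (a hypothetical name) yields 18 bytes: the 2-byte length followed by 16 ASCII bytes, and read() returns the original string.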


bq. CompressionMetadata.readChunkOffsets() is buggy if the dataLength is a multiple of the
chunkLength (we have one less chunk than what's computed then).

Fixed by storing the chunk count in the index file, so we no longer need to
count anything.
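One plausible way such an off-by-one arises, sketched with hypothetical names: computing the count as dataLength / chunkLength + 1 overcounts by one exactly when dataLength is a multiple of chunkLength, whereas ceiling division does not. The patch sidesteps the arithmetic by storing the count; the writeOffsets/readOffsets pair here assumes a simple layout (an int count followed by one long offset per chunk), which is not necessarily the actual on-disk format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Illustrative chunk-count arithmetic and offset storage; all names are hypothetical. */
class ChunkOffsetsSketch {
    /** Off-by-one variant: one chunk too many when dataLength % chunkLength == 0. */
    static long naiveCount(long dataLength, int chunkLength) {
        return dataLength / chunkLength + 1;
    }

    /** Ceiling division: the number of chunks needed to hold dataLength bytes. */
    static long chunkCount(long dataLength, int chunkLength) {
        return (dataLength + chunkLength - 1) / chunkLength;
    }

    /** Writes the explicit chunk count, then one offset per chunk. */
    static byte[] writeOffsets(long[] offsets) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(offsets.length);
            for (long offset : offsets)
                out.writeLong(offset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the stored count first, so no length arithmetic is needed. */
    static long[] readOffsets(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            long[] offsets = new long[in.readInt()];
            for (int i = 0; i < offsets.length; i++)
                offsets[i] = in.readLong();
            return offsets;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a 64 KB chunk length and exactly 65536 bytes of data, naiveCount returns 2 while chunkCount returns 1.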

bq. No need to reset validBufferBytes in CSW.flushData(). It's done in resetBuffer (and not
in SW.flushData(), so that'll improve symmetry).
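A minimal sketch of the buffer bookkeeping these comments describe, under assumed field semantics: flushData() only emits a chunk, resetBuffer() is the single place that clears validBufferBytes and moves bufferOffset to current, and, following the earlier observation, current must already sit on a chunk boundary unless the writer is closing. ChunkedBufferSketch is hypothetical, and counting chunks stands in for compressing them.

```java
/** Hypothetical chunked writer; field names echo the review, the logic is illustrative. */
class ChunkedBufferSketch {
    final int chunkLength;
    final byte[] buffer;
    long current;          // logical (uncompressed) write position
    long bufferOffset;     // logical position of buffer[0]
    int validBufferBytes;  // bytes in the buffer not yet written out
    int chunksWritten;     // stands in for the compressed output

    ChunkedBufferSketch(int chunkLength) {
        this.chunkLength = chunkLength;
        this.buffer = new byte[chunkLength];
    }

    void write(byte b) {
        buffer[validBufferBytes++] = b;
        current++;
        if (validBufferBytes == chunkLength) {
            flushData();
            resetBuffer(false);
        }
    }

    /** Emits one chunk; deliberately leaves validBufferBytes alone. */
    void flushData() {
        chunksWritten++;
    }

    /** The single place where buffer state is reset; no realignment is needed. */
    void resetBuffer(boolean closing) {
        if (!closing && current % chunkLength != 0)
            throw new IllegalStateException("resetBuffer called off a chunk boundary");
        bufferOffset = current;
        validBufferBytes = 0;
    }

    void close() {
        if (validBufferBytes > 0)
            flushData(); // final, possibly short, chunk
        resetBuffer(true);
    }
}
```

Writing 10 bytes with a chunk length of 4 emits two full chunks and leaves 2 bytes buffered; close() then emits the short third chunk.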


bq. Chunk could be a static class in CompressionMetadata, I suppose.
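A sketch of what a static nested Chunk might look like, with hypothetical fields and a hypothetical chunkFor() lookup. Declaring it static means a Chunk holds no hidden reference to an enclosing metadata object and can be created on its own. The lookup assumes each chunk's compressed length is the gap to the next offset (or to the end of the file for the last chunk).

```java
/** Hypothetical shape of a compression metadata class with a static nested Chunk. */
class CompressionMetadataSketch {
    /** static: no implicit reference to an enclosing CompressionMetadataSketch instance. */
    static final class Chunk {
        final long offset; // position of the compressed chunk in the data file
        final int length;  // compressed length in bytes

        Chunk(long offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    private final long[] chunkOffsets;
    private final long compressedFileLength;

    CompressionMetadataSketch(long[] chunkOffsets, long compressedFileLength) {
        this.chunkOffsets = chunkOffsets;
        this.compressedFileLength = compressedFileLength;
    }

    /** Maps an uncompressed position to the compressed chunk that contains it. */
    Chunk chunkFor(long uncompressedPosition, int chunkLength) {
        int idx = (int) (uncompressedPosition / chunkLength);
        long start = chunkOffsets[idx];
        long end = idx + 1 < chunkOffsets.length ? chunkOffsets[idx + 1] : compressedFileLength;
        return new Chunk(start, (int) (end - start));
    }
}
```

Because Chunk is static, code outside the class can write new CompressionMetadataSketch.Chunk(offset, length) without first having a metadata instance.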


> SSTable compression
> -------------------
>                 Key: CASSANDRA-47
>                 URL:
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>              Labels: compression
>             Fix For: 1.0
>         Attachments: CASSANDRA-47-v2.patch, CASSANDRA-47-v3-rebased.patch, CASSANDRA-47-v3.patch, CASSANDRA-47-v4-fixes.patch, CASSANDRA-47-v4.patch, CASSANDRA-47.patch, snappy-java-1.0.3-rc4.jar
> We should be able to do SSTable compression which would trade CPU for I/O (almost always a good trade).
