cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data
Date Thu, 04 Aug 2011 10:31:27 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-1717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079314#comment-13079314
] 

Sylvain Lebresne commented on CASSANDRA-1717:
---------------------------------------------

bq. checksum at the column level only which will give us better control over individual columns
and does not seem to be a big overhead

I agree that it is by far the simplest approach for non-compressed data, but I, for one, am
a bit concerned by the overhead: 4 bytes per column is not negligible. On some workloads, that
could easily mean a 10-20% data size increase. Basically, I am concerned about people upgrading
to 1.0, and I want to make sure that upgrading brings no surprises for them (even if
they don't "trust" compression yet, which would be perfectly reasonable). For that to be true,
I think that if we go with checksums at the column level, we would need to make them optional
and off by default.
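
To make the overhead concern concrete, here is a rough back-of-the-envelope sketch in Python.
It assumes a fixed per-column framing cost of 15 bytes (lengths, flags and timestamp); that
figure and the function name `checksum_overhead` are assumptions for illustration, not
measurements of the actual on-disk format.

```python
CHECKSUM_BYTES = 4  # per-column checksum size discussed above


def checksum_overhead(name_bytes, value_bytes, fixed_bytes=15):
    """Return the fractional size increase from adding a checksum to one column.

    fixed_bytes is an ASSUMED framing cost per column (lengths, flags,
    timestamp); adjust it to match the real serialization format.
    """
    column_size = fixed_bytes + name_bytes + value_bytes
    return CHECKSUM_BYTES / column_size


if __name__ == "__main__":
    # Small columns are hit hardest; large values dilute the overhead.
    for name_len, value_len in [(4, 1), (10, 15), (10, 1000)]:
        pct = 100 * checksum_overhead(name_len, value_len)
        print(f"name={name_len}B value={value_len}B -> +{pct:.1f}%")
```

Under these assumptions, a column with a 4-byte name and a 1-byte value grows by 20%, and one
with a 10-byte name and a 15-byte value grows by 10%; larger values dilute the overhead further.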

bq. Checksum on the compressed block level is unnecessary because bitrot, for example, will
be detected right on decompression

I'm not sure that's bulletproof. I don't think all compression algorithms ship with a checksum
(I don't know about snappy in particular). When they don't, it's entirely possible for bitrot to
corrupt compressed data without causing a failure at decompression or at deserialization if
you're unlucky (granted, corruption is less likely to go undetected than without compression, but
that is not good enough). So either we check that snappy uses checksumming and only add support
for algorithms that do, or a block-level checksum is still useful.
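
A minimal sketch of what a block-level checksum could look like, in Python. Raw DEFLATE
(which carries no checksum of its own) stands in for a codec without built-in integrity
checks; the function names `write_block` and `read_block` and the block layout (payload
followed by a 4-byte CRC32) are hypothetical, not Cassandra's actual format.

```python
import struct
import zlib


def _raw_deflate(data: bytes) -> bytes:
    # wbits=-15 selects raw DEFLATE: no header and no trailing checksum,
    # standing in for a codec that does not verify integrity itself.
    compressor = zlib.compressobj(6, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()


def _raw_inflate(payload: bytes) -> bytes:
    decompressor = zlib.decompressobj(-15)
    return decompressor.decompress(payload) + decompressor.flush()


def write_block(data: bytes) -> bytes:
    """Compress data and append a CRC32 computed over the compressed bytes."""
    payload = _raw_deflate(data)
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return payload + struct.pack(">I", crc)


def read_block(block: bytes) -> bytes:
    """Verify the CRC32 before decompressing; raise IOError on a mismatch."""
    payload = block[:-4]
    (stored_crc,) = struct.unpack(">I", block[-4:])
    if zlib.crc32(payload) & 0xFFFFFFFF != stored_crc:
        raise IOError("compressed block failed checksum verification")
    return _raw_inflate(payload)
```

Checking the checksum before decompression means a flipped bit is reported as corruption
rather than relying on the decompressor or deserializer to happen to fail.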

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>
>                 Key: CASSANDRA-1717
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1717
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>
>         Attachments: checksums.txt
>
>
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable,
> so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column
> data readable we do not detect it, and if it corrupts to a higher timestamp value can even
> resist being overwritten by newer values.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
