cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data
Date Fri, 05 Aug 2011 18:17:27 GMT


Jonathan Ellis commented on CASSANDRA-1717:

bq. That sounds like a special-casing

I don't follow.  It feels exactly the opposite of a special case to me: using the per-block
code that we already have.

bq. more I/O, need to hold up buffer size, won't play nice with mmap

That's why we give people the choice.  But I'm pretty sure that after 1.0 we'll make compression
the default.  So I don't want to add a lot of complexity for uncompressed sstables.

bq. we should define a goal we pursue by this

Here are our requirements:

- prevent corruption from being replicated
- detect and remove corruption on repair 

Nice to have:
- low complexity of implementation
- low space overhead
- detect corruption as soon as it is read

bq. I still think that the best way to check for corruption will be to use checksum at row
header (key and row index) and column level

That's not crazy, and it achieves all goals except low space overhead.  But for the reasons
above I still think block-level is a better fit.
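
To make the block-level option concrete, here is a minimal sketch, not the actual Cassandra code path. It assumes a CRC32 per fixed-size block; the 64 KB block size and the helper names (`write_blocks`, `read_block`, `flip_bit`) are hypothetical and chosen only for illustration.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # assumed block size, for illustration only


def write_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into blocks and pair each with a CRC32 computed at write time."""
    blocks = []
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        blocks.append((block, zlib.crc32(block)))
    return blocks


def read_block(block: bytes, stored_crc: int) -> bytes:
    """Return the block, or raise if it no longer matches its stored checksum."""
    if zlib.crc32(block) != stored_crc:
        raise IOError("block checksum mismatch: corrupt block")
    return block


def flip_bit(block: bytes, index: int) -> bytes:
    """Simulate bitrot by flipping the lowest bit of one byte."""
    damaged = bytearray(block)
    damaged[index] ^= 0x01
    return bytes(damaged)
```

Under these assumptions a damaged block fails as soon as it is read, so it cannot be served to a replica or survive a repair pass unnoticed. A 4-byte checksum per 64 KB block costs about 0.006% extra space, whereas a checksum per column pays that cost once per column.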

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>                 Key: CASSANDRA-1717
>                 URL:
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>         Attachments: checksums.txt
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable,
> so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column
> data readable we do not detect it, and if it corrupts to a higher timestamp value, it can even
> resist being overwritten by newer values.
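
The failure mode in the description can be illustrated with a simplified model of timestamp-based reconciliation. This is a sketch under stated assumptions: `Column` and `reconcile` are hypothetical names, and the rule shown (highest timestamp wins) leaves out the tie-breaking that real reconciliation would need.

```python
from typing import NamedTuple


class Column(NamedTuple):
    name: str
    value: bytes
    timestamp: int  # microseconds since the epoch, by assumption


def reconcile(a: Column, b: Column) -> Column:
    """Simplified last-write-wins: keep the column with the higher timestamp."""
    return a if a.timestamp >= b.timestamp else b


# A healthy column, a bit-rotted copy of it, and a genuinely newer write.
old = Column("c", b"v1", 1312568247000000)
rotten = old._replace(timestamp=old.timestamp | (1 << 60))  # one high bit flipped
new = Column("c", b"v2", 1312568248000000)

print(reconcile(old, new).value)     # the newer write replaces the healthy column
print(reconcile(rotten, new).value)  # the corrupt column outlives the newer write
```

Because the corrupted column is still readable and carries the larger timestamp, nothing in this model can tell it apart from a legitimate later write. That is why detection needs a checksum rather than relying on read failures.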
