cassandra-commits mailing list archives

From "Todd Lipcon (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data
Date Thu, 04 Aug 2011 22:01:28 GMT


Todd Lipcon commented on CASSANDRA-1717:

xedin on IRC asked me to comment on this issue. For reference, here is what another system
does: HDFS checksums every file in 512-byte chunks with a CRC32. The checksum is verified on
write (only by the first DataNode in the pipeline) and on read (by the client). If the client
gets a checksum error while reading, it reports this to the NameNode, which marks that block
as corrupt, schedules another replication, etc.
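
To make the chunked-checksum idea concrete, here is a minimal sketch in Python. It is not HDFS code: the names `chunk_checksums` and `find_corrupt_chunks` are made up for illustration, and the only thing taken from the description above is the scheme itself (one CRC32 per 512-byte chunk, computed at write time and re-checked at read time).

```python
import zlib
from typing import List

CHUNK_SIZE = 512  # bytes covered by each checksum, as in the description above


def chunk_checksums(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[int]:
    """Return one CRC32 per fixed-size chunk of data (hypothetical helper)."""
    return [
        zlib.crc32(data[offset:offset + chunk_size])
        for offset in range(0, len(data), chunk_size)
    ]


def find_corrupt_chunks(data: bytes, expected: List[int],
                        chunk_size: int = CHUNK_SIZE) -> List[int]:
    """Return the indexes of chunks whose CRC32 no longer matches."""
    actual = chunk_checksums(data, chunk_size)
    if len(actual) != len(expected):
        raise ValueError("chunk count changed; data was truncated or extended")
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


# Writer side: compute the checksums once and store them next to the data.
block = bytes(range(256)) * 6  # 1536 bytes, i.e. three 512-byte chunks
stored = chunk_checksums(block)

# Reader side: flip a single bit inside the second chunk and re-verify.
damaged = bytearray(block)
damaged[700] ^= 0x01
corrupt = find_corrupt_chunks(bytes(damaged), stored)
print("corrupt chunk indexes:", corrupt)
```

Keeping one checksum per small chunk rather than one per file means a reader can verify only the chunks it actually reads, and a mismatch pinpoints where the damage is.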

This is all transparent to the HBase layer since it's done at the FS layer. So, HBase itself
doesn't do any extra checksumming. If you compress your tables, then you might get an extra
layer of checksumming for free from gzip as someone mentioned above.

For some interesting JIRAs on checksum performance, check out HADOOP-6148 and its various
followups, as well as the work currently in progress in HDFS-2080.

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>                 Key: CASSANDRA-1717
>                 URL:
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>         Attachments: checksums.txt
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable,
> so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps
> column data readable, we do not detect it, and if it corrupts the timestamp to a higher
> value, the column can even resist being overwritten by newer values.
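
The failure mode in the issue description can be illustrated with a small Python sketch. It assumes columns are reconciled by keeping the version with the higher timestamp (last write wins); the `Column` class, the `reconcile` function, and the timestamp values are hypothetical stand-ins chosen for the example, not Cassandra code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    value: bytes
    timestamp: int  # write time supplied with the column (arbitrary units here)


def reconcile(a: Column, b: Column) -> Column:
    """Last write wins: keep whichever version has the higher timestamp."""
    return a if a.timestamp >= b.timestamp else b


# The version originally written to disk.
on_disk = Column("email", b"old@example.com", timestamp=1_312_495_288_000_000)

# Simulated bitrot: one high bit of the stored timestamp flips.
# The column still parses, so nothing flags it as unreadable.
rotted = Column(on_disk.name, on_disk.value, on_disk.timestamp ^ (1 << 60))

# A genuinely newer write arrives afterwards.
update = Column("email", b"new@example.com", timestamp=1_312_495_300_000_000)

healthy_winner = reconcile(on_disk, update)  # the newer write wins
rotted_winner = reconcile(rotted, update)    # the corrupt column wins
print(healthy_winner.value, rotted_winner.value)
```

Without a checksum over the stored column, the corrupted timestamp is indistinguishable from a legitimate later write, so the stale value keeps winning reconciliation.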

