cassandra-commits mailing list archives

From "Pavel Yaskevich (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-1717) Cassandra cannot detect corrupt-but-readable column data
Date Fri, 05 Aug 2011 17:15:27 GMT


Pavel Yaskevich commented on CASSANDRA-1717:

bq. can't you just implement a no-op compression option that will utilize what you're doing
/ planning to do for compression in terms of block structure and block level checksums? Good
question. Pavel?

That sounds like special-casing, and it has the complications mentioned before: more I/O, the
need to hold a block-sized buffer, and it won't play nicely with mmap. Placing checksums at the
block level will also make it harder to build tools that process corruption (as Jake mentioned),
because we think in terms of the "data model", not in terms of file blocks.
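To make the block-level trade-off concrete, here is a minimal sketch, assuming a hypothetical scheme with one CRC32 per fixed-size block of a data file (the class and method names are illustrative, not Cassandra's API). A mismatch only identifies a block index; mapping that back to the affected rows needs the row index, and verifying any read means checksumming the whole surrounding block.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical block-level checksumming, not Cassandra's actual SSTable format
public class BlockChecksumSketch {
    // Compute one CRC32 per fixed-size block; the last block may be shorter
    static long[] blockChecksums(byte[] file, int blockSize) {
        int blocks = (file.length + blockSize - 1) / blockSize;
        long[] sums = new long[blocks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < blocks; i++) {
            int off = i * blockSize;
            int len = Math.min(blockSize, file.length - off);
            crc.reset();
            crc.update(file, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    // Return the indices of blocks whose current checksum differs from the stored one
    static List<Integer> badBlocks(byte[] file, int blockSize, long[] stored) {
        long[] now = blockChecksums(file, blockSize);
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < now.length; i++) {
            if (now[i] != stored[i]) {
                bad.add(i);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        byte[] file = new byte[10_000];               // stand-in for a data file
        long[] stored = blockChecksums(file, 4096);   // checksums taken at write time
        file[5000] ^= 0x01;                           // simulate a single flipped bit
        // Only the block (bytes 4096..8191) is reported, not the row(s) inside it
        System.out.println(badBlocks(file, 4096, stored)); // [1]
    }
}
```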

First of all, we should define the goal we are pursuing with this, which is essential.

If this is only about repair and replication, I think a good approach would be to checksum
at the row boundary level, which would be relatively simple to check and would play nicely with mmap.
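A minimal sketch of row-boundary checksumming, assuming a hypothetical layout where each serialized row is written as `[int length][row bytes][long CRC32 of the row bytes]`. The names `RowChecksumSketch`, `writeRow` and `verifyRow` are illustrative only. Because the checksum sits right next to the row, a reader working from a mapped buffer can verify a row without any extra block-sized buffering.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical row-level checksum layout, not Cassandra's actual on-disk format
public class RowChecksumSketch {
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // Layout: [int length][row bytes][long CRC32 of row bytes]
    static byte[] writeRow(byte[] serializedRow) {
        ByteBuffer buf = ByteBuffer.allocate(4 + serializedRow.length + 8);
        buf.putInt(serializedRow.length);
        buf.put(serializedRow);
        buf.putLong(crc32(serializedRow));
        return buf.array();
    }

    // True if the stored checksum matches the row bytes. A corrupted length prefix
    // would instead fail to parse, which is the already-detectable "unreadable" case.
    static boolean verifyRow(byte[] onDisk) {
        ByteBuffer buf = ByteBuffer.wrap(onDisk);
        int length = buf.getInt();
        byte[] row = new byte[length];
        buf.get(row);
        long stored = buf.getLong();
        return crc32(row) == stored;
    }

    public static void main(String[] args) {
        byte[] row = "key1:colA=v1,colB=v2".getBytes(StandardCharsets.UTF_8);
        byte[] onDisk = writeRow(row);
        System.out.println(verifyRow(onDisk)); // true
        onDisk[4] ^= 0x01;                     // flip one bit inside the row payload
        System.out.println(verifyRow(onDisk)); // false
    }
}
```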

I still think the best way to check for corruption would be to use checksums at the row header
(key and row index) and at the column level, even if that introduces disk space and CPU overhead
(a necessary sacrifice). This could be the most elegant solution for a few reasons, two of which
are: it introduces no system-wide complexity (i.e. special-casing) in how we work with SSTables
and repair, and it allows us to think in terms of our data model.
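The header-plus-column proposal can be sketched as follows, assuming a hypothetical in-memory `Column` type and CRC32 as the checksum (both are illustrative choices, not Cassandra's API). The column checksum covers the name, value and timestamp, so a flipped timestamp bit is caught too, and a failed check names the affected column, which is the "data model terms" benefit argued above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical per-column and row-header checksums, not Cassandra's actual format
public class ColumnChecksumSketch {
    static final class Column {
        final byte[] name;
        final byte[] value;
        long timestamp;       // mutable only so the example can simulate bitrot
        final long checksum;  // computed when the column is written

        Column(byte[] name, byte[] value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
            this.checksum = checksumOf(name, value, timestamp);
        }
    }

    // CRC32 over name, value and the 8-byte big-endian timestamp
    static long checksumOf(byte[] name, byte[] value, long timestamp) {
        CRC32 crc = new CRC32();
        crc.update(name, 0, name.length);
        crc.update(value, 0, value.length);
        crc.update(ByteBuffer.allocate(8).putLong(timestamp).array(), 0, 8);
        return crc.getValue();
    }

    // Row-header checksum covering the row key and the serialized row index
    static long headerChecksum(byte[] key, byte[] rowIndex) {
        CRC32 crc = new CRC32();
        crc.update(key, 0, key.length);
        crc.update(rowIndex, 0, rowIndex.length);
        return crc.getValue();
    }

    static boolean isIntact(Column c) {
        return checksumOf(c.name, c.value, c.timestamp) == c.checksum;
    }

    // Names of columns whose stored checksum no longer matches their contents
    static List<String> corruptColumns(List<Column> columns) {
        List<String> bad = new ArrayList<>();
        for (Column c : columns) {
            if (!isIntact(c)) {
                bad.add(new String(c.name, StandardCharsets.UTF_8));
            }
        }
        return bad;
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] key = utf8("user42");
        byte[] rowIndex = {0, 0, 0, 16};  // stand-in for a serialized row index
        long storedHeader = headerChecksum(key, rowIndex);

        List<Column> cols = new ArrayList<>();
        cols.add(new Column(utf8("email"), utf8("a@example.com"), 1000L));
        cols.add(new Column(utf8("name"), utf8("Alice"), 1000L));
        cols.get(1).timestamp ^= 1L << 40;  // simulate bitrot in one timestamp

        System.out.println(headerChecksum(key, rowIndex) == storedHeader); // true
        System.out.println(corruptColumns(cols)); // [name]
    }
}
```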

But it somehow feels like we are missing a better solution here...

> Cassandra cannot detect corrupt-but-readable column data
> --------------------------------------------------------
>                 Key: CASSANDRA-1717
>                 URL:
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 1.0
>         Attachments: checksums.txt
> Most corruptions of on-disk data due to bitrot render the column (or row) unreadable,
> so the data can be replaced by read repair or anti-entropy.  But if the corruption keeps column
> data readable we do not detect it, and if it corrupts to a higher timestamp value can even
> resist being overwritten by newer values.
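Why a corrupted-but-readable timestamp can "resist being overwritten" can be illustrated with a simplified last-write-wins reconciliation. This is a sketch under assumptions: the `Column` type is hypothetical, timestamps are assumed to be microseconds (a common client convention), and tie-breaking is simplified here rather than matching Cassandra's actual rule.

```java
// Simplified last-write-wins reconciliation, not Cassandra's actual code
public class TimestampReconcileSketch {
    static final class Column {
        final String value;
        final long timestamp;

        Column(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Keep the version with the higher timestamp (ties simplified: first argument wins)
    static Column reconcile(Column a, Column b) {
        return a.timestamp >= b.timestamp ? a : b;
    }

    public static void main(String[] args) {
        long written = 1_312_560_000_000_000L;                // microseconds, roughly Aug 2011
        Column stored = new Column("old", written);
        Column corrupted = new Column("old", written | (1L << 62)); // bitrot sets a high bit
        Column update = new Column("new", written + 1_000_000L);    // a later legitimate write

        System.out.println(reconcile(stored, update).value);    // new
        System.out.println(reconcile(corrupted, update).value); // old: the newer write loses
    }
}
```

The corrupted column is still perfectly parseable, so nothing flags it, and every subsequent write with a realistic timestamp loses the comparison.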

This message is automatically generated by JIRA.
