cassandra-commits mailing list archives

From "Jeff Jirsa (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-12682) Silent data corruption and corruption propagation in Cassandra
Date Fri, 22 Sep 2017 04:51:03 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-12682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Jirsa updated CASSANDRA-12682:
-----------------------------------
    Component/s: Core

> Silent data corruption and corruption propagation in Cassandra
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-12682
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12682
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Aishwarya Ganesan
>            Priority: Critical
>              Labels: Correctness
>             Fix For: 4.x
>
>
> Corruption in Cassandra's SSTable data can be silently returned to users if SSTable
> compression is disabled.
> Cassandra maintains digest.crc32 and CRC.db files in the SSTable directory but fails to
> detect corruption of the SSTable Data.db file. Without such detection, Cassandra is
> vulnerable to silent corruption resulting from underlying problems in disks and the file
> systems atop them. Studies support the need for end-to-end integrity:
> https://research.cs.wisc.edu/wind/Publications/zfs-corruption-fast10.pdf
> http://www.cs.toronto.edu/~bianca/papers/fast08.pdf
> In a small test case where the underlying disk/FS corrupts a particular block holding
> the user data, Cassandra can silently return corrupted user data on a read request. Also,
> read repair or anti-entropy repair can propagate the corrupted data to other, intact
> replicas when the corrupted value is lexically greater. This is because corruption doesn't
> change the timestamps, and timestamp conflicts are resolved by choosing the data with the
> highest value. (We reproduced this scenario using our testing framework.)
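
The propagation scenario described above can be illustrated with a minimal sketch. This is not Cassandra's reconciliation code; the `Cell` type and `reconcile` function are hypothetical names that only model the rule stated in the report (the newest timestamp wins, and a timestamp tie is broken in favor of the greater value).

```python
from typing import NamedTuple


class Cell(NamedTuple):
    """Hypothetical stand-in for one replica's copy of a value."""
    timestamp: int
    value: bytes


def reconcile(a: Cell, b: Cell) -> Cell:
    """Pick a winner: newest timestamp first, then the greater value on a tie."""
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    # Timestamps are equal: compare the raw bytes lexically.
    return a if a.value >= b.value else b


intact = Cell(timestamp=1000, value=b"hello")
# Same write, but one byte was altered on disk ('e' became 'f').
# The timestamp is unchanged, so the two copies tie.
corrupted = Cell(timestamp=1000, value=b"hfllo")

winner = reconcile(intact, corrupted)
print(winner is corrupted)
```

Because the corrupted bytes happen to compare greater, the corrupted copy wins the tie and would be the version written back to the intact replica. A corruption that made the value compare lower would lose the tie instead.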
> Why does Cassandra not use the CRC and digests to verify the integrity of data in the
> SSTables on read? Are the digest.crc32 and CRC.db files ever used?
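
As a rough illustration of what verifying checksums on read could look like, here is a generic sketch. It does not reproduce the on-disk layout of digest.crc32 or CRC.db; the chunk size and the names `chunk_checksums` and `read_verified` are assumptions made for this example.

```python
import zlib
from typing import List

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size, chosen for illustration


def chunk_checksums(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[int]:
    """Compute one CRC32 per fixed-size chunk at write time."""
    return [
        zlib.crc32(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def read_verified(data: bytes, checksums: List[int], offset: int,
                  length: int, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Return data[offset:offset+length] after checking every chunk it touches."""
    if length <= 0:
        return b""
    if offset < 0 or offset + length > len(data):
        raise ValueError("read range is outside the data")
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    for idx in range(first, last + 1):
        chunk = data[idx * chunk_size:(idx + 1) * chunk_size]
        if zlib.crc32(chunk) != checksums[idx]:
            # Fail loudly instead of returning corrupted bytes to the caller.
            raise IOError(f"checksum mismatch in chunk {idx}")
    return data[offset:offset + length]


# Usage: write-time checksums, then a single bit flips "on disk".
original = bytes(range(256)) * 4
sums = chunk_checksums(original, chunk_size=256)

damaged = bytearray(original)
damaged[305] ^= 0x01  # corrupt one bit inside chunk 1

try:
    read_verified(bytes(damaged), sums, offset=300, length=10, chunk_size=256)
except IOError as exc:
    print("read rejected:", exc)
```

The point of the design is that the read fails with an error rather than returning the damaged bytes, so the corruption can neither reach the user nor win a later reconciliation.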



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


