cassandra-commits mailing list archives

From "C. Scott Andreas (Jira)" <>
Subject [jira] [Updated] (CASSANDRA-15369) Fake row deletions and range tombstones, causing digest mismatch and sstable growth
Date Wed, 07 Oct 2020 03:51:00 GMT


C. Scott Andreas updated CASSANDRA-15369:
    Fix Version/s:     (was: 4.0-triage)

> Fake row deletions and range tombstones, causing digest mismatch and sstable growth
> -----------------------------------------------------------------------------------
>                 Key: CASSANDRA-15369
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Coordination, Local/Memtable, Local/SSTable
>            Reporter: Benedict Elliott Smith
>            Assignee: Zhao Yang
>            Priority: Normal
>             Fix For: 3.0.x, 3.11.x, 4.0-beta
> As assessed in CASSANDRA-15363, we generate fake row deletions and fake tombstone markers under various circumstances:
>  * If we perform a clustering key query (or select a compact column):
>    * Serving from a {{Memtable}}, we will generate fake row deletions
>    * Serving from an sstable, we will generate fake row tombstone markers
>  * If we perform a slice query, we will generate only fake row tombstone markers for any range tombstone that begins or ends outside of the limit of the requested slice
>  * If we perform a multi-slice or IN query, this will occur for each slice/clustering
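[Editorial illustration, not part of the original message.] The cases above can be pictured with a minimal model of what "fake" markers are. This sketch is hypothetical Python, not Cassandra's actual read path: it shows how clipping a stored range tombstone (RT) to the bounds of a requested slice yields markers whose bounds differ from what is actually stored.

```python
# Illustrative model only: clipping a range tombstone to a requested slice
# produces "fake" bound markers whose bounds differ from the stored RT.
from dataclasses import dataclass

@dataclass(frozen=True)
class RangeTombstone:
    start: int       # inclusive clustering bound
    end: int         # inclusive clustering bound
    timestamp: int   # deletion timestamp

def clip_to_slice(rt: RangeTombstone, slice_start: int, slice_end: int):
    """Return the RT as seen by a slice query: bounds falling outside the
    slice are replaced by the slice's own bounds (the 'fake' markers)."""
    if rt.end < slice_start or rt.start > slice_end:
        return None  # RT does not intersect the requested slice
    return RangeTombstone(max(rt.start, slice_start),
                          min(rt.end, slice_end),
                          rt.timestamp)

# A stored RT covering clusterings 0..100:
stored = RangeTombstone(0, 100, timestamp=42)

# A slice query for clusterings 10..20 sees a clipped, "fake" RT:
seen = clip_to_slice(stored, 10, 20)
assert seen == RangeTombstone(10, 20, 42)
assert seen != stored  # the bounds no longer match what is stored
```

The point is that the replica's answer depends on the query's bounds, not only on the data it holds.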
> Unfortunately, these different behaviours can lead to very different data stored in sstables until a full repair is run.  When we read-repair, we only send these fake deletions or range tombstones.  A fake row deletion, a clustering RT, and a slice RT each produce a different digest.  So for each single point lookup we can produce a digest mismatch twice, and until a full repair is run we can encounter an unlimited number of digest mismatches across different overlapping
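[Editorial illustration, not part of the original message.] Why three representations of the same deletion cause mismatches: digest reads hash the serialized response, so logically equivalent deletions encoded differently hash differently. The encodings below are invented for illustration and are not Cassandra's wire format; MD5 is used only as an example hash.

```python
# Illustrative sketch (hypothetical encodings, not Cassandra's wire format):
# logically equivalent deletions serialized differently yield different
# digests, so replicas answering via different code paths will mismatch.
import hashlib

def digest(encoded: bytes) -> str:
    return hashlib.md5(encoded).hexdigest()  # example hash, for illustration

# Three encodings of the same logical fact: "clustering 5 deleted at ts=42".
row_deletion  = b"ROW_DELETION ck=5 ts=42"   # fake row deletion (memtable)
clustering_rt = b"RT [5,5] ts=42"            # fake clustering-RT markers
slice_rt      = b"RT [5,5) ts=42"            # fake slice-RT markers

digests = {digest(row_deletion), digest(clustering_rt), digest(slice_rt)}
assert len(digests) == 3  # three distinct digests -> digest mismatch on read
```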
> Relatedly, this seems a more problematic variant of the atomicity failures caused by our monotonic reads, since RTs can have an atomic effect across (up to) the entire partition, whereas the propagation may happen on an arbitrarily small portion.  If the RT exists on only one node, this could plausibly lead to a fairly problematic scenario if that node fails before the range can be repaired.
> At the very least, this behaviour can lead to an almost unlimited amount of extraneous data being stored until the range is repaired and compaction happens to overwrite the sub-range RTs and row deletions.
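[Editorial illustration, not part of the original message.] The growth mechanism can be sketched with a toy model, again hypothetical rather than Cassandra's actual read-repair code: because each read-repair sends only the clipped sub-range RT, repeated slice reads deposit many distinct sub-range tombstones on the out-of-date replica in place of the single original RT.

```python
# Illustrative model only: each read-repair writes the clipped sub-range RT
# rather than the original, so repeated slice reads accumulate distinct
# sub-range tombstones on the repaired replica until compaction merges them.
repaired_replica = set()
stored_rt = (0, 100, 42)  # (start, end, timestamp) on the up-to-date replica

def read_repair_slice(start: int, end: int) -> None:
    # Hypothetical: only the portion of the RT covering the slice is sent.
    clipped = (max(stored_rt[0], start), min(stored_rt[1], end), stored_rt[2])
    repaired_replica.add(clipped)

# Ten different slice reads each repair a different sub-range:
for i in range(10):
    read_repair_slice(i * 10, i * 10 + 5)

assert len(repaired_replica) == 10  # ten sub-range RTs instead of one RT
```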

This message was sent by Atlassian Jira
