cassandra-commits mailing list archives

From "Stefan Podkowinski (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-12991) Inter-node race condition in validation compaction
Date Mon, 05 Dec 2016 11:38:58 GMT


Stefan Podkowinski commented on CASSANDRA-12991:

Even if it were possible to validate sstables against a provided timestamp, there would
still be the issue that data could be reconciled in memtables in a way that results in a
mismatch.

t = 10000
NodeA - Mutation(k=1, ts=1)
NodeA - Flush
NodeB - Mutation(k=1, ts=1)

t = 10001
NodeA - Mutation(k=1, ts=2)
NodeB - Mutation(k=1, ts=2)
NodeB - Flush

If you start validation based on t(10000), you'd still end up with a mismatch: NodeB reconciled
both writes in its memtable before flushing, so only Mutation(k=1, ts=2) was flushed to disk on
NodeB, while ts=1 (which would be subject to the validation timestamp) was not.
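
The sequence above can be replayed with a small simulation. This is only a sketch under
stated assumptions, not Cassandra's actual code: the `Node` class, the `validate` method, the
cell values "v1"/"v2", and a cutoff of ts <= 1 (standing in for the validation timestamp) are
all hypothetical, and a single SHA-256 digest stands in for a Merkle tree.

```python
import hashlib


class Node:
    """Toy replica: one memtable plus a list of flushed sstables (hypothetical model)."""

    def __init__(self):
        self.memtable = {}   # key -> (write timestamp, value)
        self.sstables = []   # each sstable is a dict: key -> (write timestamp, value)

    def write(self, key, ts, value):
        # The memtable reconciles writes in place: only the newest cell per key survives.
        current = self.memtable.get(key)
        if current is None or ts > current[0]:
            self.memtable[key] = (ts, value)

    def flush(self):
        if self.memtable:
            self.sstables.append(dict(self.memtable))
            self.memtable = {}

    def validate(self, cutoff):
        """Hash all flushed cells whose write timestamp is <= cutoff."""
        self.flush()  # assumption: validation flushes the memtable first
        visible = {}
        for sstable in self.sstables:
            for key, (ts, value) in sstable.items():
                if ts > cutoff:
                    continue  # cell is newer than the validation timestamp
                current = visible.get(key)
                if current is None or ts > current[0]:
                    visible[key] = (ts, value)
        return hashlib.sha256(repr(sorted(visible.items())).encode()).hexdigest()


def run_scenario(cutoff):
    node_a, node_b = Node(), Node()
    # t = 10000
    node_a.write(1, 1, "v1")
    node_a.flush()
    node_b.write(1, 1, "v1")
    # t = 10001
    node_a.write(1, 2, "v2")
    node_b.write(1, 2, "v2")  # overwrites ts=1 in NodeB's memtable
    node_b.flush()            # NodeB's sstable only ever contains ts=2
    return node_a.validate(cutoff), node_b.validate(cutoff)


hash_a, hash_b = run_scenario(cutoff=1)
print("hashes match with timestamp filter:", hash_a == hash_b)
```

In this model NodeA still has the ts=1 cell in its first sstable, while NodeB has nothing at or
below the cutoff, so the two digests differ even though both nodes received the same mutations.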

> Inter-node race condition in validation compaction
> --------------------------------------------------
>                 Key: CASSANDRA-12991
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benjamin Roth
>            Priority: Minor
> Problem:
> When a repair triggers a validation compaction, in-flight mutations can cause the merkle
> trees to differ even though the data is consistent.
> Example:
> t = 10000: 
> Repair starts, triggers validations
> Node A starts validation
> t = 10001:
> Mutation arrives at Node A
> t = 10002:
> Mutation arrives at Node B
> t = 10003:
> Node B starts validation
> The hashes of nodes A and B will differ, but the data is consistent as of a view (think of
> it like a snapshot) at t = 10000.
> Impact:
> Unnecessary streaming happens. This may not be a big impact on low-traffic CFs or partitions,
> but on high-traffic CFs and possibly very big partitions it may have a bigger impact and is
> a waste of resources.
> Possible solution:
> Build hashes based upon a snapshot timestamp.
> This requires SSTables created after that timestamp to be filtered when doing a validation
> - Cells with timestamp > snapshot time have to be removed
> - Tombstone range markers have to be handled
>  - Bounds have to be removed if delete timestamp > snapshot time
>  - Boundary markers have to be either changed to a bound or completely removed, depending
>    on whether the start, the end, or both are affected
> Probably this is a known behaviour. Have there been any discussions about this in the
> past? I did not find a matching issue, so I created this one.
> I am happy about any feedback, whatsoever.
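
The race in the quoted issue, and the proposed snapshot-timestamp filter for plain cells, can
be sketched roughly as follows. Everything here is an illustrative assumption: the `digest`
helper, the cell tuples, and the mutation's write timestamp of 10001 are hypothetical, a single
SHA-256 digest stands in for a merkle tree, and tombstone range markers are not modelled.

```python
import hashlib


def digest(cells, snapshot_ts=None):
    """Hash (key, write timestamp, value) cells, optionally dropping cells newer than snapshot_ts."""
    kept = [c for c in cells if snapshot_ts is None or c[1] <= snapshot_ts]
    return hashlib.sha256(repr(sorted(kept)).encode()).hexdigest()


# Data both nodes already agree on before the repair starts.
base = [("k1", 9000, "old")]
# Assumed in-flight mutation, written after the repair started at t = 10000.
mutation = ("k2", 10001, "new")

# Node A validates at t = 10000, before the mutation arrives there.
node_a_cells = list(base)
# Node B validates at t = 10003, after the mutation has arrived.
node_b_cells = base + [mutation]

print("unfiltered hashes match:", digest(node_a_cells) == digest(node_b_cells))
print("filtered hashes match:  ",
      digest(node_a_cells, 10000) == digest(node_b_cells, 10000))
```

Without the filter the digests differ because Node B has already seen the mutation; with the
filter at t = 10000 both nodes hash the same cells. This covers only the issue's scenario and
does not address memtable reconciliation.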

This message was sent by Atlassian JIRA
