incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: repair, compaction, and tombstone rows
Date Wed, 31 Oct 2012 23:56:03 GMT
> Is this a feature or a bug?  
Yes :)

You are probably hitting a bit of an edge case.

Maybe purgeable tombstones could be ignored as part of the merkle tree calculation and skipped from the streaming? (I have not checked the code to see whether they already are.)
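
Something along these lines, just to sketch the idea (plain python, the names are made up and this is not the actual Validator code):

import time

GC_GRACE_SECONDS = 10 * 24 * 3600  # the 10 day default

def is_purgeable(local_deletion_time, now=None):
    # A tombstone older than gc_grace could be dropped by any replica,
    # so it carries nothing worth repairing.
    now = int(time.time()) if now is None else now
    return local_deletion_time + GC_GRACE_SECONDS <= now

def rows_to_hash(rows, now=None):
    # Hypothetical filter: only hash rows that still have live columns
    # or a tombstone younger than gc_grace, so a replica that has already
    # compacted the row away would compute the same merkle tree.
    for key, live_columns, deletion_time in rows:
        if live_columns or not is_purgeable(deletion_time, now):
            yield (key, live_columns, deletion_time)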

Can you create a ticket on https://issues.apache.org/jira/browse/CASSANDRA and describe the problem?

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 1/11/2012, at 8:04 AM, Bryan Talbot <btalbot@aeriagames.com> wrote:

> I've been experiencing an undesirable behavior that seems like a bug and causes a large amount of wasted work.
> 
> I have a CF where all columns have a TTL, are generally all inserted in a very short period of time (less than a second), and are never over-written or explicitly deleted. Eventually one node will run a compaction and remove rows containing only tombstones more than gc_grace_seconds old, which is expected.
> 
> The problem comes up when a repair is run. During the repair, the other nodes that haven't run a compaction and still have the tombstoned rows "fix" the inconsistency and stream the rows (which contain only a tombstone more than gc_grace_seconds old) back to the node which had compacted that row away. This ends up happening over and over and uses a lot of time, storage, and bandwidth to keep repairing rows that are intentionally missing.
> 
> I think the issue stems from how compaction of TTL'd rows interacts with repair. Compaction is a node-local event, so it will eventually cause tombstoned rows to disappear from the one node doing the compaction, and those rows then get "repaired" from replicas later. I guess this could happen for rows which are explicitly deleted as well.
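> 
> To make the sequence concrete, roughly this (plain python, just to show the order of events; the ttl and gc_grace values here are illustrative, not my actual settings):
> 
> ttl = 3600                      # column expires after an hour
> gc_grace = 10 * 24 * 3600       # gc_grace_seconds, the 10 day default
> 
> write_time = 0
> expire_time = write_time + ttl          # column turns into a tombstone
> purge_time = expire_time + gc_grace     # tombstone becomes purgeable
> 
> # only node A happens to compact after purge_time, so the row is gone there;
> # nodes B and C haven't compacted yet and still carry the tombstone-only row
> node_a_has_row = False
> node_b_has_row = True
> 
> # repair sees the replicas disagree and "fixes" it by streaming the
> # tombstone-only row back to node A, even though it is pure garbage
> if node_a_has_row != node_b_has_row:
>     print("stream tombstone-only row back to node A")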
> 
> Is this a feature or a bug? How can I avoid repairing rows that were correctly removed via compaction on one node but not on its replicas, just because compactions run independently on each node? Every repair ends up streaming tens of gigabytes of "missing" rows to and from replicas.
> 
> Cassandra 1.1.5 with size tiered compaction strategy and RF=3
> 
> -Bryan
> 
> 

