cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-5745) Minor compaction tombstone-removal deadlock
Date Thu, 11 Jul 2013 16:55:48 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705974#comment-13705974 ]

Sylvain Lebresne commented on CASSANDRA-5745:
---------------------------------------------

bq.  if you want to compact "everything that overlaps with one sstable" from L1

In practice, you don't necessarily want to go all the way up. If the 2 sstables that potentially
block each other's tombstone removal are in L1 and L2, that's all you compact. And for 2
sstables A and B to meet that criteria, they must 1) overlap and 2) A must contain older
data than B and vice versa. In particular, and discarding the case where people do crazy
shit with their column timestamps, this means that A and B can only meet that criteria if
they are sstables that follow each other in flush order (or the result of compacting those).
And because sstables can't magically jump levels randomly, this also means those sstables
will pretty much always be in adjacent levels in practice.
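
As a rough illustration of that criteria (using simplified, hypothetical per-sstable metadata,
not Cassandra's actual SSTableReader API), the mutual-blocking check boils down to something like:

{code:java}
// Minimal, self-contained sketch of the "mutual blocking" criteria described above:
// two sstables can only hold each other's tombstones hostage if their ranges overlap
// AND each one contains data older than some data in the other.
public final class TombstoneDeadlockCheck
{
    // Hypothetical, simplified per-sstable metadata, for illustration only.
    public static final class SSTableStats
    {
        final long firstToken, lastToken;      // covered token range
        final long minTimestamp, maxTimestamp; // oldest/newest cell timestamps

        SSTableStats(long firstToken, long lastToken, long minTimestamp, long maxTimestamp)
        {
            this.firstToken = firstToken;
            this.lastToken = lastToken;
            this.minTimestamp = minTimestamp;
            this.maxTimestamp = maxTimestamp;
        }
    }

    static boolean overlaps(SSTableStats a, SSTableStats b)
    {
        return a.firstToken <= b.lastToken && b.firstToken <= a.lastToken;
    }

    // True when A and B could block each other's tombstone removal: they overlap and
    // each contains data older than the newest data in the other, so neither can be
    // sure its gcable tombstones don't shadow data that only lives in the other one.
    static boolean mutuallyBlocking(SSTableStats a, SSTableStats b)
    {
        return overlaps(a, b)
            && a.minTimestamp < b.maxTimestamp
            && b.minTimestamp < a.maxTimestamp;
    }
}
{code}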

This is btw why I don't think the original problem described above is really a problem in
practice. If 2 sstables meet the "deadlock" criteria, they will be close in levels and will
in fact get compacted together relatively quickly, so I'm not sure you can keep them
deadlocked long enough for it to actually be a problem in practice.

Besides, I'm all for making sure we don't trigger that new heuristic too often: we could for
instance only do it for sstables that have not been compacted recently, say within a day (we
do something similar already for tombstone compaction), so that we don't end up triggering
that heuristic too eagerly. In any case, it would only be triggered if you have nothing better
to do in the first place.
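
A rough sketch of that gate (the lastCompactedAtMillis input and the one-day default are
assumptions for illustration, not an existing Cassandra option):

{code:java}
import java.util.concurrent.TimeUnit;

// Sketch of the "don't trigger too eagerly" gate suggested above; Cassandra's existing
// tombstone compaction uses a similar interval-based check, but this class and its
// inputs are hypothetical.
public final class DeadlockCompactionGate
{
    // Illustrative default: leave an sstable alone for at least a day between attempts.
    static final long MIN_IDLE_MILLIS = TimeUnit.DAYS.toMillis(1);

    // lastCompactedAtMillis is assumed to come from sstable metadata.
    static boolean eligibleForDeadlockBreaking(long lastCompactedAtMillis, long nowMillis, boolean nothingBetterToDo)
    {
        // Only consider this work when the compactor is otherwise idle and the
        // sstable has not been touched by a compaction for a while.
        return nothingBetterToDo && (nowMillis - lastCompactedAtMillis) >= MIN_IDLE_MILLIS;
    }
}
{code}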

bq. it only does massive amounts of compaction when you explicitly ask for it

But imo, this is dodging the problem. How do you know that you need to trigger the "big hammer"?
Either we say "if you suspect that you have a problem", in which case normal users will almost
never know when they should do it. Or we say "trigger it regularly like for active repair, just
in case", in which case I'm -1 on the idea because I'm pretty sure that in 99.9% of the cases
people will just inflict massive I/O on themselves for no reason.

Again, I'm not totally opposed to adding major compaction for LCS (I'm just not excessively
enthusiastic about it); we have it for size-tiered after all, where it sucks even more. But
as far as solving the problem mentioned in the description is concerned, I'm not convinced
at all that it's the right solution. In fact, for that, my preference would be to improve
the metrics we expose on our sstables (things like how often gcable tombstones survive a
compaction over time) and wait to make sure we actually have a practical problem,
not just a theoretical one.
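
To make that metrics idea a bit more concrete, here is a minimal sketch of the kind of counter
I have in mind (the names and the plain AtomicLong implementation are made up for illustration;
this is not Cassandra's actual metrics code):

{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Counts how many gcable tombstones a compaction saw versus how many it actually
// dropped, so operators can tell whether gcable tombstones really do survive
// compactions over time or whether the deadlock stays theoretical.
public final class GcableTombstoneMetrics
{
    final AtomicLong gcableSeen = new AtomicLong();
    final AtomicLong gcableDropped = new AtomicLong();

    void onGcableTombstone(boolean dropped)
    {
        gcableSeen.incrementAndGet();
        if (dropped)
            gcableDropped.incrementAndGet();
    }

    // Fraction of gcable tombstones that survived compaction; a persistently high
    // value would indicate the problem is real in practice, not just in theory.
    double survivalRatio()
    {
        long seen = gcableSeen.get();
        return seen == 0 ? 0.0 : 1.0 - (double) gcableDropped.get() / seen;
    }
}
{code}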
                
> Minor compaction tombstone-removal deadlock
> -------------------------------------------
>
>                 Key: CASSANDRA-5745
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5745
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jonathan Ellis
>             Fix For: 2.0.1
>
>
> From a discussion with Axel Liljencrantz,
> If you have two SSTables that have temporally overlapping data, you can get lodged into
> a state where a compaction of SSTable A can't drop tombstones because SSTable B contains
> older data *and vice versa*. Once that's happened, Cassandra should be wedged into a state
> where CASSANDRA-4671 no longer helps with tombstone removal. The only way to break the wedge
> would be to perform a compaction containing both SSTable A and SSTable B.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
