incubator-cassandra-user mailing list archives

From Paulo Ricardo Motta Gomes <paulo.mo...@chaordicsystems.com>
Subject Re: Automatic tombstone removal issue (STCS)
Date Wed, 07 May 2014 01:00:42 GMT
Robert: thanks for the support. You are right, this belonged on the dev
list, but I didn't think of it.

Yuki: thanks a lot for the clarification, this is what I suspected.
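
To make Yuki's point concrete, here is a minimal Python sketch of the
resurrection problem. It assumes a toy model (hypothetical, not Cassandra's
storage format) in which each SSTable maps a row key to a single
(timestamp, value) cell, a value of None stands for a tombstone, and a read
returns the value with the newest timestamp.

```python
# Toy model (hypothetical): an SSTable maps a row key to one cell,
# (write timestamp, value); a value of None represents a tombstone.
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, Optional[str]]
SSTable = Dict[str, Cell]

def read(sstables: List[SSTable], key: str) -> Optional[str]:
    """Merge all versions of key: the newest timestamp wins, a tombstone hides older data."""
    cells = [t[key] for t in sstables if key in t]
    if not cells:
        return None
    _, value = max(cells, key=lambda c: c[0])
    return value

def compact_dropping_tombstones(sstable: SSTable) -> SSTable:
    """Single-SSTable compaction that purges every tombstone without looking at other SSTables."""
    return {k: c for k, c in sstable.items() if c[1] is not None}

old = {"user:1": (100, "alice")}   # older SSTable still holding the original write
new = {"user:1": (200, None)}      # newer SSTable holding the delete (tombstone)

print(read([old, new], "user:1"))                               # None: the row is deleted
print(read([old, compact_dropping_tombstones(new)], "user:1"))  # alice: the row resurrects
```

Once the tombstone is gone, nothing hides the older write that still lives in
the other SSTable, so the deleted row comes back.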

I understand it's costly to check row-by-row overlap to decide whether an
SSTable is a candidate for compaction, but doesn't the compaction process
already perform this check when removing tombstones? If so, couldn't the
check be dropped at decision time and the compaction be allowed to run anyway?
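
A minimal sketch of the per-row check I mean, using the same kind of toy
model. The function name compact_single, the per-SSTable key sets and the
gc_before cutoff are all hypothetical; a real implementation would more likely
consult a probabilistic structure such as a Bloom filter than an exact key
set. A tombstone is purged only if it is past the cutoff and its key does not
appear in any other SSTable; otherwise it is kept, so in this model skipping
the token-range gate cannot resurrect data, it can only produce a compaction
that purges less than hoped.

```python
# Hypothetical sketch: during a single-SSTable compaction, purge a tombstone only
# if it is past the gc cutoff AND its key does not appear in any other SSTable.
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, Optional[str]]   # (timestamp, value or None for a tombstone)

def compact_single(sstable: Dict[str, Cell],
                   other_keys: List[Set[str]],
                   gc_before: int) -> Dict[str, Cell]:
    out: Dict[str, Cell] = {}
    for key, (ts, value) in sstable.items():
        is_tombstone = value is None
        # Simplification: the cell timestamp doubles as the deletion time.
        expired = ts < gc_before
        # Per-key check against every other SSTable (exact sets here).
        present_elsewhere = any(key in keys for keys in other_keys)
        if is_tombstone and expired and not present_elsewhere:
            continue               # nothing elsewhere can be shadowed: safe to purge
        out[key] = (ts, value)     # keep live data and tombstones that are unsafe to drop
    return out

big = {"a": (10, None), "b": (10, None), "c": (50, "live")}
others = [{"b", "x"}, {"y"}]       # key sets of the other SSTables
print(compact_single(big, others, gc_before=20))
# {'b': (10, None), 'c': (50, 'live')}
```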

This optimization is especially interesting for large STCS SSTables, whose
token range will very likely overlap with all other SSTables, so it's a pity
it is almost never triggered in these cases.
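
The following hypothetical Python simulation illustrates why a coarse
first/last token comparison almost always reports an overlap. It assumes keys
are hashed uniformly over a signed 64-bit token ring (roughly what a
Murmur3-style partitioner produces) and that each SSTable is summarized only
by its minimum and maximum token; the SSTable count and size are arbitrary.

```python
# Hypothetical simulation: append-only writes of uniformly hashed keys
# flushed into SSTables, each summarized by its first and last token.
import random

def token_bounds(tokens):
    """Summarize an SSTable by its first (min) and last (max) token."""
    return min(tokens), max(tokens)

def ranges_overlap(a, b):
    """Closed intervals [a0, a1] and [b0, b1] overlap unless one ends before the other starts."""
    return a[0] <= b[1] and b[0] <= a[1]

RING = 2**63                      # tokens lie in [-2**63, 2**63 - 1]
rng = random.Random(42)           # fixed seed for reproducibility
sstables = [[rng.randrange(-RING, RING) for _ in range(10_000)] for _ in range(8)]
bounds = [token_bounds(t) for t in sstables]

target = bounds[0]                # the SSTable we would like to compact alone
overlapping = [b for b in bounds[1:] if ranges_overlap(target, b)]
coverage = (target[1] - target[0]) / (2 * RING)
print(len(overlapping), "of", len(bounds) - 1, "other SSTables overlap")
print(f"target SSTable spans {coverage:.4%} of the ring")
```

With thousands of uniformly hashed keys per SSTable, each SSTable's first and
last tokens land very close to the ends of the ring, so a range-based check
cannot tell these SSTables apart.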

On Tue, May 6, 2014 at 9:32 PM, Yuki Morishita <mor.yuki@gmail.com> wrote:

> Hi Paulo,
>
> The reason we check overlap is to avoid resurrecting deleted data by
> dropping a tombstone marker from only a single SSTable.
> And we don't want to check row by row to determine whether an SSTable is
> droppable, since that takes time, so we use token ranges to determine
> whether it MAY have droppable columns.
>
> On Tue, May 6, 2014 at 7:14 PM, Paulo Ricardo Motta Gomes
> <paulo.motta@chaordicsystems.com> wrote:
> > Hello,
> >
> > Sorry for being persistent, but I'd love to clarify my understanding of
> > this. Has anyone seen single-SSTable compaction being triggered for STCS
> > SSTables with a high tombstone ratio?
> >
> > Because if the above understanding is correct, the current implementation
> > almost never triggers this kind of compaction, since the token ranges of a
> > node's SSTables almost always overlap. Could this be a bug, or is it
> > expected behavior?
> >
> > Thank you,
> >
> >
> >
> > On Mon, May 5, 2014 at 8:59 AM, Paulo Ricardo Motta Gomes
> > <paulo.motta@chaordicsystems.com> wrote:
> >>
> >> Hello,
> >>
> >> After noticing that automatic tombstone removal (CASSANDRA-3442) was not
> >> working on an append-only STCS CF with a 40% droppable tombstone ratio, I
> >> investigated why compaction was not being triggered on the largest
> >> SSTable, which is 16GB with a droppable tombstone ratio of about 70%.
> >>
> >> When the code checks whether an SSTable is a candidate for compaction
> >> (AbstractCompactionStrategy.worthDroppingTombstones), it checks whether
> >> the other SSTables overlap with the current SSTable by comparing their
> >> start and end tokens. The problem is that every SSTable contains pretty
> >> much the whole node token range, so they overlap nearly all the time and
> >> the automatic tombstone removal never happens. Is there any case in STCS
> >> where the SSTables' token ranges DO NOT overlap?
> >>
> >> I understand that during the tombstone removal process it's necessary to
> >> verify whether the compacted row exists in any other SSTable, but I don't
> >> understand why it's necessary to verify whether the token ranges overlap
> >> to decide if a tombstone compaction should be executed on a single
> >> SSTable with a high droppable tombstone ratio.
> >>
> >> Any clarification would be much appreciated.
> >>
> >> PS: Cassandra version: 1.2.16
> >>
> >> --
> >> Paulo Motta
> >>
> >> Chaordic | Platform
> >> www.chaordic.com.br
> >> +55 48 3232.3200
> >
>
>
>
> --
> Yuki Morishita
>  t:yukim (http://twitter.com/yukim)
>



