cassandra-user mailing list archives

From Paulo Ricardo Motta Gomes <paulo.mo...@chaordicsystems.com>
Subject Re: clearing tombstones?
Date Fri, 11 Apr 2014 17:33:11 GMT
This thread is really informative, thanks for the good feedback.

My question is: Is there a way to force tombstones to be cleared with LCS?
Does scrub help in any case? Or is the only solution to create a new CF and
migrate all the data if you intend to do a large CF cleanup?

Cheers,


On Fri, Apr 11, 2014 at 2:02 PM, Mark Reddy <mark.reddy@boxever.com> wrote:

> That's great, Will. If you could update the thread with the actions you
> decide to take and the results, that would be great.
>
>
> Mark
>
>
> On Fri, Apr 11, 2014 at 5:53 PM, William Oberman <oberman@civicscience.com
> > wrote:
>
>> I've learned a *lot* from this thread.  My thanks to all of the
>> contributors!
>>
>> Paulo: Good luck with LCS.  I wish I could help there, but all of my CFs
>> are SizeTiered (mostly as I'm on the same schema/same settings since 0.7...)
>>
>> will
>>
>>
>>
>> On Fri, Apr 11, 2014 at 12:14 PM, Mina Naguib <mina.naguib@adgear.com> wrote:
>>
>>>
>>> Levelled Compaction is a wholly different beast when it comes to
>>> tombstones.
>>>
>>> The tombstones are inserted, like any other write really, at the lower
>>> levels in the leveldb hierarchy.
>>>
>>> They are only removed after they have had the chance to "naturally"
>>> migrate upwards in the leveldb hierarchy to the highest level in your data
>>> store.  How long that takes depends on:
>>>  1. The amount of data in your store and the number of levels your LCS
>>> strategy has
>>> 2. The amount of new writes entering the bottom funnel of your leveldb,
>>> forcing upwards compaction and combining
>>>
>>> To give you an idea, I had a similar scenario and ran a (slow,
>>> throttled) delete job on my cluster around December-January.  Here's a
>>> graph of the disk space usage on one node.  Notice the still-declining
>>> usage long after the cleanup job has finished (sometime in January).  I
>>> tend to think of tombstones in LCS as little bombs that get to explode much
>>> later in time:
>>>
>>> http://mina.naguib.ca/images/tombstones-cassandra-LCS.jpg
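
For reference, LCS also exposes per-table compaction sub-options that can get
tombstone-heavy SSTables compacted on their own, without waiting for data to
migrate up through the levels. A hedged sketch, with hypothetical
keyspace/table names (both sub-properties exist in the 1.2 line; defaults
shown):

```cql
ALTER TABLE myks.mycf WITH compaction = {
  'class': 'LeveledCompactionStrategy',
  'tombstone_threshold': '0.2',            -- estimated droppable-tombstone ratio
  'tombstone_compaction_interval': '86400' -- min SSTable age (s) before the check
};
```

Above the threshold, a single SSTable becomes a candidate for a solo
compaction that can purge its expired tombstones.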
>>>
>>>
>>>
>>> On 2014-04-11, at 11:20 AM, Paulo Ricardo Motta Gomes <
>>> paulo.motta@chaordicsystems.com> wrote:
>>>
>>> I have a similar problem here: I deleted about 30% of a very large CF
>>> using LCS (about 80GB per node), but my data still hasn't shrunk, even
>>> though I used 1 day for gc_grace_seconds. Would nodetool scrub help?
>>> Does nodetool scrub force a minor compaction?
>>>
>>> Cheers,
>>>
>>> Paulo
>>>
>>>
>>> On Fri, Apr 11, 2014 at 12:12 PM, Mark Reddy <mark.reddy@boxever.com> wrote:
>>>
>>>> Yes, running nodetool compact (major compaction) creates one large
>>>> SSTable. This will mess up the heuristics of the SizeTiered strategy (is
>>>> this the compaction strategy you are using?) leading to multiple 'small'
>>>> SSTables alongside the single large SSTable, which results in increased
>>>> read latency. You will incur the operational overhead of having to manage
>>>> compactions if you wish to compact these smaller SSTables. For all these
>>>> reasons it is generally advised to stay away from running compactions
>>>> manually.
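
The heuristic that gets messed up can be sketched in a few lines. This is an
illustration of size-tiered bucketing, not Cassandra's actual code (real STCS
also applies min/max thresholds and hotness, among other things):

```python
# Illustrative sketch of SizeTiered bucketing: SSTables are grouped into
# buckets of similar size, and a bucket is only compacted once it holds
# min_threshold (default 4) tables.  A single huge SSTable produced by a
# major compaction lands in a bucket of its own and is never revisited.
def bucket_by_size(sizes_mb, bucket_low=0.5, bucket_high=1.5):
    buckets = []  # each bucket is a list of sizes sharing a running average
    for size in sorted(sizes_mb):
        for b in buckets:
            avg = sum(b) / len(b)
            if bucket_low * avg <= size <= bucket_high * avg:
                b.append(size)
                break
        else:
            buckets.append([size])
    return buckets

# Four ~50MB tables from normal flushes plus one 500GB table from a major
# compaction: the big one sits alone, so only the small ones are ever
# eligible for the next minor compaction.
buckets = bucket_by_size([48, 50, 52, 55, 512000])
```

This is why the large SSTable accumulates alongside a growing pile of small
ones: nothing of comparable size ever shows up to bucket with it.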
>>>>
>>>> Assuming that this is a production environment and you want to keep
>>>> everything running as smoothly as possible I would reduce the gc_grace on
>>>> the CF, allow automatic minor compactions to kick in and then increase the
>>>> gc_grace once again after the tombstones have been removed.
>>>>
>>>>
>>>> On Fri, Apr 11, 2014 at 3:44 PM, William Oberman <
>>>> oberman@civicscience.com> wrote:
>>>>
>>>>> So, if I was impatient and just "wanted to make this happen now", I
>>>>> could:
>>>>>
>>>>> 1.) Change GCGraceSeconds of the CF to 0
>>>>> 2.) run nodetool compact (*)
>>>>> 3.) Change GCGraceSeconds of the CF back to 10 days
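
Spelled out as commands, the three steps above might look like this (keyspace
and CF names are hypothetical, and this is a sketch rather than a tested
procedure; note the compaction runs per node):

```shell
# 1.) make tombstones immediately purgeable
echo "ALTER TABLE myks.mycf WITH gc_grace_seconds = 0;" | cqlsh
# 2.) force a major compaction (repeat on each node)
nodetool compact myks mycf
# 3.) restore the 10-day default
echo "ALTER TABLE myks.mycf WITH gc_grace_seconds = 864000;" | cqlsh
```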
>>>>>
>>>>> Since I have ~900M tombstones, even if I miss a few due to impatience,
>>>>> I don't care *that* much as I could re-run my cleanup tool against the
>>>>> now much smaller CF.
>>>>>
>>>>> (*) A long long time ago I seem to recall reading advice about "don't
>>>>> ever run nodetool compact", but I can't remember why.  Is there any bad
>>>>> long term consequence?  Short term there are several:
>>>>> -a heavy operation
>>>>> -temporary 2x disk space
>>>>> -one big SSTable afterwards
>>>>> But moving forward, everything is OK, right?
>>>>> CommitLog/MemTable->SSTables, minor compactions that merge SSTables,
>>>>> etc...  The only flaw I can think of is it will take forever until the
>>>>> SSTable minor compactions build up enough to consider including the big
>>>>> SSTable in a compaction, making it likely I'll have to self manage
>>>>> compactions.
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Apr 11, 2014 at 10:31 AM, Mark Reddy <mark.reddy@boxever.com> wrote:
>>>>>
>>>>>> Correct, a tombstone will only be removed after the gc_grace period
>>>>>> has elapsed. The default value is set to 10 days, which allows a great
>>>>>> deal of time for consistency to be achieved prior to deletion. If you
>>>>>> are operationally confident that you can achieve consistency via
>>>>>> anti-entropy repairs within a shorter period, you can always reduce
>>>>>> that 10 day interval.
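
The rule described here boils down to a timestamp check at compaction time; a
minimal sketch, not Cassandra source:

```python
# A tombstone written at deletion_time can only be dropped by a compaction
# running at time `now` once gc_grace_seconds have fully elapsed; until
# then it must survive so repairs can propagate the delete to any replica
# that missed it.
def tombstone_droppable(deletion_time, gc_grace_seconds, now):
    return now >= deletion_time + gc_grace_seconds

DAY = 86400
early = tombstone_droppable(0, 10 * DAY, 5 * DAY)   # only 5 days elapsed
late = tombstone_droppable(0, 10 * DAY, 11 * DAY)   # grace period passed
```

Compactions that run before the grace period expires simply rewrite the
tombstone into the new SSTable, which is why disk usage does not drop right
away.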
>>>>>>
>>>>>>
>>>>>> Mark
>>>>>>
>>>>>>
>>>>>> On Fri, Apr 11, 2014 at 3:16 PM, William Oberman <
>>>>>> oberman@civicscience.com> wrote:
>>>>>>
>>>>>>> I'm seeing a lot of articles about a dependency between removing
>>>>>>> tombstones and GCGraceSeconds, which might be my problem (I just
>>>>>>> checked, and this CF has GCGraceSeconds of 10 days).
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Apr 11, 2014 at 10:10 AM, tommaso barbugli <
>>>>>>> tbarbugli@gmail.com> wrote:
>>>>>>>
>>>>>>>> compaction should take care of it; for me it never worked, so I run
>>>>>>>> nodetool compact on every node; that does it.
>>>>>>>>
>>>>>>>>
>>>>>>>> 2014-04-11 16:05 GMT+02:00 William Oberman <
>>>>>>>> oberman@civicscience.com>:
>>>>>>>>
>>>>>>>>> I'm wondering what will clear tombstoned rows?  nodetool cleanup,
>>>>>>>>> nodetool repair, or time (as in just wait)?
>>>>>>>>>
>>>>>>>>> I had a CF that was more or less storing session information.
>>>>>>>>> After some time, we decided that one piece of this information was
>>>>>>>>> pointless to track (and was 90%+ of the columns, and in 99% of
>>>>>>>>> those cases was ALL columns for a row).  I wrote a process to
>>>>>>>>> remove all of those columns (which again in a vast majority of
>>>>>>>>> cases had the effect of removing the whole row).
>>>>>>>>>
>>>>>>>>> This CF had ~1 billion rows, so I expect to be left with ~100M
>>>>>>>>> rows.  After I did this mass delete, everything was the same size
>>>>>>>>> on disk (which I expected, knowing how tombstoning works).  It
>>>>>>>>> wasn't 100% clear to me what to poke to cause compactions to clear
>>>>>>>>> the tombstones.  First I tried nodetool cleanup on a candidate
>>>>>>>>> node.  But, afterwards the disk usage was the same.  Then I tried
>>>>>>>>> nodetool repair on that same node.  But again, disk usage is still
>>>>>>>>> the same.  The CF has no snapshots.
>>>>>>>>>
>>>>>>>>> So, am I misunderstanding something?  Is there another operation
>>>>>>>>> to try?  Do I have to "just wait"?  I've only done cleanup/repair
>>>>>>>>> on one node.  Do I have to run one or the other over all nodes to
>>>>>>>>> clear tombstones?
>>>>>>>>>
>>>>>>>>> Cassandra 1.2.15 if it matters,
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>> will
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> *Paulo Motta*
>>>
>>> Chaordic | *Platform*
>>> *www.chaordic.com.br <http://www.chaordic.com.br/>*
>>> +55 48 3232.3200
>>>
>>>
>>>
>>
>>
>>
>


-- 
*Paulo Motta*

Chaordic | *Platform*
*www.chaordic.com.br <http://www.chaordic.com.br/>*
+55 48 3232.3200
