incubator-cassandra-user mailing list archives

From "Laing, Michael" <michael.la...@nytimes.com>
Subject Re: clearing tombstones?
Date Fri, 11 Apr 2014 18:11:36 GMT
At the cost of really quite a lot of compaction, you can temporarily switch
to SizeTiered, and when that is completely done (check each node), switch
back to Leveled.

It's like doing the laundry twice :)

I've done this on CFs that were about 5GB but I don't see why it wouldn't
work on larger ones.
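[Editor's note: the round trip described above is just two schema changes with a wait in between. A sketch only; "ks"/"cf" are placeholder names, and option names should be verified against your Cassandra version:]

```shell
# Switch the CF to SizeTiered (a lot of compaction will follow).
cqlsh -e "ALTER TABLE ks.cf WITH compaction = {'class': 'SizeTieredCompactionStrategy'};"

# Wait until compactions have drained on EVERY node before switching back.
nodetool compactionstats

# Once each node is quiet, switch back to Leveled.
cqlsh -e "ALTER TABLE ks.cf WITH compaction = {'class': 'LeveledCompactionStrategy'};"
```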

ml


On Fri, Apr 11, 2014 at 1:33 PM, Paulo Ricardo Motta Gomes <
paulo.motta@chaordicsystems.com> wrote:

> This thread is really informative, thanks for the good feedback.
>
> My question is: Is there a way to force tombstones to be cleared with LCS?
> Does scrub help in any case? Or the only solution would be to create a new
> CF and migrate all the data if you intend to do a large CF cleanup?
>
> Cheers,
>
>
> On Fri, Apr 11, 2014 at 2:02 PM, Mark Reddy <mark.reddy@boxever.com> wrote:
>
>> That's great, Will. If you could update the thread with the actions you
>> decide to take and the results, that would be great.
>>
>>
>> Mark
>>
>>
>> On Fri, Apr 11, 2014 at 5:53 PM, William Oberman <
>> oberman@civicscience.com> wrote:
>>
>>> I've learned a *lot* from this thread.  My thanks to all of the
>>> contributors!
>>>
>>> Paulo: Good luck with LCS.  I wish I could help there, but all of my
>>> CFs are SizeTiered (mostly as I'm on the same schema/same settings since
>>> 0.7...)
>>>
>>> will
>>>
>>>
>>>
>>> On Fri, Apr 11, 2014 at 12:14 PM, Mina Naguib <mina.naguib@adgear.com> wrote:
>>>
>>>>
>>>> Levelled Compaction is a wholly different beast when it comes to
>>>> tombstones.
>>>>
>>>> The tombstones are inserted, like any other write really, at the lower
>>>> levels in the leveldb hierarchy.
>>>>
>>>> They are only removed after they have had the chance to "naturally"
>>>> migrate upwards in the leveldb hierarchy to the highest level in your data
>>>> store.  How long that takes depends on:
>>>>  1. The amount of data in your store and the number of levels your LCS
>>>> strategy has
>>>> 2. The amount of new writes entering the bottom funnel of your leveldb,
>>>> forcing upwards compaction and combining
>>>>
>>>> To give you an idea, I had a similar scenario and ran a (slow,
>>>> throttled) delete job on my cluster around December-January.  Here's a
>>>> graph of the disk space usage on one node.  Notice the still-declining
>>>> usage long after the cleanup job has finished (sometime in January).  I
>>>> tend to think of tombstones in LCS as little bombs that get to explode much
>>>> later in time:
>>>>
>>>> http://mina.naguib.ca/images/tombstones-cassandra-LCS.jpg
>>>>
>>>>
>>>>
>>>> On 2014-04-11, at 11:20 AM, Paulo Ricardo Motta Gomes <
>>>> paulo.motta@chaordicsystems.com> wrote:
>>>>
>>>> I have a similar problem here: I deleted about 30% of a very large CF
>>>> using LCS (about 80GB per node), but my data still hasn't shrunk, even if
>>>> I used 1 day for gc_grace_seconds. Would nodetool scrub help? Does nodetool
>>>> scrub force a minor compaction?
>>>>
>>>> Cheers,
>>>>
>>>> Paulo
>>>>
>>>>
>>>> On Fri, Apr 11, 2014 at 12:12 PM, Mark Reddy <mark.reddy@boxever.com> wrote:
>>>>
>>>>> Yes, running nodetool compact (major compaction) creates one large
>>>>> SSTable. This will mess up the heuristics of the SizeTiered strategy (is
>>>>> this the compaction strategy you are using?) leading to multiple 'small'
>>>>> SSTables alongside the single large SSTable, which results in increased
>>>>> read latency. You will incur the operational overhead of having to manage
>>>>> compactions if you wish to compact these smaller SSTables. For all these
>>>>> reasons it is generally advised to stay away from running compactions
>>>>> manually.
>>>>>
>>>>> Assuming that this is a production environment and you want to keep
>>>>> everything running as smoothly as possible, I would reduce the gc_grace on
>>>>> the CF, allow automatic minor compactions to kick in, and then increase the
>>>>> gc_grace once again after the tombstones have been removed.
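[Editor's note: Mark's suggestion boils down to two schema changes around a waiting period. A sketch; "ks"/"cf" and the 1-day value are placeholders:]

```shell
# Temporarily shorten gc_grace (1 day here, chosen arbitrarily).
cqlsh -e "ALTER TABLE ks.cf WITH gc_grace_seconds = 86400;"

# Let automatic minor compactions purge the tombstones; watch progress.
nodetool compactionstats

# Restore the 10-day default afterwards.
cqlsh -e "ALTER TABLE ks.cf WITH gc_grace_seconds = 864000;"
```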
>>>>>
>>>>>
>>>>> On Fri, Apr 11, 2014 at 3:44 PM, William Oberman <
>>>>> oberman@civicscience.com> wrote:
>>>>>
>>>>>> So, if I were impatient and just "wanted to make this happen now", I
>>>>>> could:
>>>>>>
>>>>>> 1.) Change GCGraceSeconds of the CF to 0
>>>>>> 2.) run nodetool compact (*)
>>>>>> 3.) Change GCGraceSeconds of the CF back to 10 days
>>>>>>
>>>>>> Since I have ~900M tombstones, even if I miss a few due to
>>>>>> impatience, I don't care *that* much, as I could re-run my cleanup tool
>>>>>> against the now much smaller CF.
>>>>>>
>>>>>> (*) A long long time ago I seem to recall reading advice about "don't
>>>>>> ever run nodetool compact", but I can't remember why.  Is there any bad
>>>>>> long-term consequence?  Short term there are several:
>>>>>> -a heavy operation
>>>>>> -temporary 2x disk space
>>>>>> -one big SSTable afterwards
>>>>>> But moving forward, everything is ok right?
>>>>>> CommitLog/MemTable->SSTables, minor compactions that merge SSTables,
>>>>>> etc...  The only flaw I can think of is that it will take forever until the
>>>>>> minor compactions build up enough SSTables to consider including the big
>>>>>> SSTable in a compaction, making it likely I'll have to self-manage
>>>>>> compactions.
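[Editor's note: the three steps above, sketched as commands, assuming CQL3 and placeholder names "ks"/"cf":]

```shell
# 1.) drop gc_grace to zero so tombstones are immediately purgeable
cqlsh -e "ALTER TABLE ks.cf WITH gc_grace_seconds = 0;"

# 2.) force the major compaction (heavy; needs up to 2x disk temporarily)
nodetool compact ks cf

# 3.) restore gc_grace to 10 days
cqlsh -e "ALTER TABLE ks.cf WITH gc_grace_seconds = 864000;"
```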
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Apr 11, 2014 at 10:31 AM, Mark Reddy <mark.reddy@boxever.com> wrote:
>>>>>>
>>>>>>> Correct, a tombstone will only be removed after the gc_grace period has
>>>>>>> elapsed. The default value is set to 10 days, which allows a great deal of
>>>>>>> time for consistency to be achieved prior to deletion. If you are
>>>>>>> operationally confident that you can achieve consistency via anti-entropy
>>>>>>> repairs within a shorter period, you can always reduce that 10-day
>>>>>>> interval.
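[Editor's note: for reference, that 10-day default corresponds to gc_grace_seconds = 864000:]

```shell
# 10 days expressed in seconds: 10 * 24 * 60 * 60
echo $((10 * 24 * 60 * 60))   # prints 864000
```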
>>>>>>>
>>>>>>>
>>>>>>> Mark
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Apr 11, 2014 at 3:16 PM, William Oberman <
>>>>>>> oberman@civicscience.com> wrote:
>>>>>>>
>>>>>>>> I'm seeing a lot of articles about a dependency between removing
>>>>>>>> tombstones and GCGraceSeconds, which might be my problem (I just checked,
>>>>>>>> and this CF has GCGraceSeconds of 10 days).
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Apr 11, 2014 at 10:10 AM, tommaso barbugli <
>>>>>>>> tbarbugli@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Compaction should take care of it; for me it never worked, so I run
>>>>>>>>> nodetool compact on every node; that does it.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2014-04-11 16:05 GMT+02:00 William Oberman <
>>>>>>>>> oberman@civicscience.com>:
>>>>>>>>>
>>>>>>>>>> I'm wondering what will clear tombstoned rows?  nodetool cleanup,
>>>>>>>>>> nodetool repair, or time (as in just wait)?
>>>>>>>>>>
>>>>>>>>>> I had a CF that was more or less storing session information.
>>>>>>>>>> After some time, we decided that one piece of this information was
>>>>>>>>>> pointless to track (and it was 90%+ of the columns, and in 99% of those
>>>>>>>>>> cases was ALL the columns for a row).  I wrote a process to remove all
>>>>>>>>>> of those columns (which, again, in the vast majority of cases had the
>>>>>>>>>> effect of removing the whole row).
>>>>>>>>>>
>>>>>>>>>> This CF had ~1 billion rows, so I expect to be left with ~100M
>>>>>>>>>> rows.  After I did this mass delete, everything was the same size on
>>>>>>>>>> disk (which I expected, knowing how tombstoning works).  It wasn't 100%
>>>>>>>>>> clear to me what to poke to cause compactions to clear the tombstones.
>>>>>>>>>> First I tried nodetool cleanup on a candidate node, but afterwards the
>>>>>>>>>> disk usage was the same.  Then I tried nodetool repair on that same
>>>>>>>>>> node, but again, disk usage was still the same.  The CF has no
>>>>>>>>>> snapshots.
>>>>>>>>>>
>>>>>>>>>> So, am I misunderstanding something?  Is there another operation
>>>>>>>>>> to try?  Do I have to "just wait"?  I've only done cleanup/repair on
>>>>>>>>>> one node.  Do I have to run one or the other over all nodes to clear
>>>>>>>>>> tombstones?
>>>>>>>>>>
>>>>>>>>>> Cassandra 1.2.15 if it matters,
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>> will
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> *Paulo Motta*
>>>>
>>>> Chaordic | *Platform*
>>>> *www.chaordic.com.br <http://www.chaordic.com.br/>*
>>>> +55 48 3232.3200
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>
>
> --
> *Paulo Motta*
>
> Chaordic | *Platform*
> *www.chaordic.com.br <http://www.chaordic.com.br/>*
> +55 48 3232.3200
>
