cassandra-user mailing list archives

From Mark Reddy <>
Subject Re: clearing tombstones?
Date Fri, 11 Apr 2014 16:12:04 GMT
To clarify, you would want to manage compactions only if you were concerned
about read latency. If you update rows, those rows may become spread across
an increasing number of SSTables leading to increased read latency.

Thanks for providing some insight into your use case, as it does differ from
the norm. If you consider 50GB a small CF and your data ingestion is
sufficient to produce more SSTables of similar size soon, then yes, you
could run a major compaction with little operational overhead, and the
compaction strategy's heuristics would level out after some time.

On Fri, Apr 11, 2014 at 4:52 PM, Laing, Michael <> wrote:

> I have played with this quite a bit and recommend you set gc_grace_seconds
> to 0 and use 'nodetool compact [keyspace] [cfname]' on your table.
> A caveat I have is that we use C* 2.0.6 - but the space we expect to
> recover is in fact recovered.
> Actually, since we never delete explicitly (just ttl) we always have
> gc_grace_seconds set to 0.
> Another important caveat is to be careful with repair: having set gc to 0
> and compacted on a node, if you then repair it, data may come streaming in
> from the other nodes. We don't run into this, as our gc is always 0, but
> others may be able to comment.
> ml
> On Fri, Apr 11, 2014 at 11:26 AM, William Oberman <> wrote:
>> Yes, I'm using SizeTiered.
>> I totally understand the "mess up the heuristics" issue.  But, I don't
>> understand "You will incur the operational overhead of having to manage
>> compactions if you wish to compact these smaller SSTables".  My
>> understanding is the small tables will still compact.  The problem is that
>> until I have 3 other (by default) tables of the same size as the "big
>> table", it won't be compacted.
>> In my case, this might not be terrible though, right?  To get into the
>> trees, I have 9 nodes with RF=3 and this CF is ~500GB/node.  I deleted like
>> 90-95% of the data, so I expect the data to be 25-50GB after the tombstones
>> are cleared, but call it 50GB.  That means I won't compact this 50GB file
>> until I gather another 150GB (50,50,50,50->200).   But, that's not
>> *horrible*.  Now, if I only deleted 10% of the data, waiting to compact
>> 450GB until I had another 1.3TB would be rough...
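The arithmetic above can be sketched as a quick shell check. The 50GB figure comes from the discussion; the threshold of 4 is SizeTiered's default min_threshold, so the numbers here are illustrative, not measured:

```shell
# SizeTiered compaction (default min_threshold = 4) waits until there are
# 4 SSTables in the same size tier before merging them.
post_compact_gb=50   # estimated CF size after tombstones are cleared
min_threshold=4      # SizeTieredCompactionStrategy default
accumulate=$(( post_compact_gb * (min_threshold - 1) ))
merged=$(( post_compact_gb * min_threshold ))
echo "need ${accumulate}GB more of similar-sized SSTables; merge yields ~${merged}GB"
```

With a 450GB post-delete table the same arithmetic gives the ~1.3TB wait mentioned above, which is why the ratio of deleted data matters so much here.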
>> I think your advice is great for people looking for "normal" answers in
>> the forum, but I don't think my use case is very normal :-)
>> will
>> On Fri, Apr 11, 2014 at 11:12 AM, Mark Reddy <> wrote:
>>> Yes, running nodetool compact (major compaction) creates one large
>>> SSTable. This will mess up the heuristics of the SizeTiered strategy (is
>>> this the compaction strategy you are using?) leading to multiple 'small'
>>> SSTables alongside the single large SSTable, which results in increased
>>> read latency. You will incur the operational overhead of having to manage
>>> compactions if you wish to compact these smaller SSTables. For all these
>>> reasons it is generally advised to stay away from running compactions
>>> manually.
>>> Assuming that this is a production environment and you want to keep
>>> everything running as smoothly as possible I would reduce the gc_grace on
>>> the CF, allow automatic minor compactions to kick in and then increase the
>>> gc_grace once again after the tombstones have been removed.
>>> On Fri, Apr 11, 2014 at 3:44 PM, William Oberman <> wrote:
>>>> So, if I was impatient and just "wanted to make this happen now", I
>>>> could:
>>>> 1.) Change GCGraceSeconds of the CF to 0
>>>> 2.) run nodetool compact (*)
>>>> 3.) Change GCGraceSeconds of the CF back to 10 days
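A minimal sketch of those three steps, run from one node. The keyspace/table names are placeholders, the ALTER syntax shown is CQL3 via cqlsh, and this assumes you accept the caveats discussed in this thread (avoid repair while gc_grace is 0):

```shell
# Placeholder names: keyspace "prod", column family "sessions".
# 1.) drop gc_grace so tombstones become eligible for removal immediately
cqlsh -e "ALTER TABLE prod.sessions WITH gc_grace_seconds = 0;"

# 2.) force a major compaction of just that CF on this node
nodetool compact prod sessions

# 3.) restore the 10 day default once space has been reclaimed
cqlsh -e "ALTER TABLE prod.sessions WITH gc_grace_seconds = 864000;"
```

nodetool compact only acts on the local node, so step 2 would need to be repeated on each node in the ring.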
>>>> Since I have ~900M tombstones, even if I miss a few due to impatience,
>>>> I don't care *that* much as I could re-run my clean up tool against the now
>>>> much smaller CF.
>>>> (*) A long long time ago I seem to recall reading advice about "don't
>>>> ever run nodetool compact", but I can't remember why.  Is there any bad
>>>> long term consequence?  Short term there are several:
>>>> -a heavy operation
>>>> -temporary 2x disk space
>>>> -one big SSTable afterwards
>>>> But moving forward, everything is ok right?
>>>>  CommitLog/MemTable->SSTables, minor compactions that merge SSTables,
>>>> etc...  The only flaw I can think of is it will take forever until the
>>>> SSTable minor compactions build up enough to consider including the big
>>>> SSTable in a compaction, making it likely I'll have to self manage
>>>> compactions.
>>>> On Fri, Apr 11, 2014 at 10:31 AM, Mark Reddy <> wrote:
>>>>> Correct, a tombstone will only be removed after the gc_grace period has
>>>>> elapsed. The default value is set to 10 days, which allows a great deal of
>>>>> time for consistency to be achieved prior to deletion. If you are
>>>>> operationally confident that you can achieve consistency via anti-entropy
>>>>> repairs within a shorter period, you can always reduce that 10 day interval.
>>>>> Mark
>>>>> On Fri, Apr 11, 2014 at 3:16 PM, William Oberman <> wrote:
>>>>>> I'm seeing a lot of articles about a dependency between removing
>>>>>> tombstones and GCGraceSeconds, which might be my problem (I just did
>>>>>> the deletes, and this CF has GCGraceSeconds of 10 days).
>>>>>> On Fri, Apr 11, 2014 at 10:10 AM, tommaso barbugli <> wrote:
>>>>>>> compaction should take care of it; for me it never worked, so I run
>>>>>>> nodetool compact on every node; that does it.
>>>>>>> 2014-04-11 16:05 GMT+02:00 William Oberman <>:
>>>>>>>> I'm wondering what will clear tombstoned rows?  nodetool cleanup,
>>>>>>>> nodetool repair, or time (as in just wait)?
>>>>>>>> I had a CF that was more or less storing session information.
>>>>>>>>  After some time, we decided that one piece of this information was
>>>>>>>> pointless to track (and was 90%+ of the columns, and in 99% of those
>>>>>>>> cases was ALL columns for a row).  I wrote a process to remove all of
>>>>>>>> those columns (which again in a vast majority of cases had the effect
>>>>>>>> of removing the whole row).
>>>>>>>> This CF had ~1 billion rows, so I expect to be left with a small
>>>>>>>> fraction of those rows.  After I did this mass delete, everything was
>>>>>>>> the same size on disk (which I expected, knowing how tombstoning
>>>>>>>> works).  It wasn't 100% clear to me what to poke to cause compactions
>>>>>>>> to clear the tombstones.  First I tried nodetool cleanup on a
>>>>>>>> candidate node.  But, afterwards the disk usage was the same.  Then I
>>>>>>>> tried nodetool repair on that same node.  But again, disk usage is
>>>>>>>> still the same.  The CF has no snapshots.
>>>>>>>> So, am I misunderstanding something?  Is there another operation to
>>>>>>>> try?  Do I have to "just wait"?  I've only done cleanup/repair on one
>>>>>>>> node.  Do I have to run one or the other over all nodes to clear the
>>>>>>>> tombstones?
>>>>>>>> Cassandra 1.2.15 if it matters,
>>>>>>>> Thanks!
>>>>>>>> will
