cassandra-user mailing list archives

From William Oberman <ober...@civicscience.com>
Subject Re: clearing tombstones?
Date Fri, 11 Apr 2014 14:44:48 GMT
So, if I was impatient and just "wanted to make this happen now", I could:

1.) Change GCGraceSeconds of the CF to 0
2.) run nodetool compact (*)
3.) Change GCGraceSeconds of the CF back to 10 days
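
The three steps above can be sketched as a dry-run plan. This is only a sketch under assumptions: the keyspace and CF names (my_keyspace, sessions) are hypothetical placeholders, it assumes the CF is reachable from CQL3 so that gc_grace_seconds can be changed with ALTER TABLE, and it only prints the commands rather than running them.

```python
# Hypothetical names: "my_keyspace" and "sessions" are placeholders,
# they do not come from this thread.
TEN_DAYS = 10 * 24 * 60 * 60  # 864000 seconds, i.e. the 10 days mentioned above


def purge_plan(keyspace, table, restore_seconds=TEN_DAYS):
    """Return the commands for the three steps, in order, without running them."""
    alter = "ALTER TABLE {}.{} WITH gc_grace_seconds = {};"
    return [
        # Step 1: let tombstones become purgeable immediately.
        ("cql", alter.format(keyspace, table, 0)),
        # Step 2: major compaction of just this CF (run on each node).
        ("shell", "nodetool compact {} {}".format(keyspace, table)),
        # Step 3: restore the original grace period.
        ("cql", alter.format(keyspace, table, restore_seconds)),
    ]


if __name__ == "__main__":
    for kind, command in purge_plan("my_keyspace", "sessions"):
        print("[{}] {}".format(kind, command))
```

Since nodetool compact acts on the node it is pointed at, step 2 would need to finish on every node before step 3 is applied.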

Since I have ~900M tombstones, even if I miss a few due to impatience, I
don't care *that* much as I could re-run my clean up tool against the now
much smaller CF.

(*) A long, long time ago I seem to recall reading advice about "don't ever
run nodetool compact", but I can't remember why.  Is there any bad long-term
consequence?  Short term there are several:
- a heavy operation
- temporary 2x disk space
- one big SSTable afterwards
But moving forward, everything is OK, right?  CommitLog/MemTable->SSTables,
minor compactions that merge SSTables, etc...  The only flaw I can think of
is that it will take forever until the minor compactions build up SSTables
big enough to consider including the big SSTable in a compaction, making it
likely I'll have to self-manage compactions.
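
To illustrate the concern about the big SSTable, here is a simplified sketch of size-tiered bucketing. It is an assumption-laden model, not Cassandra's actual code: it assumes the CF uses size-tiered compaction with the usual defaults (a bucket accepts sizes within 0.5x to 1.5x of its average, and a bucket needs at least 4 SSTables before it is compacted), and the sizes are made-up numbers.

```python
def size_tiered_buckets(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group SSTable sizes into buckets of similar size (simplified model)."""
    buckets = []  # each bucket is a list of sizes
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / float(len(bucket))
            # A size joins the first bucket whose average it is close to.
            if bucket_low * avg <= size <= bucket_high * avg:
                bucket.append(size)
                break
        else:
            # No similar bucket exists, so this SSTable starts its own.
            buckets.append([size])
    return buckets


def compaction_candidates(sizes, min_threshold=4):
    """Only buckets with at least min_threshold SSTables get compacted."""
    return [b for b in size_tiered_buckets(sizes) if len(b) >= min_threshold]


if __name__ == "__main__":
    # One huge SSTable left by a major compaction plus four small new ones (MB).
    sizes_mb = [100000, 50, 55, 60, 52]
    print(size_tiered_buckets(sizes_mb))
    print(compaction_candidates(sizes_mb))
```

In this model the four small SSTables form a bucket and get merged, while the huge one sits alone in its own bucket until other SSTables grow to within range of its size.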



On Fri, Apr 11, 2014 at 10:31 AM, Mark Reddy <mark.reddy@boxever.com> wrote:

> Correct, a tombstone will only be removed after gc_grace period has
> elapsed. The default value is set to 10 days which allows a great deal of
> time for consistency to be achieved prior to deletion. If you are
> operationally confident that you can achieve consistency via anti-entropy
> repairs within a shorter period you can always reduce that 10 day interval.
>
>
> Mark
>
>
> On Fri, Apr 11, 2014 at 3:16 PM, William Oberman <oberman@civicscience.com>
> wrote:
>
>> I'm seeing a lot of articles about a dependency between removing
>> tombstones and GCGraceSeconds, which might be my problem (I just checked,
>> and this CF has GCGraceSeconds of 10 days).
>>
>>
>> On Fri, Apr 11, 2014 at 10:10 AM, tommaso barbugli <tbarbugli@gmail.com> wrote:
>>
>>> Compaction should take care of it. For me it never worked, so I run
>>> nodetool compact on every node; that does it.
>>>
>>>
>>> 2014-04-11 16:05 GMT+02:00 William Oberman <oberman@civicscience.com>:
>>>
>>>> I'm wondering what will clear tombstoned rows?  nodetool cleanup,
>>>> nodetool repair, or time (as in just wait)?
>>>>
>>>> I had a CF that was more or less storing session information.  After
>>>> some time, we decided that one piece of this information was pointless to
>>>> track (and was 90%+ of the columns, and in 99% of those cases was ALL
>>>> columns for a row).   I wrote a process to remove all of those columns
>>>> (which again in a vast majority of cases had the effect of removing the
>>>> whole row).
>>>>
>>>> This CF had ~1 billion rows, so I expect to be left with ~100m rows.
>>>>  After I did this mass delete, everything was the same size on disk (which
>>>> I expected, knowing how tombstoning works).  It wasn't 100% clear to me
>>>> what to poke to cause compactions to clear the tombstones.  First I tried
>>>> nodetool cleanup on a candidate node.  But, afterwards the disk usage was
>>>> the same.  Then I tried nodetool repair on that same node.  But again, disk
>>>> usage is still the same.  The CF has no snapshots.
>>>>
>>>> So, am I misunderstanding something?  Is there another operation to
>>>> try?  Do I have to "just wait"?  I've only done cleanup/repair on one node.
>>>>  Do I have to run one or the other over all nodes to clear tombstones?
>>>>
>>>> Cassandra 1.2.15 if it matters,
>>>>
>>>> Thanks!
>>>>
>>>> will
>>>>
>>>
>>>
>>
>>
>>
>
