incubator-cassandra-user mailing list archives

From Ruchir Jha <ruchir....@gmail.com>
Subject Re: clearing tombstones?
Date Thu, 08 May 2014 13:24:42 GMT
I tried this; however, the doubling in disk space is not "temporary"
as you stated in your note. What am I missing?


On Fri, Apr 11, 2014 at 10:44 AM, William Oberman
<oberman@civicscience.com> wrote:

> So, if I was impatient and just "wanted to make this happen now", I could:
>
> 1.) Change GCGraceSeconds of the CF to 0
> 2.) run nodetool compact (*)
> 3.) Change GCGraceSeconds of the CF back to 10 days
>
> Since I have ~900M tombstones, even if I miss a few due to impatience, I
> don't care *that* much as I could re-run my clean up tool against the now
> much smaller CF.
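A minimal dry-run sketch of the three steps above, in Python. It only builds the statements and commands as strings and executes nothing. The keyspace, table and host names are hypothetical placeholders, and the `ALTER TABLE ... WITH gc_grace_seconds` form assumes the CF is reachable through CQL3. `nodetool -h <host> compact <keyspace> <cf>` appears once per node because a major compaction only runs on the node it is sent to, while the schema change applies cluster-wide. While gc_grace_seconds is 0, a replica that missed a delete can bring the deleted data back, so this assumes the cluster is already consistent (for example, freshly repaired).

```python
# Dry-run sketch: builds the steps as strings and executes nothing.
# Keyspace, table and host names below are hypothetical placeholders.
TEN_DAYS = 10 * 24 * 60 * 60  # 864000 seconds, the default gc_grace_seconds


def purge_plan(keyspace, table, hosts, restore_gc_grace=TEN_DAYS):
    """Return the 'impatient' tombstone purge as an ordered list of steps."""
    cf = f"{keyspace}.{table}"
    # 1.) Schema change: applies cluster-wide, so it is issued once.
    steps = [f"ALTER TABLE {cf} WITH gc_grace_seconds = 0;"]
    # 2.) A major compaction runs only on the node it is sent to,
    #     so there is one nodetool call per host.
    steps += [f"nodetool -h {host} compact {keyspace} {table}" for host in hosts]
    # 3.) Put the grace period back once every node has compacted.
    steps.append(f"ALTER TABLE {cf} WITH gc_grace_seconds = {restore_gc_grace};")
    return steps


if __name__ == "__main__":
    for step in purge_plan("sessions_ks", "sessions", ["10.0.0.1", "10.0.0.2"]):
        print(step)
```

Keeping it as a printed checklist, rather than wiring it to cqlsh and subprocess, makes it easy to review the order (and the restore step) before touching a live cluster.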
>
> (*) A long, long time ago I seem to recall reading advice about "don't ever
> run nodetool compact", but I can't remember why.  Is there any bad long
> term consequence?  Short term there are several:
> -a heavy operation
> -temporary 2x disk space
> -one big SSTable afterwards
> But moving forward, everything is OK, right?  CommitLog/MemTable->SSTables,
> minor compactions that merge SSTables, etc...  The only flaw I can think of
> is it will take forever until the SSTable minor compactions build up enough
> to consider including the big SSTable in a compaction, making it likely
> I'll have to self-manage compactions.
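The "one big SSTable" concern can be illustrated with a simplified Python model of size-tiered compaction, assuming what we believe are its default options: bucket_low=0.5, bucket_high=1.5, min_sstable_size=50 MB and min_threshold=4. SSTables of similar size are grouped into buckets, and only a bucket holding at least min_threshold SSTables is picked for a minor compaction. This is a sketch of the idea, not Cassandra's code; the function names are hypothetical.

```python
# Simplified model of size-tiered compaction bucketing; not Cassandra's code.
# Option values are assumed defaults: bucket_low=0.5, bucket_high=1.5,
# min_sstable_size=50 MB, min_threshold=4.
MB = 1024 * 1024


def stcs_buckets(sizes, bucket_low=0.5, bucket_high=1.5, min_sstable_size=50 * MB):
    """Group SSTable sizes (bytes) into buckets of 'similar' size."""
    buckets = []  # each entry: [average_size, [member sizes]]
    for size in sorted(sizes):
        for bucket in buckets:
            avg, members = bucket
            similar = bucket_low * avg <= size <= bucket_high * avg
            both_small = size < min_sstable_size and avg < min_sstable_size
            if similar or both_small:
                members.append(size)
                bucket[0] = sum(members) / len(members)  # refresh the average
                break
        else:  # no existing bucket fits: start a new one
            buckets.append([size, [size]])
    return [members for _, members in buckets]


def compaction_candidates(sizes, min_threshold=4):
    """Only buckets with at least min_threshold SSTables get minor-compacted."""
    return [b for b in stcs_buckets(sizes) if len(b) >= min_threshold]


# One 200 GB SSTable left by a major compaction, plus five 100 MB SSTables.
sizes = [200 * 1024 * MB] + [100 * MB] * 5
for bucket in stcs_buckets(sizes):
    print(f"{len(bucket)} SSTable(s) of ~{bucket[0] // MB} MB")
print([len(b) for b in compaction_candidates(sizes)])  # → [5]
```

In this model the 200 GB SSTable sits alone in its bucket and needs three more SSTables of roughly 100 to 300 GB before it joins a minor compaction, which is the "take forever" effect described above.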
>
>
>
> On Fri, Apr 11, 2014 at 10:31 AM, Mark Reddy <mark.reddy@boxever.com> wrote:
>
>> Correct, a tombstone will only be removed after the gc_grace period has
>> elapsed. The default value is set to 10 days which allows a great deal of
>> time for consistency to be achieved prior to deletion. If you are
>> operationally confident that you can achieve consistency via anti-entropy
>> repairs within a shorter period you can always reduce that 10 day interval.
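The rule described above can be written as a small predicate. This is a simplified Python sketch with a hypothetical function name: as far as we understand, a real compaction additionally keeps a tombstone that may still shadow data in SSTables outside that compaction, which this ignores. The default of 10 days is 864000 seconds.

```python
from datetime import datetime, timedelta, timezone

# Default gc_grace_seconds: 10 days.
DEFAULT_GC_GRACE_SECONDS = 10 * 24 * 60 * 60  # 864000


def tombstone_purgeable(deleted_at, now, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """Simplified rule: a compaction may drop a tombstone only after its
    grace period has fully elapsed. Ignores the extra check for shadowed
    data in SSTables outside the compaction."""
    return deleted_at + timedelta(seconds=gc_grace_seconds) < now


now = datetime(2014, 4, 11, 15, 0, tzinfo=timezone.utc)
deleted_at = now - timedelta(days=3)  # deleted three days ago

print(tombstone_purgeable(deleted_at, now))  # False: still inside the 10-day window
print(tombstone_purgeable(deleted_at, now, gc_grace_seconds=2 * 24 * 3600))  # True: 2-day grace
```

Shrinking gc_grace_seconds only makes tombstones eligible sooner; they still disappear only when a compaction actually rewrites the SSTables that hold them.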
>>
>>
>> Mark
>>
>>
>> On Fri, Apr 11, 2014 at 3:16 PM, William Oberman <
>> oberman@civicscience.com> wrote:
>>
>>> I'm seeing a lot of articles about a dependency between removing
>>> tombstones and GCGraceSeconds, which might be my problem (I just checked,
>>> and this CF has GCGraceSeconds of 10 days).
>>>
>>>
>>> On Fri, Apr 11, 2014 at 10:10 AM, tommaso barbugli <tbarbugli@gmail.com> wrote:
>>>
>>>> Compaction should take care of it; for me it never worked, so I run
>>>> nodetool compact on every node; that does it.
>>>>
>>>>
>>>> 2014-04-11 16:05 GMT+02:00 William Oberman <oberman@civicscience.com>:
>>>>
>>>>> I'm wondering what will clear tombstoned rows?  nodetool cleanup,
>>>>> nodetool repair, or time (as in just wait)?
>>>>>
>>>>> I had a CF that was more or less storing session information.  After
>>>>> some time, we decided that one piece of this information was pointless to
>>>>> track (and was 90%+ of the columns, and in 99% of those cases was ALL
>>>>> columns for a row).   I wrote a process to remove all of those columns
>>>>> (which again in a vast majority of cases had the effect of removing the
>>>>> whole row).
>>>>>
>>>>> This CF had ~1 billion rows, so I expect to be left with ~100M rows.
>>>>>  After I did this mass delete, everything was the same size on disk (which
>>>>> I expected, knowing how tombstoning works).  It wasn't 100% clear to me
>>>>> what to poke to cause compactions to clear the tombstones.  First I tried
>>>>> nodetool cleanup on a candidate node.  But afterwards the disk usage was
>>>>> the same.  Then I tried nodetool repair on that same node.  But again, disk
>>>>> usage was still the same.  The CF has no snapshots.
>>>>>
>>>>> So, am I misunderstanding something?  Is there another operation to
>>>>> try?  Do I have to "just wait"?  I've only done cleanup/repair on one node.
>>>>>  Do I have to run one or the other over all nodes to clear tombstones?
>>>>>
>>>>> Cassandra 1.2.15 if it matters,
>>>>>
>>>>> Thanks!
>>>>>
>>>>> will
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>
