incubator-cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Removes increasing disk space usage in Cassandra?
Date Fri, 04 Dec 2009 20:52:57 GMT
Okay, in that case it doesn't hurt to update just in case, but I think
you're going to need that test case. :)

On Fri, Dec 4, 2009 at 2:45 PM, Ramzi Rabah <rrabah@playdom.com> wrote:
> I have a two-week-old version of trunk. I probably need to update it to
> the latest build.
>
> On Fri, Dec 4, 2009 at 12:34 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>> Are you testing trunk?  If not, you should check that first to see if
>> it's already fixed.
>>
>> On Fri, Dec 4, 2009 at 1:55 PM, Ramzi Rabah <rrabah@playdom.com> wrote:
>>> Just to be clear, what I meant is that I ran the deletions and
>>> compaction with GCGraceSeconds set to 1 hour, so there was enough time
>>> for the tombstones to expire.
>>> Anyway, I will try to make a simpler test case that hopefully reproduces
>>> this, and I will share the code if I can reproduce it.
>>>
>>> Ray
>>>
>>> On Fri, Dec 4, 2009 at 11:04 AM, Ramzi Rabah <rrabah@playdom.com> wrote:
>>>> Hi Jonathan, I changed that to 3600 (one hour) based on your
>>>> earlier recommendation.
>>>>
>>>> On Fri, Dec 4, 2009 at 11:01 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>>>> this is what I was referring to by "the period specified in your config file":
>>>>>
>>>>>  <!--
>>>>>   ~ Time to wait before garbage-collecting deletion markers.  Set this to
>>>>>   ~ a large enough value that you are confident that the deletion marker
>>>>>   ~ will be propagated to all replicas by the time this many seconds has
>>>>>   ~ elapsed, even in the face of hardware failures.  The default value is
>>>>>   ~ ten days.
>>>>>  -->
>>>>>  <GCGraceSeconds>864000</GCGraceSeconds>
>>>>>
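For reference, the one-hour setting Ray mentions above corresponds to a single element change in each node's storage configuration (storage-conf.xml in builds of that era; the exact file name is an assumption here):

  <GCGraceSeconds>3600</GCGraceSeconds>

With that value, tombstones only become eligible for garbage collection once an hour has passed since the delete and a compaction has run.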
>>>>>> On Fri, Dec 4, 2009 at 12:51 PM, Ramzi Rabah <rrabah@playdom.com> wrote:
>>>>>> I think there might be a bug in the deletion logic. I removed all the
>>>>>> data on the cluster by running remove on every single key I entered,
>>>>>> and I ran a major compaction (nodeprobe -host hostname compact) on a
>>>>>> certain node. After the compaction is over, I am left with one data
>>>>>> file, one index file, and the bloom filter file, and they are the same
>>>>>> size as before I started doing the deletes.
>>>>>>
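One way to see whether a major compaction like the one Ray describes actually reclaimed anything is to compare the on-disk size of the data directory before and after running it. The commands below are a sketch; /var/lib/cassandra/data was the default data location in that era, but the path and keyspace name are assumptions:

  nodeprobe -host <hostname> compact
  du -sh /var/lib/cassandra/data/<Keyspace>

If the tombstones are older than GCGraceSeconds when the compaction runs, the resulting SSTable should be noticeably smaller; if the size is unchanged, the tombstones and the writes they shadow were carried over.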
>>>>>> On Thu, Dec 3, 2009 at 6:09 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>>>>>> Cassandra never modifies data in place, so it writes tombstones to
>>>>>>> suppress the older writes, and when compaction occurs the data and
>>>>>>> tombstones get GC'd (after the period specified in your config file).
>>>>>>>
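The rule Jonathan describes can be sketched roughly as follows. This is an illustrative simplification in Python, not Cassandra's actual compaction code; the names and the column representation are made up:

  import time

  GC_GRACE_SECONDS = 3600   # the value Ray set; the shipped default is 864000 (ten days)

  def survives_compaction(versions, now=None):
      """Pick what a major compaction keeps for one column.

      `versions` is newest-first; each entry is a dict such as
      {"value": ..., "tombstone": False, "written_at": <unix seconds>}.
      Illustrative simplification only, not Cassandra's actual code.
      """
      now = time.time() if now is None else now
      newest = versions[0]            # anything older is superseded and rewritten away
      if newest["tombstone"] and now - newest["written_at"] >= GC_GRACE_SECONDS:
          return []                   # grace period over: tombstone and shadowed writes go
      return [newest]                 # otherwise keep the newest version, tombstones included

  # A column deleted two hours ago, under the one-hour grace period, is dropped:
  print(survives_compaction([{"value": None, "tombstone": True,
                              "written_at": time.time() - 7200}]))   # -> []

Until that condition holds, the tombstone (and, per Jonathan's description, the data it suppresses) survives compaction, which is consistent with a remove temporarily increasing disk usage rather than decreasing it.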
>>>>>>>> On Thu, Dec 3, 2009 at 8:07 PM, Ramzi Rabah <rrabah@playdom.com> wrote:
>>>>>>>> Looking at jconsole, I see a high number of writes when I do removes,
>>>>>>>> so I am guessing these are tombstones being written? If that's the
>>>>>>>> case, is the data being removed and replaced by tombstones, and will
>>>>>>>> they all be deleted eventually when compaction runs?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Dec 3, 2009 at 3:18 PM, Ramzi Rabah <rrabah@playdom.com> wrote:
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I ran a test where I inserted about 1.2 gigabytes of data into
>>>>>>>>> each node of a 4-node cluster.
>>>>>>>>> I ran a script that first calls a get on each column inserted,
>>>>>>>>> followed by a remove. Since I was basically removing every entry
>>>>>>>>> I inserted before, I expected that the disk space occupied by the
>>>>>>>>> nodes would go down and eventually become 0. The disk space
>>>>>>>>> actually goes up to about 1.8 gigs per node when I do the bulk
>>>>>>>>> removes. Am I missing something here?
>>>>>>>>>
>>>>>>>>> Thanks a lot for your help
>>>>>>>>> Ray
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
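A reproduction along the lines Ray describes (read each column back, then remove it) might look roughly like the sketch below. It targets the 0.5-era Thrift interface; the keyspace, column family, key list, and module layout of the generated bindings are all assumptions, and exact method signatures can differ between trunk snapshots.

  import time
  from thrift.transport import TSocket, TTransport
  from thrift.protocol import TBinaryProtocol
  # generated Thrift bindings from interface/cassandra.thrift (layout assumed)
  from cassandra import Cassandra
  from cassandra.ttypes import ColumnPath, ConsistencyLevel

  keys = ['row%08d' % i for i in range(100000)]        # whatever the load script inserted
  path = ColumnPath(column_family='Standard1', column='payload')

  socket = TSocket.TSocket('localhost', 9160)
  transport = TTransport.TBufferedTransport(socket)
  client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))
  transport.open()

  for key in keys:
      # read the column back, then delete it with a newer timestamp
      client.get('Keyspace1', key, path, ConsistencyLevel.ONE)
      client.remove('Keyspace1', key, path,
                    int(time.time() * 1e6),            # must be newer than the insert's timestamp
                    ConsistencyLevel.ONE)

  transport.close()

After such a loop, per-node disk usage would be expected to grow at first (tombstones are additional writes) and shrink only once GCGraceSeconds has elapsed and a major compaction has rewritten the SSTables.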
