Subject: Re: Removes increasing disk space usage in Cassandra?
From: Ramzi Rabah <rrabah@playdom.com>
To: cassandra-user@incubator.apache.org
Date: Fri, 4 Dec 2009 12:45:09 -0800

I have a two-week-old version of trunk. I probably need to update it to
the latest build.

On Fri, Dec 4, 2009 at 12:34 PM, Jonathan Ellis wrote:
> Are you testing trunk? If not, you should check that first to see if
> it's already fixed.
>
> On Fri, Dec 4, 2009 at 1:55 PM, Ramzi Rabah wrote:
>> Just to be clear, what I meant is that I ran the deletions and
>> compaction with GCGraceSeconds set to 1 hour, so there was enough time
>> for the tombstones to expire.
>> Anyway, I will try to make a simpler test case to hopefully reproduce
>> this, and I will share the code if I can reproduce it.
>>
>> Ray
>>
>> On Fri, Dec 4, 2009 at 11:04 AM, Ramzi Rabah wrote:
>>> Hi Jonathan, I have changed that to 3600 (one hour) based on your
>>> earlier recommendation.
>>>
>>> On Fri, Dec 4, 2009 at 11:01 AM, Jonathan Ellis wrote:
>>>> This is what I was referring to by "the period specified in your
>>>> config file":
>>>>
>>>>   <GCGraceSeconds>864000</GCGraceSeconds>
>>>>
>>>> On Fri, Dec 4, 2009 at 12:51 PM, Ramzi Rabah wrote:
>>>>> I think there might be a bug in the deletion logic. I removed all the
>>>>> data on the cluster by running remove on every single key I entered,
>>>>> and I ran a major compaction
>>>>> (nodeprobe -host hostname compact) on a certain node. After the
>>>>> compaction was over, I was left with one data file, one index file,
>>>>> and the bloom filter file, and they are the same size as before I
>>>>> started doing the deletes.
>>>>>
>>>>> On Thu, Dec 3, 2009 at 6:09 PM, Jonathan Ellis wrote:
>>>>>> Cassandra never modifies data in place. So it writes tombstones to
>>>>>> suppress the older writes, and when compaction occurs the data and
>>>>>> tombstones get GC'd (after the period specified in your config file).
>>>>>>
>>>>>> On Thu, Dec 3, 2009 at 8:07 PM, Ramzi Rabah wrote:
>>>>>>> Looking at jconsole, I see a high number of writes when I do removes,
>>>>>>> so I am guessing these are tombstones being written? If that's the
>>>>>>> case, is the data being removed and replaced by tombstones? And will
>>>>>>> they all be deleted eventually when compaction runs?
>>>>>>>
>>>>>>> On Thu, Dec 3, 2009 at 3:18 PM, Ramzi Rabah wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I ran a test where I inserted about 1.2 gigabytes worth of data
>>>>>>>> into each node of a 4-node cluster.
>>>>>>>> I ran a script that first calls a get on each column inserted,
>>>>>>>> followed by a remove. Since I was basically removing every entry
>>>>>>>> I had inserted before, I expected that the disk space occupied by
>>>>>>>> the nodes would go down and eventually become 0. The disk space
>>>>>>>> actually goes up when I do the bulk removes, to about 1.8 gigs per
>>>>>>>> node. Am I missing something here?
>>>>>>>>
>>>>>>>> Thanks a lot for your help,
>>>>>>>> Ray
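
For reference, the two knobs discussed in the thread, sketched from a
0.5-era storage-conf.xml and the nodeprobe tool shipped with that build
(element placement, the data directory path, and tool options are
assumptions and may differ on trunk):

    <!-- storage-conf.xml: how long deletion markers (tombstones) are kept
         before a compaction is allowed to collect them, in seconds; the
         stock file ships with ten days -->
    <GCGraceSeconds>3600</GCGraceSeconds>

    # force a major compaction on one node, then check the data directory
    bin/nodeprobe -host hostname compact
    du -sh /var/lib/cassandra/data

With the default of 864000 seconds, removed data and its tombstones are
not purged by compaction until ten days after the delete, so lowering
GCGraceSeconds (as Ray did) is the right first step when testing space
reclamation.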
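Below is a minimal sketch of the kind of get-then-remove loop Ray
describes, written against the 0.5-era Thrift interface (where get and
remove each take the keyspace name). The package name, the
keyspace/column-family/column names, and the loadTestKeys helper are all
hypothetical placeholders; on a newer trunk the generated classes and
signatures may have moved or changed.

    // Sketch only: assumes Thrift-generated classes from a 0.5-era build;
    // on later builds they may live under org.apache.cassandra.thrift.
    import org.apache.cassandra.service.Cassandra;
    import org.apache.cassandra.service.ColumnPath;
    import org.apache.cassandra.service.ConsistencyLevel;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class RemoveEverything {
        public static void main(String[] args) throws Exception {
            TTransport transport = new TSocket("localhost", 9160);
            Cassandra.Client client =
                new Cassandra.Client(new TBinaryProtocol(transport));
            transport.open();

            // Placeholder keyspace, column family, and column names.
            ColumnPath path = new ColumnPath();
            path.column_family = "Standard1";
            path.column = "col".getBytes("UTF-8");

            for (String key : loadTestKeys()) {   // hypothetical helper
                client.get("Keyspace1", key, path, ConsistencyLevel.QUORUM);
                // remove() only writes a tombstone with this timestamp; the
                // original columns stay in the SSTables until a compaction
                // runs after GCGraceSeconds has elapsed, which is why disk
                // usage grows right after a bulk delete.
                client.remove("Keyspace1", key, path,
                              System.currentTimeMillis(),
                              ConsistencyLevel.QUORUM);
            }
            transport.close();
        }

        private static Iterable<String> loadTestKeys() {
            // Stand-in for however the test enumerates the keys it inserted.
            return java.util.Arrays.asList("key-000001", "key-000002");
        }
    }

If a major compaction run after GCGraceSeconds has passed still leaves
the data files at their original size, that would point at the kind of
bug Ray suspects rather than at normal tombstone behavior.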