Mailing-List: contact cassandra-user-help@incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: cassandra-user@incubator.apache.org
MIME-Version: 1.0
In-Reply-To: <a1625f290912041155g7b124c56tfa4fc559a6ecb62@mail.gmail.com>
References: <a1625f290912031518i79552f45td31f7c9a797007b9@mail.gmail.com>
	<a1625f290912031807p4a75fbe5i3fc36d41f7711d60@mail.gmail.com>
	<e06563880912031809k3149b86fj659c7d10a1f3339b@mail.gmail.com>
	<a1625f290912041051m60e382adl7e44aa71663372c2@mail.gmail.com>
	<e06563880912041101u4cc99a9u8768e1ed296c2175@mail.gmail.com>
	<a1625f290912041104u138e46fcjd9de6131cb818368@mail.gmail.com>
	<a1625f290912041155g7b124c56tfa4fc559a6ecb62@mail.gmail.com>
From: Jonathan Ellis <jbellis@gmail.com>
Date: Fri, 4 Dec 2009 14:34:47 -0600
Message-ID: <e06563880912041234g3928f849qf9fbe9c55e3b256f@mail.gmail.com>
Subject: Re: Removes increasing disk space usage in Cassandra?
To: cassandra-user@incubator.apache.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Are you testing trunk?  If not, you should check that first to see if
it's already fixed.

On Fri, Dec 4, 2009 at 1:55 PM, Ramzi Rabah <rrabah@playdom.com> wrote:
> Just to be clear, what I meant is that I ran the deletions and the
> compaction with GCGraceSeconds set to 1 hour, so there was enough time
> for the tombstones to expire.
> Anyway, I will try to make a simpler test case to reproduce this, and
> I will share the code if I can reproduce it.
>
> Ray
>
> On Fri, Dec 4, 2009 at 11:04 AM, Ramzi Rabah <rrabah@playdom.com> wrote:
>> Hi Jonathan, I have changed that to 3600 (one hour) based on your
>> earlier recommendation.
>>
>> On Fri, Dec 4, 2009 at 11:01 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>> this is what I was referring to by "the period specified in your config file":
>>>
>>>  <!--
>>>   ~ Time to wait before garbage-collection deletion markers.  Set this to
>>>   ~ a large enough value that you are confident that the deletion marker
>>>   ~ will be propagated to all replicas by the time this many seconds has
>>>   ~ elapsed, even in the face of hardware failures.  The default value is
>>>   ~ ten days.
>>>  -->
>>>  <GCGraceSeconds>864000</GCGraceSeconds>
>>>
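The grace-period rule in the config comment above boils down to a time comparison. The following is a minimal Java sketch under stated assumptions, not Cassandra's implementation: the class and method names (GcGraceCheck, isPurgeable) are hypothetical, and it assumes a tombstone becomes eligible for removal once its deletion time falls before a cutoff of "now minus GCGraceSeconds".

```java
// Hypothetical sketch of the GCGraceSeconds rule; not Cassandra's actual code.
public class GcGraceCheck {
    // Default from the config fragment: 864000 seconds = 10 days.
    static final int DEFAULT_GC_GRACE_SECONDS = 864_000;

    /**
     * Assumption: a tombstone written at deletionTimeSec (seconds since epoch)
     * may be dropped by a compaction running at nowSec once it is older than
     * the cutoff nowSec - gcGraceSeconds.
     */
    static boolean isPurgeable(long deletionTimeSec, long gcGraceSeconds, long nowSec) {
        long gcBefore = nowSec - gcGraceSeconds; // oldest deletion time that must still be kept
        return deletionTimeSec < gcBefore;
    }

    public static void main(String[] args) {
        System.out.println(DEFAULT_GC_GRACE_SECONDS / 86_400); // 10 (days)

        long deletedAt = 1_000_000L;
        // With the 3600-second (one hour) setting used in this thread:
        System.out.println(isPurgeable(deletedAt, 3600, deletedAt + 1800)); // false: only 30 min later
        System.out.println(isPurgeable(deletedAt, 3600, deletedAt + 3601)); // true: grace has elapsed
        // With the ten-day default, the same tombstone is still protected:
        System.out.println(isPurgeable(deletedAt, DEFAULT_GC_GRACE_SECONDS, deletedAt + 3601)); // false
    }
}
```

Under this model, a tombstone survives every compaction for ten days with the default setting, while with GCGraceSeconds set to 3600 any compaction that runs more than an hour after the delete may drop it.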
>>> On Fri, Dec 4, 2009 at 12:51 PM, Ramzi Rabah <rrabah@playdom.com> wrote:
>>>> I think there might be a bug in the deletion logic. I removed all the
>>>> data on the cluster by running remove on every single key I entered,
>>>> and then ran a major compaction on one node with
>>>> "nodeprobe -host hostname compact". After the compaction finished, I
>>>> was left with one data file, one index file and the bloom filter file,
>>>> and they are the same size as before I started doing the deletes.
>>>>
>>>> On Thu, Dec 3, 2009 at 6:09 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>>>> Cassandra never modifies data in place, so it writes tombstones to
>>>>> suppress the older writes, and when compaction occurs the data and
>>>>> tombstones get GC'd (after the period specified in your config file).
>>>>>
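Because nothing is modified in place, a remove adds data rather than freeing it until a compaction runs. The Java sketch below models that behaviour in memory. It is an illustrative assumption, not Cassandra's storage engine: TombstoneSketch, TinyStore, Cell and compact are hypothetical names, a single list stands in for the data files, compaction is simplified to "keep the newest cell per key, then drop tombstones whose grace period has elapsed", and it uses a Java 16+ record.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of append-only writes with tombstones.
public class TombstoneSketch {
    // One stored cell; value == null marks a tombstone.
    record Cell(String key, String value, long timestamp, long deletionTimeSec) {
        boolean isTombstone() { return value == null; }
    }

    static final class TinyStore {
        // Append-only list standing in for the data files: nothing is changed in place.
        final List<Cell> log = new ArrayList<>();

        void insert(String key, String value, long ts) {
            log.add(new Cell(key, value, ts, 0));
        }

        // A remove does not erase the old cell; it appends a tombstone that shadows it.
        void remove(String key, long ts, long nowSec) {
            log.add(new Cell(key, null, ts, nowSec));
        }

        // Simplified major compaction: keep only the newest cell per key (so shadowed
        // data is discarded), then drop tombstones older than nowSec - gcGraceSeconds.
        void compact(long nowSec, long gcGraceSeconds) {
            Map<String, Cell> newest = new HashMap<>();
            for (Cell c : log) {
                newest.merge(c.key(), c, (a, b) -> b.timestamp() >= a.timestamp() ? b : a);
            }
            long gcBefore = nowSec - gcGraceSeconds;
            log.clear();
            for (Cell c : newest.values()) {
                if (c.isTombstone() && c.deletionTimeSec() < gcBefore) {
                    continue; // grace period over: the tombstone is purged
                }
                log.add(c);
            }
        }
    }

    public static void main(String[] args) {
        TinyStore store = new TinyStore();
        for (int i = 0; i < 3; i++) store.insert("k" + i, "v" + i, 1);
        for (int i = 0; i < 3; i++) store.remove("k" + i, 2, 1_000);
        System.out.println(store.log.size()); // 6: the removes added records
        store.compact(1_500, 3600);
        System.out.println(store.log.size()); // 3: tombstones kept, still inside grace
        store.compact(1_000 + 3601, 3600);
        System.out.println(store.log.size()); // 0: tombstones expired and purged
    }
}
```

In this model the stored record count rises after the removes and only drops to zero once a compaction runs after GCGraceSeconds has elapsed, which is consistent with disk usage going up during bulk removes as described in the original question.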
>>>>> On Thu, Dec 3, 2009 at 8:07 PM, Ramzi Rabah <rrabah@playdom.com> wrote:
>>>>>> Looking at jconsole, I see a high number of writes when I do removes,
>>>>>> so I am guessing these are tombstones being written? If that's the
>>>>>> case, is the data being removed and replaced by tombstones? And will
>>>>>> they all be deleted eventually when compaction runs?
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Dec 3, 2009 at 3:18 PM, Ramzi Rabah <rrabah@playdom.com> wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I ran a test where I inserted about 1.2 gigabytes worth of data into
>>>>>>> each node of a 4-node cluster.
>>>>>>> I ran a script that first calls a get on each inserted column, followed
>>>>>>> by a remove. Since I was basically removing every entry
>>>>>>> I had inserted, I expected that the disk space occupied by the
>>>>>>> nodes would go down and eventually become 0. Instead, the disk space
>>>>>>> goes up to about 1.8 GB per node when I do the bulk removes.
>>>>>>> Am I missing something here?
>>>>>>>
>>>>>>> Thanks a lot for your help
>>>>>>> Ray
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>