incubator-cassandra-user mailing list archives

From: Ramzi Rabah <rra...@playdom.com>
Subject: Re: Removes increasing disk space usage in Cassandra?
Date: Fri, 04 Dec 2009 23:32:04 GMT
Starting with fresh directories with no data and trying to do simple
inserts, I could not reproduce it *sigh*. Nothing is simple :(, so I
decided to dig deeper into the code.

I was looking at the code for compaction. This is a very noob concern, so
please bear with me if I'm way off; this code is all new to me. When we are
doing compactions during the normal course of Cassandra operation, we call:

            for (List<SSTableReader> sstables : getCompactionBuckets(ssTables_, 50L * 1024L * 1024L))
            {
                if (sstables.size() < minThreshold)
                {
                    continue;
                }
                // otherwise, do the compaction...

where getCompactionBuckets groups the SSTables into buckets of very small
files, or of files that are within 0.5x-1.5x of each other's sizes. A bucket
is only compacted if it contains at least minThreshold files, which is 4 by
default.
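
To check my reading of that rule, here is a toy sketch of the bucketing as I
understand it, written against plain file sizes instead of SSTableReaders. The
class and method names, and the running-average detail, are my own guesses for
illustration, not the actual Cassandra code:

    import java.util.ArrayList;
    import java.util.List;

    public class BucketSketch
    {
        // Toy version of the bucketing rule as I read it (my approximation, not
        // the real code): files under the small-file threshold all share one
        // bucket, and any other file joins a bucket whose running average size
        // is within 0.5x-1.5x of its own size.
        static List<List<Long>> bucketBySize(List<Long> sizes, long smallFileThreshold)
        {
            List<List<Long>> buckets = new ArrayList<List<Long>>();
            List<Long> averages = new ArrayList<Long>();
            for (long size : sizes)
            {
                boolean placed = false;
                for (int i = 0; i < buckets.size(); i++)
                {
                    long avg = averages.get(i);
                    if ((size < smallFileThreshold && avg < smallFileThreshold)
                        || (size > avg / 2 && size < avg * 3 / 2))
                    {
                        buckets.get(i).add(size);
                        averages.set(i, (avg + size) / 2);
                        placed = true;
                        break;
                    }
                }
                if (!placed)
                {
                    List<Long> bucket = new ArrayList<Long>();
                    bucket.add(size);
                    buckets.add(bucket);
                    averages.add(size);
                }
            }
            return buckets;
        }
    }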
So far so good. Now how about this scenario: I have an old entry that I
inserted a long time ago and that was compacted into a 75MB file. There are
fewer than 4 files of around that size, so that bucket never reaches the
threshold. I then do many deletes and end up with 4 extra sstable files filled
with tombstones, each about 300MB. Those 4 files get compacted together, and
in the compaction code, if a tombstone is there we don't copy it over to the
new file. Since we did not compact the 75MB file, but we did compact the
tombstone files, doesn't that leave us with the tombstone gone but the data
still intact in the 75MB file? Or did I miss the part of the code where the
original data is removed? If we compacted all the files together I don't think
that would be a problem, but since we only compact buckets of 4 or more,
wouldn't that potentially leave data uncleaned?
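
To put concrete numbers on that scenario, here is a hypothetical driver for
the BucketSketch toy above: one 75MB data file plus four 300MB
tombstone-heavy sstables, with the default minThreshold of 4. Again, this is
only an illustration of my approximation, not the real code path:

        // Hypothetical driver for the sketch above (add it to BucketSketch):
        // one 75MB data file plus four 300MB tombstone-heavy files.
        public static void main(String[] args)
        {
            long MB = 1024L * 1024L;
            List<Long> sizes = new ArrayList<Long>();
            sizes.add(75 * MB);              // old file still holding the original data
            for (int i = 0; i < 4; i++)
                sizes.add(300 * MB);         // sstables full of tombstones from the deletes

            int minThreshold = 4;
            for (List<Long> bucket : bucketBySize(sizes, 50L * 1024L * 1024L))
            {
                // the four 300MB files form the only bucket that reaches minThreshold;
                // the lone 75MB file is skipped, so its old data is never revisited here
                System.out.println(bucket.size() + " file(s) of ~" + (bucket.get(0) / MB) + "MB -> "
                                   + (bucket.size() >= minThreshold ? "compacted" : "skipped"));
            }
        }

If my reading is right, that prints the 75MB file as "skipped", which is
exactly the case I'm worried about: the tombstones can eventually be GC'd away
while the 75MB file keeps the old data until some later compaction happens to
include it.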

Again sorry if I am way off.

Thanks
Ray




On Fri, Dec 4, 2009 at 12:52 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
> Okay, in that case it doesn't hurt to update just in case but I think
> you're going to need that test case. :)
>
> On Fri, Dec 4, 2009 at 2:45 PM, Ramzi Rabah <rrabah@playdom.com> wrote:
>> I have a two-week-old version of trunk. Probably need to update it to the
>> latest build.
>>
>> On Fri, Dec 4, 2009 at 12:34 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>> Are you testing trunk?  If not, you should check that first to see if
>>> it's already fixed.
>>>
>>> On Fri, Dec 4, 2009 at 1:55 PM, Ramzi Rabah <rrabah@playdom.com> wrote:
>>>> Just to be clear, what I meant is that I ran the deletions and
>>>> compaction with GCGraceSeconds set to 1 hour, so there was enough time
>>>> for the tombstones to expire.
>>>> Anyway, I will try to make a simpler test case to hopefully reproduce
>>>> this, and I will share the code if I can reproduce it.
>>>>
>>>> Ray
>>>>
>>>> On Fri, Dec 4, 2009 at 11:04 AM, Ramzi Rabah <rrabah@playdom.com> wrote:
>>>>> Hi Jonathan, I have changed that to 3600 (one hour) based on your
>>>>> earlier recommendation.
>>>>>
>>>>> On Fri, Dec 4, 2009 at 11:01 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>>>>> this is what I was referring to by "the period specified in your config file":
>>>>>>
>>>>>>  <!--
>>>>>>   ~ Time to wait before garbage-collecting deletion markers.  Set this to
>>>>>>   ~ a large enough value that you are confident that the deletion marker
>>>>>>   ~ will be propagated to all replicas by the time this many seconds has
>>>>>>   ~ elapsed, even in the face of hardware failures.  The default value is
>>>>>>   ~ ten days.
>>>>>>  -->
>>>>>>  <GCGraceSeconds>864000</GCGraceSeconds>
>>>>>>
>>>>>> On Fri, Dec 4, 2009 at 12:51 PM, Ramzi Rabah <rrabah@playdom.com> wrote:
>>>>>>> I think there might be a bug in the deletion logic. I removed all the
>>>>>>> data on the cluster by running remove on every single key I entered,
>>>>>>> and I ran a major compaction (nodeprobe -host hostname compact) on a
>>>>>>> certain node. After the compaction is over, I am left with one data
>>>>>>> file, one index file, and the bloom filter file, and they are the same
>>>>>>> size as before I started doing the deletes.
>>>>>>>
>>>>>>>> On Thu, Dec 3, 2009 at 6:09 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>>>>>>> cassandra never modifies data in-place.  so it writes tombstones to
>>>>>>>> suppress the older writes, and when compaction occurs the data and
>>>>>>>> tombstones get GC'd (after the period specified in your config file).
>>>>>>>>
>>>>>>>>> On Thu, Dec 3, 2009 at 8:07 PM, Ramzi Rabah <rrabah@playdom.com> wrote:
>>>>>>>>> Looking at jconsole I see a high number of writes when I do removes,
>>>>>>>>> so I am guessing these are tombstones being written? If that's the
>>>>>>>>> case, is the data being removed and replaced by tombstones? And will
>>>>>>>>> they all be deleted eventually when compaction runs?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Dec 3, 2009 at 3:18 PM, Ramzi Rabah <rrabah@playdom.com> wrote:
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> I ran a test where I inserted about 1.2 gigabytes worth of data into
>>>>>>>>>> each node of a 4-node cluster.
>>>>>>>>>> I ran a script that first calls a get on each column inserted,
>>>>>>>>>> followed by a remove. Since I was basically removing every entry I
>>>>>>>>>> inserted before, I expected the disk space occupied by the nodes to
>>>>>>>>>> go down and eventually become 0. The disk space actually went up to
>>>>>>>>>> about 1.8 gigs per node when I did the bulk removes. Am I missing
>>>>>>>>>> something here?
>>>>>>>>>>
>>>>>>>>>> Thanks a lot for your help
>>>>>>>>>> Ray
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
