Subject: Re: Compaction doubles disk space
From: Sheng Chen <chensheng2010@gmail.com>
To: user@cassandra.apache.org
Date: Thu, 31 Mar 2011 14:39:20 +0800

It really helps. Thank you very much.

Sheng

2011/3/30 aaron morton <aaron@thelastpickle.com>

> When a compaction needs to write a file, Cassandra will try to find a
> place to put the new file, based on an estimate of its size. If it cannot
> find enough space, it will trigger a GC, which will delete any previously
> compacted (and so unneeded) SSTables. The same thing will happen when a
> new SSTable needs to be written to disk.
>
> Minor compaction groups the SSTables on disk into buckets of similar
> sizes (http://wiki.apache.org/cassandra/MemtableSSTable); each bucket is
> processed in its own compaction task. Under 0.7, compaction is single
> threaded, and when each compaction task starts it will try to find space
> on disk and, if necessary, trigger a GC to free space.
>
> SSTables are immutable on disk; compaction cannot delete data from them,
> as they are also used to serve read requests at the same time. To do so
> would require locking around (regions of) the file.
>
> Also, as far as I understand, we cannot immediately delete files, because
> other operations (including repair) may be using them. The data in the
> pre-compacted files is just as correct as the data in the compacted file;
> the compacted one is just more compact. So the easiest thing to do is let
> the JVM sort out whether anything else is using them.
>
> Perhaps this could be improved by actively tracking which files are in
> use, so they could be deleted sooner. But right now, so long as unused
> space is freed when needed, it's working as designed, AFAIK.
>
> That's my understanding; hope it helps explain why it works that way.
> Aaron
>
> On 30 Mar 2011, at 13:32, Sheng Chen wrote:
>
> Yes.
> I think at least we can remove the tombstones for each sstable first, and
> then do the merge.
>
> 2011/3/29 Karl Hiramoto <karl@hiramoto.org>
>
>> Would it be possible to improve the current compaction disk space issue
>> by compacting only a few SSTables at a time, then immediately deleting
>> the old ones? Looking at the logs, it seems like deletions of old
>> SSTables are taking longer than necessary.
>>
>> --
>> Karl
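The "let the JVM sort it out" behaviour Aaron describes is essentially deferred deletion driven by reference reachability. Below is a minimal, self-contained Java sketch of one way to build that with phantom references; all names here (DeferredDeleter, scheduleDeletion, reapUnreferenced) are illustrative assumptions, not Cassandra's actual classes, and System.gc() is only ever a hint to the JVM.

import java.io.File;
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of GC-driven file deletion: an obsolete SSTable's file is only
// removed once no reader can still reach its in-memory handle.
public class DeferredDeleter {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Hold the phantom references so they stay registered; map each one to
    // the file that may be deleted once the handle becomes unreachable.
    private final Map<Reference<Object>, File> pending = new ConcurrentHashMap<>();

    // Called when compaction obsoletes an SSTable: its file must outlive
    // any reads or repairs still holding the in-memory handle.
    public void scheduleDeletion(Object sstableHandle, File dataFile) {
        pending.put(new PhantomReference<>(sstableHandle, queue), dataFile);
    }

    // Delete the files of handles the collector has proven unreachable.
    public void reapUnreferenced() {
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            File f = pending.remove(ref);
            if (f != null && f.delete()) {
                System.out.println("Reclaimed " + f);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        DeferredDeleter deleter = new DeferredDeleter();
        File data = File.createTempFile("sstable-", "-Data.db");

        Object handle = new Object();  // stands in for an open SSTable reader
        deleter.scheduleDeletion(handle, data);

        deleter.reapUnreferenced();    // no-op: the handle is still reachable
        handle = null;                 // the last reader is done with the table
        System.gc();                   // the "trigger a GC" step from the mail
        Thread.sleep(100);             // give the collector time to enqueue
        deleter.reapUnreferenced();    // now the file can actually be removed
    }
}

Running System.gc() followed by reapUnreferenced() when a compaction cannot find room for its estimated output corresponds to the "trigger a GC to free space" step in the first paragraph above: the collector proves no read or repair can still reach the old handle, and only then is its file removed.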

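For the "buckets of similar sizes" grouping that minor compaction performs, here is a minimal Java sketch. The single pass over sorted sizes and the 1.5x similarity window are assumptions for illustration; Cassandra's real thresholds and class names differ.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of grouping SSTable sizes into buckets of similar size, so that
// each bucket can be compacted in its own task.
public class SizeBuckets {

    public static List<List<Long>> bucket(List<Long> sstableSizes) {
        List<Long> sorted = new ArrayList<>(sstableSizes);
        Collections.sort(sorted);  // sorting lets a single pass suffice

        List<List<Long>> buckets = new ArrayList<>();
        for (long size : sorted) {
            List<Long> last = buckets.isEmpty() ? null : buckets.get(buckets.size() - 1);
            // Start a new bucket unless this file is "similar" in size to
            // the current bucket's average (illustrative 1.5x window).
            if (last == null || size > averageOf(last) * 1.5) {
                List<Long> fresh = new ArrayList<>();
                fresh.add(size);
                buckets.add(fresh);
            } else {
                last.add(size);
            }
        }
        return buckets;
    }

    private static double averageOf(List<Long> bucket) {
        long total = 0;
        for (long s : bucket) total += s;
        return (double) total / bucket.size();
    }

    public static void main(String[] args) {
        // Four small files and two large ones land in separate buckets.
        List<List<Long>> buckets = bucket(List.of(10L, 12L, 11L, 9L, 500L, 520L));
        System.out.println(buckets);  // [[9, 10, 11, 12], [500, 520]]
    }
}

Each resulting bucket would then be compacted in its own task, which under 0.7's single-threaded compaction means one bucket at a time, with the disk-space check (and, if necessary, the GC) repeated at the start of each task.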