Subject: Re: Compaction doubles disk space
From: Sheng Chen <chensheng2010@gmail.com>
To: user@cassandra.apache.org
Date: Thu, 31 Mar 2011 14:39:20 +0800

It really helps. Thank you very much.

Sheng

2011/3/30 aaron morton <aaron@thelastpickle.com>

> When a compaction needs to write a file, Cassandra will try to find a
> place to put the new file, based on an estimate of its size. If it cannot
> find enough space, it will trigger a GC, which will delete any previously
> compacted (and so unneeded) SSTables. The same thing will happen when a
> new SSTable needs to be written to disk.
>
> Minor compaction groups the SSTables on disk into buckets of similar
> sizes (http://wiki.apache.org/cassandra/MemtableSSTable); each bucket is
> processed in its own compaction task. Under 0.7, compaction is single
> threaded, and when each compaction task starts it will try to find space
> on disk and, if necessary, trigger a GC to free space.
>
> SSTables are immutable on disk; compaction cannot delete data from them,
> as they are also used to serve read requests at the same time. To do so
> would require locking around (regions of) the file.
>
> Also, as far as I understand, we cannot immediately delete files, because
> other operations (including repair) may be using them. The data in the
> pre-compacted files is just as correct as the data in the compacted file;
> the compacted one is just more compact. So the easiest thing to do is let
> the JVM sort out whether anything else is using them.
>
> Perhaps this could be improved by actively tracking which files are in
> use, so they could be deleted sooner. But right now, so long as unused
> space is freed when needed, it's working as designed, AFAIK.
>
> That's my understanding; hope it helps explain why it works that way.
> Aaron
>
> On 30 Mar 2011, at 13:32, Sheng Chen wrote:
>
> Yes.
> I think at least we can remove the tombstones for each sstable first, and
> then do the merge.
>
> 2011/3/29 Karl Hiramoto <karl@hiramoto.org>
>
>> Would it be possible to improve the current compaction disk space issue
>> by compacting only a few SSTables at a time, then immediately deleting
>> the old ones? Looking at the logs, it seems like deletions of old
>> SSTables are taking longer than necessary.
>>
>> --
>> Karl
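The "let the JVM sort it out" behaviour Aaron describes is essentially deferred deletion driven by reference reachability. Below is a minimal, self-contained Java sketch of one way to build that with phantom references; all names here (DeferredDeleter, scheduleDeletion, reapUnreferenced) are illustrative assumptions, not Cassandra's actual classes, and System.gc() is only ever a hint to the JVM.

import java.io.File;
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of GC-driven file deletion: an obsolete SSTable's file is only
// removed once no reader can still reach its in-memory handle.
public class DeferredDeleter {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Hold the phantom references so they stay registered; map each one to
    // the file that may be deleted once the handle becomes unreachable.
    private final Map<Reference<Object>, File> pending = new ConcurrentHashMap<>();

    // Called when compaction obsoletes an SSTable: its file must outlive
    // any reads or repairs still holding the in-memory handle.
    public void scheduleDeletion(Object sstableHandle, File dataFile) {
        pending.put(new PhantomReference<>(sstableHandle, queue), dataFile);
    }

    // Delete the files of handles the collector has proven unreachable.
    public void reapUnreferenced() {
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            File f = pending.remove(ref);
            if (f != null && f.delete()) {
                System.out.println("Reclaimed " + f);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        DeferredDeleter deleter = new DeferredDeleter();
        File data = File.createTempFile("sstable-", "-Data.db");

        Object handle = new Object();  // stands in for an open SSTable reader
        deleter.scheduleDeletion(handle, data);

        deleter.reapUnreferenced();    // no-op: the handle is still reachable
        handle = null;                 // the last reader is done with the table
        System.gc();                   // the "trigger a GC" step from the mail
        Thread.sleep(100);             // give the collector time to enqueue
        deleter.reapUnreferenced();    // now the file can actually be removed
    }
}

Running System.gc() followed by reapUnreferenced() when a compaction cannot find room for its estimated output corresponds to the "trigger a GC to free space" step in the first paragraph above: the collector proves no read or repair can still reach the old handle, and only then is its file removed.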

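For the "buckets of similar sizes" grouping that minor compaction performs, here is a minimal Java sketch. The single pass over sorted sizes and the 1.5x similarity window are assumptions for illustration; Cassandra's real thresholds and class names differ.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of grouping SSTable sizes into buckets of similar size, so that
// each bucket can be compacted in its own task.
public class SizeBuckets {

    public static List<List<Long>> bucket(List<Long> sstableSizes) {
        List<Long> sorted = new ArrayList<>(sstableSizes);
        Collections.sort(sorted);  // sorting lets a single pass suffice

        List<List<Long>> buckets = new ArrayList<>();
        for (long size : sorted) {
            List<Long> last = buckets.isEmpty() ? null : buckets.get(buckets.size() - 1);
            // Start a new bucket unless this file is "similar" in size to
            // the current bucket's average (illustrative 1.5x window).
            if (last == null || size > averageOf(last) * 1.5) {
                List<Long> fresh = new ArrayList<>();
                fresh.add(size);
                buckets.add(fresh);
            } else {
                last.add(size);
            }
        }
        return buckets;
    }

    private static double averageOf(List<Long> bucket) {
        long total = 0;
        for (long s : bucket) total += s;
        return (double) total / bucket.size();
    }

    public static void main(String[] args) {
        // Four small files and two large ones land in separate buckets.
        List<List<Long>> buckets = bucket(List.of(10L, 12L, 11L, 9L, 500L, 520L));
        System.out.println(buckets);  // [[9, 10, 11, 12], [500, 520]]
    }
}

Each resulting bucket would then be compacted in its own task, which under 0.7's single-threaded compaction means one bucket at a time, with the disk-space check (and, if necessary, the GC) repeated at the start of each task.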