From: Ryan King
Date: Wed, 15 Jun 2011 20:26:41 -0700
Subject: Re: Forcing Cassandra to free up some space
To: user@cassandra.apache.org

There's a ticket open for this:
https://issues.apache.org/jira/browse/CASSANDRA-2521. Vote on it if you
think it's important.

-ryan

On Wed, Jun 15, 2011 at 7:34 PM, Jeffrey Kesselman wrote:
> The GC cleanup approach, if it depends on specific objects being GC'd,
> is fundamentally flawed.
>
> I brought this up earlier, won't restart that thread.  It should be in
> the archives.
>
>
> On Wed, Jun 15, 2011 at 10:17 PM, Terje Marthinussen
> wrote:
>> Watching this on a node here right now and it sort of shows how bad
>> this can get.
>> This node still has 109GB free disk by the way...
>> INFO [CompactionExecutor:5] 2011-06-16 09:11:59,164 StorageService.java (line 2071) requesting GC to free disk space
>> INFO [CompactionExecutor:5] 2011-06-16 09:12:23,929 StorageService.java (line 2071) requesting GC to free disk space
>> INFO [CompactionExecutor:5] 2011-06-16 09:12:46,489 StorageService.java (line 2071) requesting GC to free disk space
>> INFO [CompactionExecutor:3] 2011-06-16 09:17:53,299 StorageService.java (line 2071) requesting GC to free disk space
>> INFO [CompactionExecutor:3] 2011-06-16 09:18:17,782 StorageService.java (line 2071) requesting GC to free disk space
>> INFO [CompactionExecutor:3] 2011-06-16 09:18:42,078 StorageService.java (line 2071) requesting GC to free disk space
>> INFO [CompactionExecutor:3] 2011-06-16 09:19:06,984 StorageService.java (line 2071) requesting GC to free disk space
>> INFO [CompactionExecutor:3] 2011-06-16 09:19:32,079 StorageService.java (line 2071) requesting GC to free disk space
>> INFO [CompactionExecutor:3] 2011-06-16 09:19:57,265 StorageService.java (line 2071) requesting GC to free disk space
>> INFO [CompactionExecutor:3] 2011-06-16 09:20:22,706 StorageService.java (line 2071) requesting GC to free disk space
>> INFO [CompactionExecutor:3] 2011-06-16 09:20:47,331 StorageService.java (line 2071) requesting GC to free disk space
>> INFO [CompactionExecutor:3] 2011-06-16 09:21:13,062 StorageService.java (line 2071) requesting GC to free disk space
>> INFO [CompactionExecutor:3] 2011-06-16 09:21:38,288 StorageService.java (line 2071) requesting GC to free disk space
>> INFO [CompactionExecutor:3] 2011-06-16 09:22:03,500 StorageService.java (line 2071) requesting GC to free disk space
>> INFO [CompactionExecutor:3] 2011-06-16 09:22:29,407 StorageService.java (line 2071) requesting GC to free disk space
>> INFO [CompactionExecutor:3] 2011-06-16 09:22:55,577 StorageService.java (line 2071) requesting GC to free disk space
>> INFO [CompactionExecutor:3] 2011-06-16 09:23:20,951 StorageService.java (line 2071) requesting GC to free disk space
>> INFO [CompactionExecutor:3] 2011-06-16 09:23:46,448 StorageService.java (line 2071) requesting GC to free disk space
>> INFO [CompactionExecutor:3] 2011-06-16 09:24:12,030 StorageService.java (line 2071) requesting GC to free disk space
>> INFO [ScheduledTasks:1] 2011-06-16 09:29:29,494 GCInspector.java (line 128) GC for ParNew: 392 ms, 398997776 reclaimed leaving 2334786808 used; max is 10844635136
>> INFO [ScheduledTasks:1] 2011-06-16 09:29:32,831 GCInspector.java (line 128) GC for ParNew: 737 ms, 332336832 reclaimed leaving 2473311448 used; max is 10844635136
>> INFO [CompactionExecutor:6] 2011-06-16 09:48:00,633 StorageService.java (line 2071) requesting GC to free disk space
>> INFO [CompactionExecutor:6] 2011-06-16 09:48:26,119 StorageService.java (line 2071) requesting GC to free disk space
>> INFO [CompactionExecutor:6] 2011-06-16 09:48:49,002 StorageService.java (line 2071) requesting GC to free disk space
>> INFO [CompactionExecutor:6] 2011-06-16 10:10:20,196 StorageService.java (line 2071) requesting GC to free disk space
>> INFO [CompactionExecutor:6] 2011-06-16 10:10:45,322 StorageService.java (line 2071) requesting GC to free disk space
>> INFO [CompactionExecutor:6] 2011-06-16 10:11:07,619 StorageService.java (line 2071) requesting GC to free disk space
>> INFO [CompactionExecutor:7] 2011-06-16 11:01:45,562 StorageService.java (line 2071) requesting GC to free disk space
>> INFO [CompactionExecutor:7] 2011-06-16 11:02:10,236 StorageService.java (line 2071) requesting GC to free disk space
>> INFO [CompactionExecutor:7] 2011-06-16 11:05:31,297 StorageService.java (line 2071) requesting GC to free disk space
>> If I look at the data dir, I see 46 *Compacted files which make up an
>> additional 137GB of space.
>> The oldest of these Compacted files dates back to Jun 16th 01:26.
>> If these got deleted, there should actually be enough disk for the node
>> to run a full compaction run if needed.
>> Either the GC cleanup tactic is seriously flawed or we have a potential
>> bug keeping references far longer than needed?
>> Terje
>>
>>
>> On Wed, Jun 15, 2011 at 11:50 PM, Shotaro Kamio wrote:
>>>
>>> We've encountered a situation where compacted sstable files aren't
>>> deleted after node repair. Even when GC is triggered via JMX, it
>>> sometimes leaves compacted files. In one case, a lot of files were left.
>>> Some files have already stayed for more than 10 hours. There is no
>>> guarantee that GC will clean up all compacted sstable files.
>>>
>>> We have a great interest in the following ticket.
>>> https://issues.apache.org/jira/browse/CASSANDRA-2521
>>>
>>>
>>> Regards,
>>> Shotaro
>>>
>>>
>>> On Fri, May 27, 2011 at 11:27 AM, Jeffrey Kesselman
>>> wrote:
>>> > I'm also not sure that will guarantee all space is cleaned up.  It
>>> > really depends on what you are doing inside Cassandra.  If you have
>>> > your own garbage collection that is just in some way tied to the GC
>>> > run, then it will run when it runs.
>>> >
>>> > If otoh you are associating records in your storage with specific
>>> > objects in memory and using one of the post-mortem hooks (finalize or
>>> > PhantomReference) to tell you to clean up that particular record then
>>> > it's quite possible they won't all get cleaned up.  In general HotSpot
>>> > does not find and clean every candidate object on every GC run.  It
>>> > starts with the easiest/fastest to find and then sees what more it
>>> > thinks it needs to do to create enough memory for anticipated near
>>> > future needs.
>>> >
>>> > On Thu, May 26, 2011 at 10:16 PM, Jonathan Ellis
>>> > wrote:
>>> >> In summary, System.gc works fine unless you've deliberately done
>>> >> something like setting the -XX:+DisableExplicitGC flag.
>>> >>
>>> >> On Thu, May 26, 2011 at 5:58 PM, Konstantin Naryshkin
>>> >> wrote:
>>> >>> So, in summary, there is no way to predictably and efficiently tell
>>> >>> Cassandra to get rid of all of the extra space it is using on disk?
>>> >>>
>>> >>> ----- Original Message -----
>>> >>> From: "Jeffrey Kesselman"
>>> >>> To: user@cassandra.apache.org
>>> >>> Sent: Thursday, May 26, 2011 8:57:49 PM
>>> >>> Subject: Re: Forcing Cassandra to free up some space
>>> >>>
>>> >>> Which JVM?  Which collector?  There have been and continue to be many.
>>> >>>
>>> >>> HotSpot itself supports a number of different collectors with
>>> >>> different behaviors.  Many of them do not collect every candidate on
>>> >>> every GC, but merely the easiest ones to find.  This is why depending
>>> >>> on finalizers is a *bad* idea in Java code.  They may well never get
>>> >>> run.  (Finalizer is one of a few features the Sun Java team always
>>> >>> regretted putting in Java to start with.  It has caused quite a few
>>> >>> application problems over the years.)
>>> >>>
>>> >>> The really important thing is that NONE of these behaviors of the
>>> >>> collectors are guaranteed by specification not to change from version
>>> >>> to version.  Basing your code on non-specified behaviors is a good way
>>> >>> to hit mysterious failures on updates.
>>> >>>
>>> >>> For instance, in the mid 90s, IBM had a mode of their VM called
>>> >>> "infinite heap."  It *never* garbage collected, even if you called
>>> >>> System.gc.  Instead it just threw away address space and counted on
>>> >>> the total memory needs for the life of the program being less than the
>>> >>> total addressable space of the processor.
>>> >>>
>>> >>> It was *very* fast for certain kinds of applications.
>>> >>>
>>> >>> Far from being pedantic, not depending on undocumented behavior is
>>> >>> simply good engineering.
>>> >>>
>>> >>>
>>> >>> On Thu, May 26, 2011 at 4:51 PM, Jonathan Ellis
>>> >>> wrote:
>>> >>>> I've read the relevant source. While you're pedantically correct re
>>> >>>> the spec, you're wrong as to what the JVM actually does.
>>> >>>>
>>> >>>> On Thu, May 26, 2011 at 3:14 PM, Jeffrey Kesselman
>>> >>>> wrote:
>>> >>>>> Some references...
>>> >>>>>
>>> >>>>> "An object enters an unreachable state when no more strong
>>> >>>>> references to it exist. When an object is unreachable, it is a
>>> >>>>> candidate for collection. Note the wording: Just because an object
>>> >>>>> is a candidate for collection doesn't mean it will be immediately
>>> >>>>> collected. The JVM is free to delay collection until there is an
>>> >>>>> immediate need for the memory being consumed by the object."
>>> >>>>>
>>> >>>>> http://java.sun.com/docs/books/performance/1st_edition/html/JPAppGC.fm.html#998394
>>> >>>>>
>>> >>>>> and "Calling the gc method suggests that the Java Virtual Machine
>>> >>>>> expend effort toward recycling unused objects"
>>> >>>>>
>>> >>>>> http://download.oracle.com/javase/6/docs/api/java/lang/System.html#gc()
>>> >>>>>
>>> >>>>> It goes on to say that the VM will make a "best effort", but "best
>>> >>>>> effort" is *deliberately* left up to the definition of the gc
>>> >>>>> implementor.
>>> >>>>>
>>> >>>>> I guess you missed the many lectures I have given on this subject
>>> >>>>> over the years at JavaOne conferences....
>>> >>>>>
>>> >>>>> On Thu, May 26, 2011 at 3:53 PM, Jonathan Ellis
>>> >>>>> wrote:
>>> >>>>>> It's a common misunderstanding that System.gc is only a suggestion;
>>> >>>>>> on any VM you're likely to run Cassandra on, System.gc will
>>> >>>>>> actually invoke a full collection.
>>> >>>>>>
>>> >>>>>> On Thu, May 26, 2011 at 2:18 PM, Jeffrey Kesselman
>>> >>>>>> wrote:
>>> >>>>>>> Actually this is no guarantee.  It's a common misunderstanding
>>> >>>>>>> that System.gc "forces" GC.  It does not. It is a suggestion
>>> >>>>>>> only. The VM always has the option as to when and how much it
>>> >>>>>>> GCs.
>>> >>>>>>>
>>> >>>>>>> On May 26, 2011 2:51 PM, "Jonathan Ellis"
>>> >>>>>>> wrote:
>>> >>>>>>>
>>> >>>>>>
>>> >>>>>> --
>>> >>>>>> Jonathan Ellis
>>> >>>>>> Project Chair, Apache Cassandra
>>> >>>>>> co-founder of DataStax, the source for professional Cassandra
>>> >>>>>> support
>>> >>>>>> http://www.datastax.com
>>> >>>>>>
>>> >>>>>
>>> >>>>> --
>>> >>>>> It's always darkest just before you are eaten by a grue.
>>> >>>>>
>>>
>>> --
>>> Shotaro Kamio
>>
>
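
For illustration only: the thread contrasts cleanup driven by GC
post-mortem hooks (finalize or PhantomReference) with cleanup that does
not depend on when, or whether, the collector runs. A minimal Java sketch
of the latter, using explicit reference counting, might look like the
following. The class RefCountedFile and its methods are hypothetical
names invented for this example; they are not Cassandra code and are not
taken from CASSANDRA-2521. The sketch assumes every reader calls
acquire() before touching the file and release() exactly once when done.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper (not Cassandra code): deletes a file as soon as the
// last user releases it, instead of waiting for the garbage collector.
final class RefCountedFile {
    private final Path path;
    // Starts at 1: the owner (e.g. the set of live files) holds one reference.
    private final AtomicInteger refs = new AtomicInteger(1);
    private final AtomicBoolean obsolete = new AtomicBoolean(false);

    RefCountedFile(Path path) {
        this.path = path;
    }

    // A reader calls this before using the file; false means it is already gone.
    boolean acquire() {
        while (true) {
            int n = refs.get();
            if (n <= 0) {
                return false;
            }
            if (refs.compareAndSet(n, n + 1)) {
                return true;
            }
        }
    }

    // A reader calls this when done; the last release of an obsolete file
    // deletes it immediately.
    void release() {
        if (refs.decrementAndGet() == 0 && obsolete.get()) {
            try {
                Files.deleteIfExists(path);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // Called once the file has been superseded (e.g. after compaction):
    // drops the owner's reference, so deletion happens when readers finish.
    void markObsolete() {
        if (obsolete.compareAndSet(false, true)) {
            release();
        }
    }
}
```

With this shape, the file is removed at the moment the last reader
finishes, regardless of which collector the JVM uses or whether
System.gc() is honored. The cost is that every reader has to pair
acquire() with release() correctly.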