cassandra-user mailing list archives

From Terje Marthinussen <tmarthinus...@gmail.com>
Subject Re: Forcing Cassandra to free up some space
Date Wed, 15 Jun 2011 15:48:15 GMT
Even if the gc call cleaned up all files, it is not really acceptable on a
decent-sized cluster due to the impact full GCs have on performance,
especially unneeded ones.

The delay in file deletion can also at times make it hard to see how much
spare disk space you actually have.

We easily see a 100% increase in disk use that persists for long periods
before anything gets cleaned up. This can be quite misleading, and I
believe that on a couple of occasions during testing we have seen short-term
full-disk scenarios because cleanup did not happen when it should have...

Terje

On Wed, Jun 15, 2011 at 11:50 PM, Shotaro Kamio <kamioshot@gmail.com> wrote:

> We've encountered situations where compacted sstable files aren't
> deleted after node repair. Even when GC is triggered via JMX, it
> sometimes leaves compacted files behind. In one case, a lot of files were
> left, and some have already stayed for more than 10 hours. There is no
> guarantee that GC will clean up all compacted sstable files.
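"GC triggered via JMX" typically means invoking the `gc()` operation of the JVM's standard Memory MXBean, which the JDK documents as effectively equivalent to `System.gc()` in the target JVM. The sketch below shows one way to do that remotely; the class name `RemoteGc` is hypothetical, and the host and port are assumptions you would take from your own node's JMX configuration. Note that it only *requests* a collection: nothing here observes whether any sstable file is actually deleted afterwards.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Hypothetical sketch: ask a remote JVM (e.g. a Cassandra node) to run a GC over JMX. */
public final class RemoteGc {
    /** Builds the conventional RMI-based JMX service URL for host:port. */
    public static String jmxUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    /**
     * Connects, proxies the remote Memory MXBean and calls gc() on it.
     * This is a request only; it gives no guarantee about what gets collected.
     */
    public static void requestGc(String host, int port) throws IOException {
        JMXServiceURL url = new JMXServiceURL(jmxUrl(host, port));
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                    conn, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
            memory.gc(); // effectively System.gc() inside the target JVM
        }
    }
}
```

For example, `RemoteGc.requestGc("node1", 7199)` would target a hypothetical host `node1` whose JMX port is assumed to be 7199.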
>
> We have great interest in the following ticket.
> https://issues.apache.org/jira/browse/CASSANDRA-2521
>
>
> Regards,
> Shotaro
>
>
> On Fri, May 27, 2011 at 11:27 AM, Jeffrey Kesselman <jeffpk@gmail.com>
> wrote:
> > I'm also not sure that will guarantee all space is cleaned up. It
> > really depends on what you are doing inside Cassandra. If you have
> > your own garbage collection that is just in some way tied to the GC run,
> > then it will run when it runs.
> >
> > If, on the other hand, you are associating records in your storage with
> > specific objects in memory and using one of the post-mortem hooks
> > (finalize or PhantomReference) to tell you to clean up that particular
> > record, then it's quite possible they won't all get cleaned up. In
> > general, HotSpot does not find and clean every candidate object on every
> > GC run. It starts with the easiest/fastest to find and then decides how
> > much more it needs to do to free enough memory for anticipated
> > near-future needs.
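The post-mortem pattern described above can be sketched with `PhantomReference` and `ReferenceQueue`: a file is registered against an in-memory owner and deleted only after the GC has enqueued that owner's reference. The class `PhantomFileCleaner` and its methods are hypothetical illustrations, not Cassandra's actual code. The point it makes is the one in the message: an owner that is unreachable but not yet discovered by a GC cycle is never enqueued, so its file simply stays on disk.

```java
import java.io.IOException;
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: delete a file only after its in-memory owner is found unreachable by the GC. */
public final class PhantomFileCleaner {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // The references themselves must stay strongly reachable, or they may never be enqueued.
    private final Map<Reference<?>, Path> pending = new ConcurrentHashMap<>();

    /** Registers file for deletion once owner becomes phantom reachable. */
    public void track(Object owner, Path file) {
        pending.put(new PhantomReference<>(owner, queue), file);
    }

    /**
     * Deletes files whose owners the GC has already enqueued; returns how many were deleted.
     * Owners that are unreachable but not yet processed by a GC cycle are not seen here.
     */
    public int drain() {
        int deleted = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            Path file = pending.remove(ref);
            try {
                if (file != null && Files.deleteIfExists(file)) {
                    deleted++;
                }
            } catch (IOException e) {
                // In this sketch a failed delete is simply skipped.
            }
        }
        return deleted;
    }

    /** Number of files still waiting for their owner to be collected. */
    public int pendingCount() {
        return pending.size();
    }
}
```

How soon `drain()` finds anything depends entirely on when, and how thoroughly, the collector runs, which is the behavior the thread is arguing about.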
> >
> > On Thu, May 26, 2011 at 10:16 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
> >> In summary, System.gc works fine unless you've deliberately done
> >> something like setting the -XX:+DisableExplicitGC flag.
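On HotSpot, starting the JVM with `-XX:+DisableExplicitGC` turns `System.gc()` into a no-op (the `-XX:-DisableExplicitGC` form is the default, i.e. explicit GC enabled). A minimal sketch for checking the setting from inside a running JVM, assuming a HotSpot-based JDK that exposes `com.sun.management.HotSpotDiagnosticMXBean`; the class name `ExplicitGcCheck` is hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.Optional;

/** Hypothetical sketch: report whether this HotSpot JVM ignores explicit System.gc() calls. */
public final class ExplicitGcCheck {
    /**
     * Returns the current value of the DisableExplicitGC VM option, or empty
     * if the diagnostic MXBean is not available on this JVM.
     */
    public static Optional<Boolean> explicitGcDisabled() {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (hotspot == null) {
            return Optional.empty();
        }
        // "true" only if the JVM was started with -XX:+DisableExplicitGC.
        String value = hotspot.getVMOption("DisableExplicitGC").getValue();
        return Optional.of(Boolean.parseBoolean(value));
    }
}
```

The same option can also be read remotely through JMX, which is useful before concluding that a JMX-triggered GC "did nothing".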
> >>
> >> On Thu, May 26, 2011 at 5:58 PM, Konstantin Naryshkin
> >> <konstantinn@a-bb.net> wrote:
> >>> So, in summary, there is no way to predictably and efficiently tell
> >>> Cassandra to get rid of all of the extra space it is using on disk?
> >>>
> >>> ----- Original Message -----
> >>> From: "Jeffrey Kesselman" <jeffpk@gmail.com>
> >>> To: user@cassandra.apache.org
> >>> Sent: Thursday, May 26, 2011 8:57:49 PM
> >>> Subject: Re: Forcing Cassandra to free up some space
> >>>
> >>> Which JVM?  Which collector?  There have been and continue to be many.
> >>>
> >>> HotSpot itself supports a number of different collectors with
> >>> different behaviors. Many of them do not collect every candidate on
> >>> every GC, but merely the easiest ones to find. This is why depending
> >>> on finalizers is a *bad* idea in Java code: they may well never get
> >>> run. (Finalizers are one of a few features the Sun Java team always
> >>> regretted putting in Java to start with. They have caused quite a few
> >>> application problems over the years.)
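Since a finalizer may never run, the usual alternative is to release the resource explicitly at a point the program controls. A minimal sketch, assuming a hypothetical `OwnedFile` handle that owns one file on disk: cleanup lives in `close()`, and try-with-resources guarantees it runs when the block exits, independent of any GC.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical handle that owns an on-disk file. Cleanup happens in close(),
 * at a point the program chooses, instead of in finalize(), which may never run.
 */
public final class OwnedFile implements AutoCloseable {
    private final Path path;
    private boolean closed;

    public OwnedFile(Path path) {
        this.path = path;
    }

    public Path path() {
        return path;
    }

    /** Idempotent: deletes the file the first time it is called. */
    @Override
    public void close() {
        if (closed) {
            return;
        }
        closed = true;
        try {
            Files.deleteIfExists(path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The trade-off is that every owner must be closed by the code that uses it; in exchange, disk space is reclaimed deterministically rather than "when the GC gets around to it".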
> >>>
> >>> The really important thing is that NONE of these collector behaviors
> >>> is guaranteed by the specification to stay the same from version
> >>> to version. Basing your code on unspecified behavior is a good way
> >>> to hit mysterious failures on upgrades.
> >>>
> >>> For instance, in the mid-90s, IBM had a mode of their VM called
> >>> "infinite heap." It *never* garbage collected, even if you called
> >>> System.gc. Instead, it just threw away address space and counted on
> >>> the total memory needs for the life of the program being less than the
> >>> total addressable space of the processor.
> >>>
> >>> It was *very* fast for certain kinds of applications.
> >>>
> >>> Far from being pedantic, not depending on undocumented behavior is
> >>> simply good engineering.
> >>>
> >>>
> >>> On Thu, May 26, 2011 at 4:51 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
> >>>> I've read the relevant source. While you're pedantically correct re
> >>>> the spec, you're wrong as to what the JVM actually does.
> >>>>
> >>>> On Thu, May 26, 2011 at 3:14 PM, Jeffrey Kesselman <jeffpk@gmail.com> wrote:
> >>>>> Some references...
> >>>>>
> >>>>> "An object enters an unreachable state when no more strong references
> >>>>> to it exist. When an object is unreachable, it is a candidate for
> >>>>> collection. Note the wording: Just because an object is a candidate
> >>>>> for collection doesn't mean it will be immediately collected. The JVM
> >>>>> is free to delay collection until there is an immediate need for the
> >>>>> memory being consumed by the object."
> >>>>>
> >>>>>
> >>>>> http://java.sun.com/docs/books/performance/1st_edition/html/JPAppGC.fm.html#998394
> >>>>>
> >>>>> and "Calling the gc method suggests that the Java Virtual Machine
> >>>>> expend effort toward recycling unused objects"
> >>>>>
> >>>>>
> >>>>> http://download.oracle.com/javase/6/docs/api/java/lang/System.html#gc()
> >>>>>
> >>>>> It goes on to say that the VM will make a "best effort", but "best
> >>>>> effort" is *deliberately* left up to the definition of the gc
> >>>>> implementor.
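The "suggests" wording can be probed empirically with a `WeakReference`: call `System.gc()` a few times and check whether the referent was cleared. A minimal sketch, with the class `GcProbe` being hypothetical; a `false` result is permitted by the specification, and a `true` result on one JVM and collector says nothing about others. The one thing that *is* guaranteed is that a strongly reachable referent is never cleared.

```java
import java.lang.ref.WeakReference;

/** Hypothetical sketch: observe whether System.gc() actually cleared a weakly reachable object. */
public final class GcProbe {
    /**
     * Calls System.gc() up to maxAttempts times and reports whether ref was cleared.
     * A false result is legal: System.gc() is specified only as a request.
     */
    public static boolean clearedAfterGc(WeakReference<?> ref, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            if (ref.get() == null) {
                return true;
            }
            System.gc();
        }
        return ref.get() == null;
    }
}
```

For example, `GcProbe.clearedAfterGc(new WeakReference<>(new Object()), 3)` is likely to return `true` on a HotSpot JVM with default flags, but code should not depend on it.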
> >>>>>
> >>>>> I guess you missed the many lectures I have given on this subject over
> >>>>> the years at JavaOne conferences....
> >>>>>
> >>>>> On Thu, May 26, 2011 at 3:53 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
> >>>>>> It's a common misunderstanding that System.gc is only a suggestion; on
> >>>>>> any VM you're likely to run Cassandra on, System.gc will actually
> >>>>>> invoke a full collection.
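Whether `System.gc()` really ran a full collection on a given JVM can be checked rather than argued about, by comparing per-collector counts from the standard `GarbageCollectorMXBean`s before and after the call. A minimal sketch, with the class `GcCounter` being hypothetical; collector names differ by JVM and collector choice, so inspect the returned map instead of assuming particular names. Collections triggered by other threads in the same window are counted too.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: see which collectors ran around one System.gc() call. */
public final class GcCounter {
    /** Current collection count per collector; collectors reporting -1 (undefined) are skipped. */
    static Map<String, Long> snapshot() {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count >= 0) {
                counts.put(gc.getName(), count);
            }
        }
        return counts;
    }

    /** Calls System.gc() once and returns, per collector, how many collections happened meanwhile. */
    public static Map<String, Long> collectionsDuringSystemGc() {
        Map<String, Long> before = snapshot();
        System.gc();
        Map<String, Long> after = snapshot();
        Map<String, Long> delta = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            delta.put(e.getKey(), e.getValue() - before.getOrDefault(e.getKey(), 0L));
        }
        return delta;
    }
}
```

A non-zero delta for the old-generation (or full) collector is direct evidence for the claim on that particular JVM; a zero delta everywhere points at something like explicit GC being disabled.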
> >>>>>>
> >>>>>> On Thu, May 26, 2011 at 2:18 PM, Jeffrey Kesselman <jeffpk@gmail.com> wrote:
> >>>>>>> Actually, this is no guarantee. It's a common misunderstanding that
> >>>>>>> System.gc "forces" GC. It does not. It is a suggestion only. The VM
> >>>>>>> always has the option as to when and how much it collects.
> >>>>>>>
> >>>>>>> On May 26, 2011 2:51 PM, "Jonathan Ellis" <jbellis@gmail.com> wrote:
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Jonathan Ellis
> >>>>>> Project Chair, Apache Cassandra
> >>>>>> co-founder of DataStax, the source for professional Cassandra support
> >>>>>> http://www.datastax.com
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> It's always darkest just before you are eaten by a grue.
> >>>>>
>
>
>
> --
> Shotaro Kamio
>
