From: Peter Schuller <scode@scode.org>
To: user@cassandra.apache.org
Cc: ruslan usifov
Date: Wed, 9 Mar 2011 20:48:47 +0100
Subject: Re: cassandra and G1 gc

> Does anybody use G1 gc in production? What your impressions?
I don't know if anyone does, but I've tested it very briefly and at least seen it run well for a while ;) (JDK 1.7 trunk builds) I do have some comments about expected behavior though.

But first, for those not familiar with G1, the main motivations for using it with Cassandra probably include:

(1) By design, all collections are compacting, meaning that fragmentation will never become an issue the way it does with the CMS old generation.

(2) It's a *lot* easier to tune than CMS, making deployment easier and configuration less of a hassle for users. Usually you'd specify a pause time goal, and that's it. In the case of Cassandra one would probably still want to force an aggressive trigger for concurrent marking, but I suspect that's about it.

(3) As a result of (1) and other properties of G1, it has the potential to completely eliminate even the occasional stop-the-world full GC, even after extended runtime. (Keyword being *potential*.)

Now, first of all, G1 is still immature compared to CMS. But even if you are in a position where you are willing to trust G1 in some particular JVM version for production use, and even if G1 actually does work well with Cassandra's workload, there is at least one reason why I would urge caution w.r.t. G1 and Cassandra: the fact that Cassandra uses GC as a means of controlling external resources - in this case, sstables.

With CMS it's "kind of" okay, because unreachable objects will be collected on each run of CMS. So by triggering a full GC when it discovers an out-of-disk-space condition, Cassandra can mostly avoid the pitfalls this would otherwise entail (though confusion/impracticality for the user remains, in that sstables linger for longer than they need to).

G1, on the other hand, doesn't do a concurrent mark+sweep like CMS. Instead it divides the heap into regions that are collected individually. While there is a concurrent marking process, it is only used to feed data to the policy that decides which regions to collect.
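As an aside on point (2): a hedged sketch of what such G1 tuning might look like, in the style of cassandra-env.sh. These are standard HotSpot flag names, but I'm assuming their availability and defaults here - check your particular JDK build before relying on them:

```shell
# Sketch only: G1 with a pause-time goal and an earlier concurrent-marking
# trigger. Flag availability/defaults vary by JDK build; verify with
# java -XX:+PrintFlagsFinal -version.
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=200"               # soft pause goal
JVM_OPTS="$JVM_OPTS -XX:InitiatingHeapOccupancyPercent=35"  # start marking earlier than default
```

Note this is the whole point: with CMS you'd be juggling a dozen flags to get similar behavior.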
There is no guarantee, or even expectation, that one "cycle" of concurrent marking will cause all regions to be collected. Individual regions may remain uncollected for extended periods of time, or even perpetually.

So while it's iffy of Cassandra to use the GC for managing external resources in the first place (I believe the motivation is the reduced synchronization complexity and/or overhead of determining when an sstable can be deleted), G1 brings the problem much more into the light than CMS does, because one no longer has even the *soft* guarantee that a CMS cycle will allow the sstables to be freed.

Now... in addition, I said G1 has the *potential* to eliminate full GC pauses. I say potential because it's still very possible to have workloads that cause it to effectively fail. In particular, whenever I try to stress it I run into problems where the tracking of inter-region references doesn't scale with lots of inter-region writes. The remembered set scanning costs for those regions go *WAY* up, to the point where the regions are never collected. Eventually, as you rack up more such regions, you end up taking a full GC anyway. Todd Lipcon seemed to hit the very same problem when trying to mitigate GC issues with HBase. For more details, there's the "G1GC Full GCs" thread on hotspot-gc-dev/hotspot-gc-use. Unfortunately I can't provide a link because I haven't found an ML archive that properly reconstructs threads for that list...

I don't know whether this particular problem would in fact be an issue for Cassandra. Extended long-term testing under different kinds of real workloads would probably be required to determine whether G1 is suitable in its current condition.

-- 
/ Peter Schuller
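To make the "GC as resource manager" concern concrete, here is a minimal sketch of the general JVM pattern: tie on-disk cleanup to GC reachability via a phantom reference. This is an illustration of the technique only, not Cassandra's actual code - the SSTableReader class and file handling here are hypothetical:

```java
import java.io.File;
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

// Hypothetical stand-in for an object holding an open sstable.
class SSTableReader {
    final File dataFile;
    SSTableReader(File f) { dataFile = f; }
}

public class GcCleanupSketch {
    // Returns true if the GC noticed the reader was unreachable and we
    // could therefore delete the backing file.
    static boolean collectAndDelete() throws Exception {
        ReferenceQueue<SSTableReader> queue = new ReferenceQueue<SSTableReader>();
        File data = File.createTempFile("sstable", ".db");
        SSTableReader reader = new SSTableReader(data);
        // A phantom reference is enqueued only after the referent becomes
        // unreachable -- i.e. no live code can still be using the file.
        // (The local 'ref' must stay reachable, or the enqueueing is lost.)
        PhantomReference<SSTableReader> ref =
                new PhantomReference<SSTableReader>(reader, queue);
        reader = null; // drop the last strong reference
        Reference<? extends SSTableReader> r = null;
        for (int i = 0; i < 100 && r == null; i++) {
            System.gc();          // *request* a collection; no guarantee given
            r = queue.remove(50); // wait up to 50 ms for the GC to enqueue it
        }
        if (r == null) {
            return false; // GC never got around to collecting this object
        }
        return data.delete(); // safe: no live reader can still use the file
    }

    public static void main(String[] args) throws Exception {
        System.out.println("deleted=" + collectAndDelete());
    }
}
```

The point is the `r == null` branch: with CMS you can expect a full cycle to enqueue the reference; under G1 the region holding the dead reader may simply never be chosen for collection, so the file lingers indefinitely - which is exactly the caution above.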