Subject: Re: Key Caching
From: Peter Schuller
To: user@cassandra.apache.org
Reply-To: user@cassandra.apache.org
Date: Mon, 26 Jul 2010 23:04:29 +0200

> If the cache is stored in the heap, how big can the heap be made
> realistically on a 24gb ram machine?
> I am a java newbie but I have read concerns with going over 8gb for the
> heap as the GC can be too painful/take too long. I already have seen
> timeout issues (node is dead errors) under load during GC or compaction.
> Can/should the heap be set to 16gb with 24gb ram?

I have never run Cassandra in production with such a large heap, so I'll
let others comment on practical experience with that. In general, however,
with the JVM and the CMS garbage collector (which is enabled by default
with Cassandra), having a large heap is not necessarily a problem,
depending on the application's workload.

In terms of GCs taking too long: with the default throughput collector
used by the JVM, you will tend to see the longest pause times scale
roughly linearly with heap size. Most pauses would still be short (these
are what are known as young generation collections), but periodically a
so-called full collection is done. With the throughput collector, this
implies stopping all Java threads while the *entire* Java heap is garbage
collected.

With the CMS (Concurrent Mark/Sweep) collector, the intent is that the
periodic scans of the entire Java heap are done concurrently with the
application, without pausing it. Fallback to full stop-the-world garbage
collections can still happen if CMS fails to complete such work fast
enough, in which case tweaking of garbage collection settings may be
required.

One thing to consider in any case is how much memory you actually need;
the more you give to the JVM, the less there is left for the OS to cache
file contents. If, for example, your true working set in Cassandra is, to
grab a random number, 3 GB and you set the heap size to 15 GB, you are
wasting a lot of memory by allowing the JVM to postpone GC until it starts
approaching the 15 GB mark.
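For concreteness, here is a sketch of the kind of JVM options involved.
The exact file and defaults vary by Cassandra version (bin/cassandra.in.sh
or conf/cassandra-env.sh), and the values below are illustrative
assumptions for discussion, not recommendations:

```shell
# Illustrative JVM options for a Cassandra node -- adjust to your setup.
# Pinning -Xms equal to -Xmx avoids pauses from heap resizing.
JVM_OPTS="$JVM_OPTS -Xms8G -Xmx8G"

# CMS is the collector Cassandra enables by default; spelling it out here.
# The occupancy flags ask CMS to start its concurrent cycle early enough
# to reduce the risk of falling back to a stop-the-world full collection.
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"

# Log GC activity so you can check whether pauses line up with the
# "node is dead" timeouts you are seeing under load.
JVM_OPTS="$JVM_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
```

With logging on, you can compare pause durations in the GC log against
your RPC timeout to see whether collections are the actual cause of the
dead-node reports.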
This is actually good (normally) for overall GC throughput, but not
necessarily good overall for something like Cassandra, where there is a
direct trade-off with cache eviction in the operating system possibly
causing additional I/O.

Personally I'd be very interested in hearing any stories about running
Cassandra nodes with 10+ gig heap sizes, and how well it has worked. My
gut feeling is that it should work reasonably well, but I have no evidence
of that and I may very well be wrong. Anyone?

(On a related note, my limited testing with the G1 collector with
Cassandra has indicated it works pretty well. Though I'm concerned with
the weak-reference-finalization-based cleanup of compacted sstables, since
the G1 collector will be much less deterministic about when a particular
object may be collected. Has anyone deployed Cassandra with G1 on very
large heaps under real load?)

-- 
/ Peter Schuller