incubator-cassandra-user mailing list archives

From Eric Rosenberry <>
Subject Re: Experiences with Cassandra hardware planning
Date Mon, 25 Oct 2010 20:10:29 GMT
I am going to respond to multiple questions in one email to keep down the
thread insanity:

On Mon, Oct 25, 2010 at 12:39 AM, David Dabbs <> wrote:

>  Sorry, Eric I’m not following you. You’ve set the JVM’s processor
> affinity so it only runs on one of the processors?

My understanding is that Linux will launch a given process on one "node"
(a processor, in this case) and then attempt to allocate memory only from
that node for that process.  If free memory is unavailable on that node, it
will assign memory from the other node.  The process scheduler will also try
to schedule the process on that node.
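
For anyone who wants to see how their own box is laid out, here is a minimal
Python sketch that lists which CPUs belong to which NUMA node.  It assumes a
Linux kernel that exposes /sys/devices/system/node/nodeN/cpulist; the
function names parse_cpulist and numa_nodes are made up for illustration and
are not part of any tool.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-3,8-11' into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            # A node with memory but no CPUs has an empty cpulist.
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Return {node_id: [cpu ids]} read from sysfs, or {} if the directory is absent."""
    root = Path(sysfs_root)
    nodes = {}
    if not root.is_dir():
        return nodes
    for node_dir in sorted(root.glob("node[0-9]*")):
        cpulist = node_dir / "cpulist"
        if cpulist.is_file():
            # Directory names look like 'node0', 'node1', ...
            nodes[int(node_dir.name[4:])] = parse_cpulist(cpulist.read_text())
    return nodes


if __name__ == "__main__":
    for node, cpus in numa_nodes().items():
        print(f"node {node}: cpus {cpus}")
```

On a two-socket machine I would expect this to print two nodes, each with
half of the logical CPUs, but I have not checked it against our hardware.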

My knowledge is very limited here, and in fact, most of what I know comes
from this article:

On Mon, Oct 25, 2010 at 8:25 AM, Edward Capriolo <> wrote:

> If reading properly it looks like you used Linux Software Raid on top
> of the SSD devices. Can you talk about this? I would think that even
> with a simple RAID this would drive your CPU high. But it seems you may
> not have other options since SSD RAID cards probably do not exist.

Yes, we are running Linux kernel RAID (not LVM).  This is mostly because our
first batch of machines had the SSDs hooked directly to the onboard Intel
ICH10 SATA controller rather than to an add-in RAID card.  We are only doing
RAID 0 here, so I would not expect this to take any CPU to speak of: it is
just doing a mod operation (or something similarly simple) to figure out
which disk the data goes on.  With RAID 0 there is no parity calculation.
Even if there were more work to be done, there are 8 cores (and 16 virtual
processors when you consider hyperthreading) for that operation to be
scheduled on.  We don't seem to be CPU bound.
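
To make the "mod operation" point concrete, here is a rough Python sketch of
how a striped array with equal-sized member disks could map a logical byte
offset to a disk and an offset on that disk.  This is a simplified model, not
the kernel's actual code; the name raid0_locate and the 64 KiB default chunk
size are assumptions for illustration only.

```python
def raid0_locate(offset, n_disks, chunk_size=64 * 1024):
    """Map a logical byte offset to (disk index, byte offset on that disk).

    Simplified RAID 0 model: chunks are dealt out to the disks round-robin,
    so the disk is chosen by a modulo and nothing else needs computing.
    """
    if offset < 0 or n_disks < 1 or chunk_size < 1:
        raise ValueError("offset must be >= 0, n_disks and chunk_size >= 1")
    # Which chunk the offset falls in, and how far into that chunk.
    chunk_index, within_chunk = divmod(offset, chunk_size)
    # Round-robin: the remainder picks the disk, the quotient is the stripe row.
    stripe, disk = divmod(chunk_index, n_disks)
    return disk, stripe * chunk_size + within_chunk


if __name__ == "__main__":
    for off in (0, 64 * 1024, 4 * 64 * 1024 + 10):
        print(off, raid0_locate(off, n_disks=4))
```

The whole mapping is two integer divisions per request, which is why I would
not expect it to show up in CPU usage.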

That being said, we really should try out the LSI 2008's RAID 0 capability,
but we have not had a chance yet.

On Mon, Oct 25, 2010 at 9:07 AM, Jonathan Ellis <> wrote:

> On Mon, Oct 25, 2010 at 10:25 AM, Edward Capriolo <>
> wrote:
> >> 2. We gave up on using Cassandra's row cache as loading any reasonable
> >> amount of data into the cache would take days/weeks with our tiny row
> >> size.  We instead are using file system cache.
>
> I don't follow the reasoning there.  Row cache or fs cache, it will be
> hot after reading it once, the difference is that doing a read to the
> cached data is much faster from row cache.

Yeah, I would have thought the same.  Benjamin Black actually recommended we
go this route because, with our dataset (we have huge numbers of super-tiny
rows), it would take weeks of running for the row cache to become useful.
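
As a back-of-the-envelope illustration of why warm-up time grows with row
count, here is a small Python sketch.  It assumes reads hit rows uniformly at
random and that the cache can hold everything, neither of which is measured
from our cluster; the function names and any inputs you feed it are
hypothetical.  Under that assumption the fraction of distinct rows touched
after n reads over N rows is about 1 - exp(-n/N), so reaching a fraction f
takes roughly -N * ln(1 - f) reads.

```python
import math


def reads_to_warm(total_rows, target_fraction):
    """Approximate reads needed before target_fraction of rows have been read once.

    Assumes uniformly random reads and an unbounded cache (a simplification).
    """
    if total_rows <= 0 or not 0.0 <= target_fraction < 1.0:
        raise ValueError("need total_rows > 0 and 0 <= target_fraction < 1")
    return -total_rows * math.log(1.0 - target_fraction)


def days_to_warm(total_rows, target_fraction, reads_per_second):
    """Convert the read count into days at a steady read rate."""
    if reads_per_second <= 0:
        raise ValueError("reads_per_second must be > 0")
    return reads_to_warm(total_rows, target_fraction) / reads_per_second / 86400.0


if __name__ == "__main__":
    # Hypothetical inputs, purely for illustration.
    print(days_to_warm(total_rows=10**9, target_fraction=0.9, reads_per_second=1000))
```

Note that this only models how long it takes to touch the rows once; it says
nothing about whether row cache or file system cache is faster once warm.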

