incubator-cassandra-user mailing list archives

From: "Rosenberry, Eric" <eric.rosenbe...@iovation.com>
Subject: Cassandra hardware - balancing CPU/memory/iops/disk space
Date: Sat, 06 Mar 2010 07:12:56 GMT

I am looking for advice from others who are further along in deploying Cassandra in production
environments than we are.  I want to know what you are finding your bottlenecks to be.  I
would feel silly purchasing dual processor quad core 2.93 GHz Nehalem machines with 192 gigs
of RAM just to find out that the two local SATA disks kept all that CPU and RAM from being
useful (clearly that example would be dumb).

I need to spec out hardware for an "optimal" Cassandra node (though our read/write characteristics
are not yet fully defined so let's go with an "average" configuration).

My main concern is finding the right balance of:

* Available CPU

* RAM amount

* RAM speed (think Nehalem architecture, where memory comes in a few speeds, though
I doubt this is much of a concern as it is mainly dictated by which processor you buy and
how many slots you populate)

* Total IOPS available (i.e. number of disks)

* Total disk space available (the desired ratio of IOPS to space decides between SAS
and SATA and between the various rotational speeds)

My current thinking is 1U boxes with four 3.5 inch disks, since that seems to be a readily
available config.  One big question is whether I should pair those four disks with a single
processor Nehalem system, or whether two CPUs would be useful, and also how much RAM is
appropriate to match.  I am assuming that Cassandra nodes are going to be disk-bound, as they
must do a random read to answer any given query (i.e. indexes in RAM, but all data lives on
disk?).

The other big decision is the type of hard disk.  What are others finding provides the best
ratio of IOPS to available space?  SAS or SATA?  And at what rotational speed?
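
To put rough numbers on that ratio, here is a back-of-the-envelope sketch using the usual
rule of thumb that a spinning disk serves about 1 / (average seek time + average rotational
latency) random I/Os per second.  The seek times and capacities in the table are assumed
ballpark figures, not measurements or vendor specs, and the function names are made up for
illustration; swap in the numbers for the drives you are actually considering.

```python
# Back-of-the-envelope random IOPS per spinning disk.
# All seek times and capacities below are ASSUMED ballpark values.

def rotational_latency_ms(rpm):
    """Average rotational latency: half a revolution, in milliseconds."""
    return 0.5 * 60000.0 / rpm

def random_iops(rpm, avg_seek_ms):
    """Approximate random IOPS as 1 / (avg seek + avg rotational latency)."""
    return 1000.0 / (avg_seek_ms + rotational_latency_ms(rpm))

# (label, rpm, assumed avg seek in ms, assumed capacity in GB)
DISKS = [
    ("7200 rpm SATA", 7200, 8.5, 1000),
    ("10k rpm SAS", 10000, 4.5, 300),
    ("15k rpm SAS", 15000, 3.5, 146),
]

if __name__ == "__main__":
    for label, rpm, seek_ms, capacity_gb in DISKS:
        iops = random_iops(rpm, seek_ms)
        print(f"{label}: ~{iops:.0f} IOPS, {iops / capacity_gb:.2f} IOPS per GB")
```

This ignores controller caches, command queueing and sequential access, so treat it only as
a way to compare disk types against each other, not as a throughput prediction.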

Let me throw out an actual hardware config; feel free to tell me the error of my ways:

* A SuperMicro SuperServer 6016T-NTRF configured as follows:

  o Dual processor, quad core, hyperthreaded 2.26 GHz E5520 (Nehalem architecture; this proc
    provides a lot of bang for the buck, faster procs get more expensive quickly)

  o Qty 12, 4 gig 1066 MHz DIMMs for a total of 48 gigs RAM (the 4 gig DIMMs seem to be the
    price sweet spot)

  o Dual on-board 1 gigabit NICs (perhaps one for client connections and the other for cluster
    communication?)

  o Dual power supplies (I don't want to lose half my cluster due to a failure on one power
    leg)

  o 4x 1TB SATA disks (this is a complete SWAG)

  o No RAID controller (just individual disks presented to the OS).  Is there any downside,
    though, to using a RAID controller with RAID 0 (perhaps one single disk for the log for
    sequential I/O, and 3x disks in a stripe for the random I/O)?

  o The on-board IPMI based OOB controller (so we can kick the boxes remotely if need be)

* http://www.supermicro.com/products/system/1U/6016/SYS-6016T-NTRF.cfm

I can't help but think the above config has way too much RAM and CPU and not enough IOPS capacity.
My understanding is that Cassandra does not cache much in RAM, though.  Is that right?
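
For a sense of scale on the RAM side, the sketch below computes an upper bound on how much of
a node's on-disk data could sit in memory at once for the config above (48 gigs of RAM, 4x
1TB disks).  It assumes, optimistically, that every byte of RAM is available for caching,
ignoring the JVM heap and the OS itself, and the function name and the fill ratios are made
up for illustration.

```python
# Upper bound on the share of on-disk data that could be held in RAM.
# ASSUMPTION: all RAM is usable as cache, which overstates the real figure.

def cacheable_fraction(ram_gb, disk_count, disk_gb, fill_ratio=1.0):
    """Return RAM size divided by data size, capped at 1.0."""
    data_gb = disk_count * disk_gb * fill_ratio
    if data_gb <= 0:
        raise ValueError("node must hold some data")
    return min(1.0, ram_gb / data_gb)

if __name__ == "__main__":
    # Hypothetical fill levels for the four 1TB disks.
    for fill in (0.25, 0.5, 1.0):
        frac = cacheable_fraction(ram_gb=48, disk_count=4, disk_gb=1000,
                                  fill_ratio=fill)
        print(f"disks {fill:.0%} full: at most {frac:.1%} of the data fits in RAM")
```

Whether that much RAM pays off depends on how concentrated the reads are on a small hot set,
which is exactly the part of our workload that is not yet defined.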

Any thoughts are appreciated.  Thanks.

-Eric
_______________________________________________________________
Eric Rosenberry
Sr. Infrastructure Architect | Chief Bit Plumber


iovation
111 SW Fifth Avenue
Suite 3200
Portland, OR 97204
www.iovation.com

