incubator-cassandra-user mailing list archives

From "B. Todd Burruss" <bburr...@real.com>
Subject Re: Cassandra hardware - balancing CPU/memory/iops/disk space
Date Tue, 09 Mar 2010 18:08:55 GMT
our dataset is too big to fit into cache, so we are hitting disk.  that is
not a problem for normal operation, but when a node is being restored,
receiving hinted handoffs, or being load balanced, or when reads/writes
simply build up, we see a problem.  the nodes can't seem to catch up.  this
seems to be centered around drive seek time, not cassandra per se.

to combat this we are doing the following:

- add more, smaller drives per machine in RAID 0 to reduce the impact of
  drive seek time
- scale horizontally: add more machines to the cluster to spread the load
- we also plan to try out SSDs
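
As a rough illustration of why more spindles help with seek-bound reads, here is a back-of-envelope sketch. It is not a measurement from our cluster: the function names are made up for this example, the seek time and RPM figures are illustrative, and it assumes small random reads spread evenly across the stripe with no caching or queueing effects.

```python
import math


def drive_iops(avg_seek_ms: float, rpm: int) -> float:
    """Rough random-read IOPS for one spinning drive (hypothetical helper).

    Each random read is assumed to cost one average seek plus half a
    platter revolution of rotational latency.
    """
    rotational_latency_ms = 0.5 * 60_000.0 / rpm
    return 1000.0 / (avg_seek_ms + rotational_latency_ms)


def array_iops(n_drives: int, avg_seek_ms: float, rpm: int) -> float:
    """Idealized aggregate IOPS for n identical drives in a stripe.

    Assumes requests are spread evenly, so throughput scales linearly.
    """
    return n_drives * drive_iops(avg_seek_ms, rpm)


def drives_needed(target_iops: float, avg_seek_ms: float, rpm: int) -> int:
    """Smallest number of drives whose idealized IOPS meets the target."""
    return math.ceil(target_iops / drive_iops(avg_seek_ms, rpm))


if __name__ == "__main__":
    # Illustrative figures only: 8.5 ms average seek, 7200 rpm.
    print(round(drive_iops(8.5, 7200), 1))
    print(round(array_iops(4, 8.5, 7200), 1))
    print(drives_needed(1000, 8.5, 7200))
```

Under these assumptions the per-drive figure does not depend on capacity, which is why several smaller drives can serve more random reads than one large drive of the same total size.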


Jonathan Ellis wrote:
> Yes, but I would guess 90% of workloads are better served with
> spending the extra money on more machines w/ cheap sata disks and lots
> of ram.
>
> -Jonathan
>
> On Sun, Mar 7, 2010 at 1:00 PM, Boris Shulman <shulmanb@gmail.com> wrote:
>   
>> Do you think having SAS disks will give better performance?
>>
>> On Sat, Mar 6, 2010 at 5:47 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
>>     
>>> I think http://wiki.apache.org/cassandra/CassandraHardware answers
>>> most of your questions.
>>>
>>> If possible, it's definitely useful to try out a small fraction of
>>> your anticipated workload against a test cluster, even a single node,
>>> before finalizing your production hardware purchase.
>>>
>>> On Sat, Mar 6, 2010 at 1:12 AM, Rosenberry, Eric
>>> <eric.rosenberry@iovation.com> wrote:
>>>       
>>>> I am looking for advice from others that are further along in deploying
>>>> Cassandra in production environments than we are.  I want to know what you
>>>> are finding your bottlenecks to be.  I would feel silly purchasing dual
>>>> processor quad core 2.93ghz Nehalem machines with 192 gigs of RAM just to
>>>> find out that the two local SATA disks kept all that CPU and RAM from being
>>>> useful (clearly that example would be dumb).
>>>>
>>>>
>>>>
>>>> I need to spec out hardware for an “optimal” Cassandra node (though our
>>>> read/write characteristics are not yet fully defined so let’s go with an
>>>> “average” configuration).
>>>>
>>>>
>>>>
>>>> My main concern is finding the right balance of:
>>>>
>>>> ·         Available CPU
>>>>
>>>> ·         RAM amount
>>>>
>>>> ·         RAM speed (think Nehalem architecture where memory comes in a few
>>>> speeds, though I doubt this is much of a concern as it is mainly dictated by
>>>> which processor you buy and how many slots you populate)
>>>>
>>>> ·         Total iops available (i.e. number of disks)
>>>>
>>>> ·         Total disk space available (depending on the ratio of iops/space
>>>> deciding on SAS vs. SATA and various rotational speeds)
>>>>
>>>>
>>>>
>>>> My current thinking is 1U boxes with four 3.5 inch disks since that seems to
>>>> be a readily available config.  One big question is should I go with a
>>>> single processor Nehalem system to go with those four disks, or would two
>>>> CPU’s be useful, and also, how much RAM is appropriate to match?  I am
>>>> making the assumption that Cassandra nodes are going to be disk bound as
>>>> they must do a random read to answer any given query (i.e. indexes in RAM,
>>>> but all data lives on disk?).
>>>>
>>>>
>>>>
>>>> The other big decision is what type of hard disks others are finding to
>>>> provide the optimal ratio of iops to available space?  SAS or SATA?  And
>>>> what rotational speed?
>>>>
>>>>
>>>>
>>>> Let me throw out here an actual hardware config and feel free to tell me the
>>>> error of my ways:
>>>>
>>>> ·         A SuperMicro SuperServer 6016T-NTRF configured as follows:
>>>>
>>>> o   2.26 ghz E5520 dual processor quad core hyperthreaded Nehalem
>>>> architecture (this proc provides a lot of bang for the buck, faster procs
>>>> get more expensive quickly)
>>>>
>>>> o   Qty 12, 4 gig 1066mhz DIMMS for a total of 48 gigs RAM (the 4 gig DIMMS
>>>> seem to be the price sweet spot)
>>>>
>>>> o   Dual on board 1 gigabit NIC’s (perhaps one for client connections and
>>>> the other for cluster communication?)
>>>>
>>>> o   Dual power supplies (I don’t want to lose half my cluster due to a
>>>> failure on one power leg)
>>>>
>>>> o   4x 1TB SATA disks (this is a complete SWAG)
>>>>
>>>> o   No RAID controller (all just single individual disks presented to the
>>>> OS) – Though is there any down side to using a RAID controller with RAID 0
>>>> (perhaps one single disk for the log for sequential io’s, and 3x disks in a
>>>> stripe for the random io’s)
>>>>
>>>> o   The on-board IPMI based OOB controller (so we can kick the boxes
>>>> remotely if need be)
>>>>
>>>> ·
>>>> http://www.supermicro.com/products/system/1U/6016/SYS-6016T-NTRF.cfm
>>>>
>>>>
>>>>
>>>> I can’t help but think the above config has way too much RAM and CPU and not
>>>> enough iops capacity.  My understanding is that Cassandra does not cache
>>>> much in RAM though?
>>>>
>>>>
>>>>
>>>> Any thoughts are appreciated.  Thanks.
>>>>
>>>>
>>>>
>>>> -Eric
>>>>
>>>> _______________________________________________________________
>>>> Eric Rosenberry
>>>> Sr. Infrastructure Architect | Chief Bit Plumber
>>>>
>>>>
>>>>
>>>>
>>>> iovation
>>>> 111 SW Fifth Avenue
>>>> Suite 3200
>>>> Portland, OR 97204
>>>> www.iovation.com
>>>>
>>>> The information contained in this email message may be privileged,
>>>> confidential and protected from disclosure. If you are not the intended
>>>> recipient, any dissemination, distribution or copying is strictly
>>>> prohibited. If you think that you have received this email message in error,
>>>> please notify the sender by reply email and delete the message and any
>>>> attachments.
>>>>         
