Date: Mon, 31 Oct 2011 00:34:20 +0200
Subject: Re: Cassandra cluster HW spec (commit log directory vs data file directory)
From: Sorin Julean <sorin.julean@gmail.com>
To: user@cassandra.apache.org

Hey Chris,

Thanks for sharing all the info. I have a few questions:

1. What are you doing with so much memory :) ? How much of it do you
allocate to the heap?
2. What is your network speed? Do you use trunks? Do you have a dedicated
VLAN for gossip/store traffic?

Cheers,
Sorin

On Sun, Oct 30, 2011 at 5:00 AM, Chris Goffinet wrote:

> RE: RAID0 Recommendation
>
> Cassandra supports multiple data file directories. Because we do
> compactions, it's just much easier to deal with one data file directory
> that is striped across all disks as a single volume (RAID0). There are
> other ways to accomplish this, though. At Twitter we use software RAID
> (RAID0 and RAID10).
>
> We own the physical hardware and have found that even with hardware RAID
> available, software RAID in Linux is actually faster. The reason is:
>
> http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10
>
> We have found that using far copies is much faster than near copies. We
> set the I/O scheduler to noop at the moment. We might move back to CFQ
> with more tuning in the future.
>
> We use RAID10 for cases where we need better disk performance because we
> hit the disk often, sacrificing storage. We initially thought RAID0 would
> be faster than RAID10 until we found out about the near vs. far layouts.
>
> RE: Hardware
>
> This is going to depend on how good your automated infrastructure is, but
> we chose the path of finding the cheapest servers we could get from
> Dell/HP/etc.: 8/12 cores, 72 GB memory per node, 2 TB/3 TB, 2.5".
>
> We are in the process of making changes to our servers; I'll report back
> when we have more details to share.
>
> I wouldn't recommend 75 CFs. It could work, but it just seems too complex.
>
> Another recommendation for clusters: always go big. You will be thankful
> for this in the future. Even if you can do this on 3-6 nodes, go much
> larger for future expansion. If you own your hardware and racks, I
> recommend making sure to size out the rack diversity and the number of
> nodes per rack. Also take the replication factor into account when doing
> this: with RF=3, you should have a minimum of 3 racks, and the number of
> nodes per rack should be divisible by the replication factor. This has
> worked out pretty well for us. Our biggest problem today is adding
> hundreds of nodes to existing clusters at once. I'm not sure how many
> other companies are having this problem, but it's certainly on our radar
> to improve, if you get to that point :)
>
>
> On Tue, Oct 25, 2011 at 5:23 AM, Alexandru Sicoe wrote:
>
>> Hi everyone,
>>
>> I am currently in the process of writing a hardware proposal for a
>> Cassandra cluster for storing a lot of monitoring time-series data. My
>> workload is write-intensive and my data set is extremely varied in the
>> types of variables and the insertion rates for those variables (I will
>> have to handle on the order of 2 million variables coming in, each at
>> very different rates: the majority will come in at very low rates, many
>> at higher but constant rates, and a few with huge spikes in rate). These
>> variables correspond to all the basic C++ types and arrays of those
>> types. The highest insertion rates are seen for the basic types, of
>> which U32 variables seem to be the most prevalent (e.g. I recorded 2
>> million U32 vars inserted in 8 minutes of operation, while 600,000
>> doubles and 170,000 strings were inserted during the same time; note
>> this measurement covered only a subset of the total data currently
>> taken in).
>>
>> At the moment I am partitioning the data in Cassandra into 75 CFs (each
>> CF corresponds to a logical partitioning of the set of variables
>> mentioned before, but this partitioning is not related to the amount of
>> data or the rates... it is somewhat arbitrary). These 75 CFs account for
>> ~1 million of the variables I need to store. I have a 3-node Cassandra
>> 0.8.5 cluster (each node has 4 real cores and 4 GB RAM, with the commit
>> log directory and the data file directory split between two RAID arrays
>> of HDDs). I can handle the load in this configuration, but the average
>> CPU usage of the Cassandra nodes is slightly above 50%. As I will need
>> to add 12 more CFs (corresponding to another ~1 million variables), plus
>> potentially other data later, it is clear that I need better hardware
>> (also for the retrieval side).
>>
>> I am looking at Dell servers (PowerEdge etc.).
>>
>> Questions:
>>
>> 1. Is anyone using Dell HW for their Cassandra clusters? How does it
>> behave? Anybody care to share their configurations, tips for buying,
>> what to avoid, etc.?
>>
>> 2. Obviously I am going to keep to the advice on
>> http://wiki.apache.org/cassandra/CassandraHardware and split the
>> commitlog and data onto separate disks. I was going to use an SSD for
>> the commitlog, but then did some more research and found out that it
>> doesn't make sense to use SSDs for sequential appends, because they
>> offer no performance advantage over rotational media there. So I am
>> going to use a rotational disk for the commit log and an SSD for data.
>> Does this make sense?
>>
>> 3. What's the best way to find out how big my commitlog disk and my
>> data disk have to be? The Cassandra hardware page says the commitlog
>> disk doesn't have to be big, but I still need to choose a size!
>>
>> 4. I also noticed a RAID 0 configuration is recommended for the data
>> file directory. Can anyone explain why?
>>
>> Sorry for the huge email.....
>>
>> Cheers,
>> Alex
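Chris's rack-sizing rule above (with RF=3, use at least 3 racks and keep the number of nodes per rack divisible by the replication factor) can be sketched as a quick arithmetic check. The helper name and return shape below are my own illustration, not anything from Cassandra itself:

```python
def rack_plan(total_nodes: int, rf: int) -> dict:
    """Lay total_nodes out over the minimum rack count (= RF),
    enforcing the rule that nodes-per-rack is divisible by RF."""
    racks = rf  # minimum number of racks equals the replication factor
    if total_nodes % racks != 0:
        raise ValueError("nodes do not divide evenly across racks")
    nodes_per_rack = total_nodes // racks
    if nodes_per_rack % rf != 0:
        raise ValueError("nodes per rack must be divisible by RF")
    return {"racks": racks, "nodes_per_rack": nodes_per_rack}

# e.g. an 18-node cluster at RF=3 -> 3 racks of 6 nodes each
print(rack_plan(18, 3))  # prints {'racks': 3, 'nodes_per_rack': 6}
```

A 16-node cluster at RF=3 fails both ways (16 does not divide across 3 racks), which is the kind of sizing mismatch the rule is meant to catch before purchase.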
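For reference, the commitlog/data split Alexandru asks about in question 2 maps to two settings in cassandra.yaml. A minimal sketch follows; the mount points are hypothetical and should each sit on a separate physical device:

```yaml
# Hypothetical mount points: keep sequential commit-log appends on their
# own spindle so they are not interleaved with data-file I/O.
commitlog_directory: /mnt/commitlog      # small, append-only writes
data_file_directories:                   # may list several; often one RAID0 volume
    - /mnt/data
```

With this layout the commitlog disk can stay small (it only holds segments not yet flushed to SSTables), while the data volume is sized for the full data set plus compaction headroom.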