Date: Mon, 31 Oct 2011 00:34:20 +0200
Subject: Re: Cassandra cluster HW spec (commit log directory vs data file directory)
From: Sorin Julean <sorin.julean@gmail.com>
To: user@cassandra.apache.org

Hey Chris,

Thanks for sharing all the info. I have a few questions:

1. What are you doing with so much memory :) ? How much of it do you
allocate to the heap?
2. What is your network speed? Do you use trunks? Do you have a dedicated
VLAN for gossip/store traffic?

Cheers,
Sorin

On Sun, Oct 30, 2011 at 5:00 AM, Chris Goffinet wrote:

> RE: RAID0 Recommendation
>
> Cassandra supports multiple data file directories. Because we do
> compactions, it's just much easier to deal with one data file directory
> that is striped across all disks as a single volume (RAID0). There are
> other ways to accomplish this, though. At Twitter we use software RAID
> (RAID0 and RAID10).
>
> We own the physical hardware and have found that even with hardware RAID
> available, software RAID in Linux is actually faster. The reason is:
>
> http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10
>
> We have found that using far copies is much faster than near copies. We
> set the I/O scheduler to noop at the moment. We might move back to CFQ
> with more tuning in the future.
>
> We use RAID10 for cases where we need better disk performance because we
> hit the disk often, sacrificing storage. We initially thought RAID0 would
> be faster than RAID10 until we found out about the near vs. far layouts.
>
> RE: Hardware
>
> This is going to depend on how good your automated infrastructure is, but
> we chose the path of finding the cheapest servers we could get from
> Dell/HP/etc.: 8/12 cores, 72 GB memory per node, 2 TB/3 TB, 2.5".
>
> We are in the process of making changes to our servers; I'll report back
> when we have more details to share.
>
> I wouldn't recommend 75 CFs. It could work, but it just seems too complex.
>
> Another recommendation for clusters: always go big. You will be thankful
> for this in the future. Even if you can do this on 3-6 nodes, go much
> larger for future expansion. If you own your hardware and racks, I
> recommend making sure to size out the rack diversity and the number of
> nodes per rack. Also take the replication factor into account when doing
> this: with RF=3, you should have a minimum of 3 racks, and the number of
> nodes per rack should be divisible by the replication factor. This has
> worked out pretty well for us. Our biggest problem today is adding
> hundreds of nodes to existing clusters at once. I'm not sure how many
> other companies are having this problem, but it's certainly on our radar
> to improve, if you get to that point :)
>
>
> On Tue, Oct 25, 2011 at 5:23 AM, Alexandru Sicoe wrote:
>
>> Hi everyone,
>>
>> I am currently in the process of writing a hardware proposal for a
>> Cassandra cluster for storing a lot of monitoring time-series data. My
>> workload is write-intensive and my data set is extremely varied in the
>> types of variables and the insertion rates for those variables (I will
>> have to handle on the order of 2 million variables coming in, each at
>> very different rates: the majority will come in at very low rates, many
>> at higher but constant rates, and a few with huge spikes in rate). These
>> variables correspond to all the basic C++ types and arrays of those
>> types. The highest insertion rates are seen for the basic types, of
>> which U32 variables seem to be the most prevalent (e.g. I recorded 2
>> million U32 vars inserted in 8 minutes of operation, while 600,000
>> doubles and 170,000 strings were inserted during the same time; note
>> this measurement covered only a subset of the total data currently
>> taken in).
>>
>> At the moment I am partitioning the data in Cassandra into 75 CFs (each
>> CF corresponds to a logical partitioning of the set of variables
>> mentioned before, but this partitioning is not related to the amount of
>> data or the rates... it is somewhat arbitrary). These 75 CFs account for
>> ~1 million of the variables I need to store. I have a 3-node Cassandra
>> 0.8.5 cluster (each node has 4 real cores and 4 GB RAM, with the commit
>> log directory and the data file directory split between two RAID arrays
>> of HDDs). I can handle the load in this configuration, but the average
>> CPU usage of the Cassandra nodes is slightly above 50%. As I will need
>> to add 12 more CFs (corresponding to another ~1 million variables), plus
>> potentially other data later, it is clear that I need better hardware
>> (also for the retrieval side).
>>
>> I am looking at Dell servers (PowerEdge etc.).
>>
>> Questions:
>>
>> 1. Is anyone using Dell HW for their Cassandra clusters? How does it
>> behave? Anybody care to share their configurations, tips for buying,
>> what to avoid, etc.?
>>
>> 2. Obviously I am going to keep to the advice on
>> http://wiki.apache.org/cassandra/CassandraHardware and split the
>> commitlog and data onto separate disks. I was going to use an SSD for
>> the commitlog, but then did some more research and found out that it
>> doesn't make sense to use SSDs for sequential appends, because they
>> offer no performance advantage over rotational media there. So I am
>> going to use a rotational disk for the commit log and an SSD for data.
>> Does this make sense?
>>
>> 3. What's the best way to find out how big my commitlog disk and my
>> data disk have to be? The Cassandra hardware page says the commitlog
>> disk doesn't have to be big, but I still need to choose a size!
>>
>> 4. I also noticed a RAID 0 configuration is recommended for the data
>> file directory. Can anyone explain why?
>>
>> Sorry for the huge email.....
>>
>> Cheers,
>> Alex
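Chris's rack-sizing rule above (with RF=3, use at least 3 racks and keep the number of nodes per rack divisible by the replication factor) can be sketched as a quick arithmetic check. The helper name and return shape below are my own illustration, not anything from Cassandra itself:

```python
def rack_plan(total_nodes: int, rf: int) -> dict:
    """Lay total_nodes out over the minimum rack count (= RF),
    enforcing the rule that nodes-per-rack is divisible by RF."""
    racks = rf  # minimum number of racks equals the replication factor
    if total_nodes % racks != 0:
        raise ValueError("nodes do not divide evenly across racks")
    nodes_per_rack = total_nodes // racks
    if nodes_per_rack % rf != 0:
        raise ValueError("nodes per rack must be divisible by RF")
    return {"racks": racks, "nodes_per_rack": nodes_per_rack}

# e.g. an 18-node cluster at RF=3 -> 3 racks of 6 nodes each
print(rack_plan(18, 3))  # prints {'racks': 3, 'nodes_per_rack': 6}
```

A 16-node cluster at RF=3 fails both ways (16 does not divide across 3 racks), which is the kind of sizing mismatch the rule is meant to catch before purchase.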
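For reference, the commitlog/data split Alexandru asks about in question 2 maps to two settings in cassandra.yaml. A minimal sketch follows; the mount points are hypothetical and should each sit on a separate physical device:

```yaml
# Hypothetical mount points: keep sequential commit-log appends on their
# own spindle so they are not interleaved with data-file I/O.
commitlog_directory: /mnt/commitlog      # small, append-only writes
data_file_directories:                   # may list several; often one RAID0 volume
    - /mnt/data
```

With this layout the commitlog disk can stay small (it only holds segments not yet flushed to SSTables), while the data volume is sized for the full data set plus compaction headroom.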