From: Eric Rosenberry <eprosenx@gmail.com>
To: user@cassandra.apache.org
Date: Mon, 25 Oct 2010 13:10:29 -0700
Subject: Re: Experiences with Cassandra hardware planning

I am going to respond to multiple questions in one email to keep down the
thread insanity:

On Mon, Oct 25, 2010 at 12:39 AM, David Dabbs wrote:

> Sorry, Eric, I'm not following you. You've set the JVM's processor
> affinity so it only runs on one of the processors?

My understanding is that Linux will launch a given process on one "node"
(a processor, in this case) or another and then attempt to allocate memory
only from that node for that process. If free memory is unavailable on that
node, it will assign memory from the other node. The process scheduler will
try to schedule the process on that node as well.

My knowledge is very limited here, and in fact, most of what I know comes
from this article:

http://jcole.us/blog/archives/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/
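That article's suggested mitigation for mysqld should apply to any single
large process. A sketch of what it would look like for us (assuming the
numactl package is installed; we have not actually tried this with
Cassandra):

    # Interleave the JVM's allocations evenly across the NUMA nodes so
    # that neither node's memory is exhausted first. The memory policy
    # is inherited by the java process the startup script execs.
    numactl --interleave=all bin/cassandra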
On Mon, Oct 25, 2010 at 8:25 AM, Edward Capriolo wrote:

> If I am reading properly, it looks like you used Linux software RAID on
> top of the SSD devices. Can you talk about this? I would think that even
> with a simple RAID this would drive your CPU high. But it seems you may
> not have other options since SSD RAID cards probably do not exist.

Yes, we are running Linux kernel RAID (not LVM). This is mostly because our
first batch of machines had the SSDs hooked directly to the onboard Intel
ICH10 SATA controller rather than any add-in RAID card. We are only doing
RAID 0 here, so I would not expect it to take any CPU to speak of, since
it's just doing a mod operation (or something similarly simple) to figure
out which disk the data goes on. With RAID 0 there is no parity
calculation. Even if there were more work to be done, there are 8 cores
(and 16 virtual processors when you consider hyperthreading) for that
operation to be scheduled on. We don't seem to be CPU bound.

That being said, we really should try out the LSI 2008's RAID 0 capability,
but we have not had a chance yet.
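To make the "mod operation" point concrete, here is a toy model of RAID 0
striping (illustrative only; the chunk size and disk count are assumptions,
and md's real implementation is more involved):

    # Toy RAID 0 address mapping: integer division and modulo, no parity.
    CHUNK = 64 * 1024   # stripe unit in bytes (assumed; set via mdadm --chunk)
    DISKS = 2           # number of member SSDs (assumed)

    def raid0_map(offset):
        stripe = offset // CHUNK                  # which stripe unit overall
        disk = stripe % DISKS                     # round-robin across members
        disk_offset = (stripe // DISKS) * CHUNK + offset % CHUNK
        return disk, disk_offset

    # Example: raid0_map(200 * 1024) -> (1, 73728); stripe unit 3 is on disk 1.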
On Mon, Oct 25, 2010 at 9:07 AM, Jonathan Ellis wrote:

> On Mon, Oct 25, 2010 at 10:25 AM, Edward Capriolo wrote:
>> 2. We gave up on using Cassandra's row cache as loading any reasonable
>> amount of data into the cache would take days/weeks with our tiny row
>> size. We instead are using file system cache.
>
> I don't follow the reasoning there. Row cache or fs cache, it will be
> hot after reading it once; the difference is that a read of the cached
> data is much faster from the row cache.

Yeah, I would have thought the same. Benjamin Black actually recommended we
go this route, as with our dataset (we have huge numbers of super-tiny
rows) it would take weeks of running for the row cache to become useful.
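The back-of-envelope version of that, with made-up numbers (our real row
counts and read rates differ):

    # Illustrative warm-up estimate; every figure here is an assumption.
    # The row cache only holds rows that have been read since startup, so
    # with a huge row count and a long tail of cold rows it warms slowly.
    target_rows = 100 * 10**6     # rows needed cached for a useful hit rate
    new_rows_per_sec = 50         # reads that hit a previously-uncached row
    days = target_rows / float(new_rows_per_sec) / 86400
    print("~%.0f days to warm the row cache" % days)   # ~23 days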
-Eric