From: Eric Rosenberry <eprosenx@gmail.com>
To: user@cassandra.apache.org
Date: Mon, 25 Oct 2010 13:10:29 -0700
Subject: Re: Experiences with Cassandra hardware planning

I am going to respond to multiple questions in one email to keep down the
thread insanity:

On Mon, Oct 25, 2010 at 12:39 AM, David Dabbs wrote:

> Sorry, Eric, I'm not following you. You've set the JVM's processor
> affinity so it only runs on one of the processors?

My understanding is that Linux will launch a given process on one "node"
(a processor, in this case) or another and then attempt to allocate memory
only from that node for that process. If free memory is unavailable on that
node, it will assign memory from the other node. The process scheduler will
try to schedule the process on that node as well.

My knowledge is very limited here, and in fact, most of what I know comes
from this article:

http://jcole.us/blog/archives/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/
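That article's suggested mitigation for mysqld should apply to any single
large process. A sketch of what it would look like for us (assuming the
numactl package is installed; we have not actually tried this with
Cassandra):

    # Interleave the JVM's allocations evenly across the NUMA nodes so
    # that neither node's memory is exhausted first. The memory policy
    # is inherited by the java process the startup script execs.
    numactl --interleave=all bin/cassandra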
On Mon, Oct 25, 2010 at 8:25 AM, Edward Capriolo wrote:

> If I am reading properly, it looks like you used Linux software RAID on
> top of the SSD devices. Can you talk about this? I would think that even
> with a simple RAID this would drive your CPU high. But it seems you may
> not have other options since SSD RAID cards probably do not exist.

Yes, we are running Linux kernel RAID (not LVM). This is mostly because our
first batch of machines had the SSDs hooked directly to the onboard Intel
ICH10 SATA controller rather than any add-in RAID card. We are only doing
RAID 0 here, so I would not expect it to take any CPU to speak of, since
it's just doing a mod operation (or something similarly simple) to figure
out which disk the data goes on. With RAID 0 there is no parity
calculation. Even if there were more work to be done, there are 8 cores
(and 16 virtual processors when you consider hyperthreading) for that
operation to be scheduled on. We don't seem to be CPU bound.

That being said, we really should try out the LSI 2008's RAID 0 capability,
but we have not had a chance yet.
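To make the "mod operation" point concrete, here is a toy model of RAID 0
striping (illustrative only; the chunk size and disk count are assumptions,
and md's real implementation is more involved):

    # Toy RAID 0 address mapping: integer division and modulo, no parity.
    CHUNK = 64 * 1024   # stripe unit in bytes (assumed; set via mdadm --chunk)
    DISKS = 2           # number of member SSDs (assumed)

    def raid0_map(offset):
        stripe = offset // CHUNK                  # which stripe unit overall
        disk = stripe % DISKS                     # round-robin across members
        disk_offset = (stripe // DISKS) * CHUNK + offset % CHUNK
        return disk, disk_offset

    # Example: raid0_map(200 * 1024) -> (1, 73728); stripe unit 3 is on disk 1.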
On Mon, Oct 25, 2010 at 9:07 AM, Jonathan Ellis wrote:

> On Mon, Oct 25, 2010 at 10:25 AM, Edward Capriolo wrote:
>> 2. We gave up on using Cassandra's row cache as loading any reasonable
>> amount of data into the cache would take days/weeks with our tiny row
>> size. We instead are using file system cache.
>
> I don't follow the reasoning there. Row cache or fs cache, it will be
> hot after reading it once; the difference is that a read of the cached
> data is much faster from the row cache.

Yeah, I would have thought the same. Benjamin Black actually recommended we
go this route, as with our dataset (we have huge numbers of super-tiny
rows) it would take weeks of running for the row cache to become useful.
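The back-of-envelope version of that, with made-up numbers (our real row
counts and read rates differ):

    # Illustrative warm-up estimate; every figure here is an assumption.
    # The row cache only holds rows that have been read since startup, so
    # with a huge row count and a long tail of cold rows it warms slowly.
    target_rows = 100 * 10**6     # rows needed cached for a useful hit rate
    new_rows_per_sec = 50         # reads that hit a previously-uncached row
    days = target_rows / float(new_rows_per_sec) / 86400
    print("~%.0f days to warm the row cache" % days)   # ~23 days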
-Eric