Subject: Re: Cassandra & HAProxy
From: Dave Viner <daveviner@gmail.com>
To: user@cassandra.apache.org
Date: Mon, 30 Aug 2010 09:40:36 -0700

FWIW - we've been using HAProxy in front of a Cassandra cluster in production and haven't run into any problems yet. It sounds like our cluster is tiny in comparison to Anthony M's, but I just wanted to mention that others out there are doing the same.

One thing in this thread that I thought was interesting is Ben's initial comment: "the presence of the proxy precludes clients properly backing off from nodes returning errors." I think it would be very cool if someone implemented a mechanism for HAProxy to detect the error nodes and drop them from the rotation. I'd be happy to help with this, as I know how it works with HAProxy and standard web servers or other TCP servers. But I'm not sure how to make it work with Cassandra, since, as Ben points out, Cassandra can return valid TCP responses (that say "error-condition") on the standard port.
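A minimal sketch of one way around that, for anyone who wants a starting point: run a tiny HTTP health endpoint next to each Cassandra node and have HAProxy probe that instead of relying on the Thrift port alone. Everything below (the /health path, port 8999, the server names, the nodetool invocation) is an illustrative assumption, not something from this thread or a tested production setup.

#!/usr/bin/env python3
# Hypothetical health-check sidecar: run one per Cassandra node and point
# HAProxy's HTTP check at it instead of checking the Thrift port directly.
#
# Assumed HAProxy stanza (names and ports are illustrative):
#
#   backend cassandra
#       mode tcp
#       option httpchk GET /health
#       server cass1 10.0.0.1:9160 check port 8999
#       server cass2 10.0.0.2:9160 check port 8999
#
import socket
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

THRIFT_PORT = 9160      # default Thrift port in 2010-era Cassandra
CHECK_PORT = 8999       # where HAProxy sends its httpchk probes
NODETOOL = "nodetool"   # assumed to be on PATH

def node_is_healthy():
    # 1. The Thrift port must accept a TCP connection.
    try:
        sock = socket.create_connection(("127.0.0.1", THRIFT_PORT), timeout=2)
        sock.close()
    except OSError:
        return False
    # 2. nodetool must be able to talk to the node over JMX; a non-zero
    #    exit or a hang is treated as unhealthy. This is the part a plain
    #    TCP check cannot see.
    try:
        rc = subprocess.call(
            [NODETOOL, "-h", "127.0.0.1", "info"],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, timeout=5)
    except subprocess.TimeoutExpired:
        return False
    return rc == 0

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # HAProxy's httpchk treats 2xx/3xx as up and anything else as down.
        self.send_response(200 if node_is_healthy() else 503)
        self.end_headers()
    def log_message(self, fmt, *args):
        pass  # keep the check traffic out of the logs

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", CHECK_PORT), HealthHandler).serve_forever()

The point is the second check: a node that still accepts TCP connections but is stuck (OOMing, GC-storming) can fail the nodetool probe, so HAProxy pulls it from rotation until it recovers, rather than continuing to send it traffic.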
Dave Viner

On Sun, Aug 29, 2010 at 4:48 PM, Anthony Molinaro <anthonym@alumni.caltech.edu> wrote:

> On Sun, Aug 29, 2010 at 12:20:10PM -0700, Benjamin Black wrote:
> > On Sun, Aug 29, 2010 at 11:04 AM, Anthony Molinaro
> > <anthonym@alumni.caltech.edu> wrote:
> > >
> > > I don't know, it seems to tax our setup of 39 extra-large EC2 nodes;
> > > it's also closer to 24,000 reqs/sec at peak, since there are different
> > > tables (2 tables for each read and 2 for each write).
> > >
> >
> > Could you clarify what you mean here? On the face of it, this
> > performance seems really poor given the number and size of nodes.
>
> As you say, I would expect to achieve much better performance given the
> node size, but if you go back and look through some of the issues we've
> seen over time, you'll find we've been hit with nodes being too small,
> having too few nodes to deal with request volume, having OOMs, having bad
> sstables, having the ring appear different to different nodes, and several
> other problems.
>
> Many of the I/O problems presented themselves as MessageDeserializer pool
> backups (although we stopped having those after Jonathan came by and
> suggested a row cache of about 1 GB; thanks, Riptano!). We currently have
> mystery OOMs which are probably caused by GC storms during compactions
> (although usually the nodes restart and compact fine, so who knows). I
> also regularly watch nodes go away for 30 seconds or so (the logs show a
> node go dead, then come back to life a few seconds later).
>
> I've sort of given up worrying about these, as we are in the process of
> moving this cluster to our own machines in a colo, so I figure I should
> wait until they are moved and see how the new machines do before I worry
> more about performance.
>
> -Anthony
>
> --
> ------------------------------------------------------------------------
> Anthony Molinaro <anthonym@alumni.caltech.edu>