Subject: Re: Cassandra & HAProxy
From: Dave Viner
To: user@cassandra.apache.org
Date: Mon, 30 Aug 2010 10:02:36 -0700

Hi Edward,

By "down hard", I assume you mean that the machine is no longer responding
on the cassandra thrift port. That makes sense (and in fact is what I'm
doing currently). But it seems like the real improvement is something that
would allow for a simple monitor that goes beyond the basic "machine not
reachable" case and covers the more common scenarios that temporarily
impact service time but aren't drastic enough to cause a machine outage.
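For the record, here's roughly what I have in mind, as a rough, untested
sketch. It assumes the Thrift-generated Python bindings that ship with
Cassandra (the "cassandra" module) and the thrift Python library are on the
path, that the node speaks the buffered (not framed) transport on port 9160,
and the host and 500 ms threshold below are just placeholders. Something like
this could run from cron, nagios, or whatever wrapper feeds haproxy, and would
flag a node that still answers TCP but is responding slowly:

    #!/usr/bin/env python
    # Rough protocol-aware health probe for a single Cassandra node.
    # Module names and buffered-vs-framed transport vary by Cassandra
    # version, so treat this as a sketch only.
    import sys
    import time

    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from cassandra import Cassandra   # Thrift-generated bindings

    HOST = "127.0.0.1"     # placeholder node address
    PORT = 9160            # default thrift port
    SLOW_MS = 500          # placeholder "too slow" threshold

    def probe():
        sock = TSocket.TSocket(HOST, PORT)
        sock.setTimeout(SLOW_MS * 2)           # hard timeout, in ms
        transport = TTransport.TBufferedTransport(sock)
        client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))
        start = time.time()
        transport.open()
        client.describe_version()              # cheap round trip through thrift
        elapsed_ms = (time.time() - start) * 1000
        transport.close()
        return elapsed_ms

    if __name__ == "__main__":
        try:
            elapsed = probe()
        except Exception as e:
            print("DOWN: %s" % e)
            sys.exit(2)
        if elapsed > SLOW_MS:
            print("SLOW: %.0f ms" % elapsed)
            sys.exit(1)
        print("OK: %.0f ms" % elapsed)
        sys.exit(0)

A plain TCP connect check can't tell the OK and SLOW cases apart, which is
exactly the gap I'd like to close.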
Dave Viner

On Mon, Aug 30, 2010 at 9:52 AM, Edward Capriolo wrote:
> On Mon, Aug 30, 2010 at 12:40 PM, Dave Viner wrote:
> > FWIW - we've been using HAProxy in front of a cassandra cluster in
> > production and haven't run into any problems yet. It sounds like our
> > cluster is tiny in comparison to Anthony M's cluster, but I just wanted
> > to mention that others out there are doing the same.
> >
> > One thing in this thread that I thought was interesting is Ben's initial
> > comment "the presence of the proxy precludes clients properly backing
> > off from nodes returning errors." I think it would be very cool if
> > someone implemented a mechanism for haproxy to detect the error nodes
> > and then enable it to drop those nodes from the rotation. I'd be happy
> > to help with this, as I know how it works with haproxy and standard web
> > servers or other tcp servers. But I'm not sure how to make it work with
> > Cassandra, since, as Ben points out, it can return valid tcp responses
> > (that say "error-condition") on the standard port.
> >
> > Dave Viner
> >
> > On Sun, Aug 29, 2010 at 4:48 PM, Anthony Molinaro wrote:
> >>
> >> On Sun, Aug 29, 2010 at 12:20:10PM -0700, Benjamin Black wrote:
> >> > On Sun, Aug 29, 2010 at 11:04 AM, Anthony Molinaro wrote:
> >> > >
> >> > > I don't know, it seems to tax our setup of 39 extra large ec2
> >> > > nodes; it's also closer to 24000 reqs/sec at peak since there are
> >> > > different tables (2 tables for each read and 2 for each write)
> >> > >
> >> >
> >> > Could you clarify what you mean here? On the face of it, this
> >> > performance seems really poor given the number and size of nodes.
> >>
> >> As you say, I would expect to achieve much better performance given the
> >> node size, but if you go back and look through some of the issues we've
> >> seen over time, you'll find we've been hit with nodes being too small,
> >> having too few nodes to deal with request volume, having OOMs, having
> >> bad sstables, having the ring appear different to different nodes, and
> >> several other problems.
> >>
> >> Many of the i/o problems presented themselves as MessageDeserializer
> >> pool backups (although we stopped having these since Jonathan was by
> >> and suggested a row cache of about 1Gb, thanks Riptano!). We currently
> >> have mystery OOMs which are probably caused by GC storms during
> >> compactions (although usually the nodes restart and compact fine, so
> >> who knows). I also regularly watch nodes go away for 30 seconds or so
> >> (logs show a node goes dead, then comes back to life a few seconds
> >> later).
> >>
> >> I've sort of given up worrying about these, as we are in the process of
> >> moving this cluster to our own machines in a colo, so I figure I should
> >> wait until they are moved and see how the new machines do before I
> >> worry more about performance.
> >>
> >> -Anthony
> >>
> >> --
> >> ------------------------------------------------------------------------
> >> Anthony Molinaro
> >
> >
>
> Any proxy with a TCP health check should be able to determine if the
> Cassandra service is down hard. The problem for tools that are not
> cassandra protocol aware is detecting slowness or other anomalies
> like TimedOut exceptions.
>
> If you are seeing GC storms during compactions you might have rows
> that are too big. When the compaction hits these rows, memory spikes.
> I lowered the compaction priority (and added more nodes), which has
> helped compaction back off, leaving some I/O for requests.
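One more thought on the "drop error nodes from the rotation" idea above: the
missing glue could be fairly small. Sketched below, under the assumption that
haproxy (1.4+) is configured with a "stats socket ... level admin" line at the
hypothetical path /var/run/haproxy.sock, and that the backend/server names
cassandra/node1 are made up for illustration, a wrapper around the probe above
could flip a slow or dead node out of (and later back into) the rotation over
that socket:

    import socket

    HAPROXY_SOCKET = "/var/run/haproxy.sock"   # hypothetical admin socket path

    def set_node_enabled(backend, server, enabled):
        # "disable server" / "enable server" are haproxy stats-socket
        # commands (admin level required); they pull a server out of,
        # or put it back into, the rotation without a config reload.
        verb = "enable" if enabled else "disable"
        cmd = "%s server %s/%s\n" % (verb, backend, server)
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(HAPROXY_SOCKET)
        sock.sendall(cmd.encode("ascii"))
        sock.close()

    # e.g. after the probe reports SLOW or DOWN for a node:
    # set_node_enabled("cassandra", "node1", False)

Again, just a sketch, but it's the direction I'd be happy to help prototype.

Dave Viner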