From: Edward Capriolo <edlinuxguru@gmail.com>
To: user@cassandra.apache.org
Date: Tue, 27 Nov 2012 16:44:37 -0500
Subject: Re: counters + replication = awful performance?

The difference between replication factor = 1 and replication factor > 1
is significant. Also, since your cluster has only two nodes, going from
RF=1 to RF=2 means double the load on both nodes.

You may want to experiment with the very dangerous column family
attribute:

- replicate_on_write: Replicate every counter update from the leader to
  the follower replicas. Accepts the values true and false.

Edward

On Tue, Nov 27, 2012 at 1:02 PM, Michael Kjellman <mkjellman@barracuda.com> wrote:
> Are you writing with QUORUM consistency or ONE?
>
> On 11/27/12 9:52 AM, "Sergey Olefir" <solf.lists@gmail.com> wrote:
>
>> Hi Juan,
>>
>> thanks for your input!
>>
>> In my case, however, I doubt this is the case -- the clients are able
>> to push many more updates than I need to saturate the
>> replication_factor=2 case (e.g. I'm doing as many as 6x more
>> increments when testing a 2-node cluster with replication_factor=1),
>> so bandwidth between the clients and the server should be sufficient.
>>
>> Bandwidth between the nodes in the cluster should also be quite
>> sufficient, since they are both in the same DC. But it is something
>> to check, thanks!
>>
>> Best regards,
>> Sergey
>>
>> Juan Valencia wrote:
>>
>>> Hi Sergey,
>>>
>>> I know I've had similar issues with counters which were bottlenecked
>>> by network throughput. You might be seeing a problem with throughput
>>> between the clients and Cass, or between the two Cass nodes. It
>>> might not be your case, but that was what happened to me :-)
>>>
>>> Juan
>>>
>>> On Tue, Nov 27, 2012 at 8:48 AM, Sergey Olefir <solf.lists@> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a serious problem with counter performance and I can't seem
>>>> to figure it out.
>>>>
>>>> Basically I'm building a system for accumulating some statistics
>>>> "on the fly" via Cassandra distributed counters. For this I need
>>>> counter updates to work "really fast", and herein lies my problem
>>>> -- as soon as I enable replication_factor = 2, performance goes
>>>> down the drain. This happens in my tests using both 1.0.x and
>>>> 1.1.6.
>>>>
>>>> Let me elaborate:
>>>>
>>>> I have two boxes (virtual servers on top of physical servers rented
>>>> specifically for this purpose, i.e. it's not a cloud, nor is it
>>>> shared; the virtual servers are managed by our admins as a way to
>>>> limit damage, I suppose :)). The Cassandra partitioner is set to
>>>> ByteOrderedPartitioner because I want to be able to do some range
>>>> queries.
>>>>
>>>> First, I set up Cassandra individually on each box (not in a
>>>> cluster) and test counter increment performance (exclusively
>>>> increments, no reads). For the tests I use code that is intended to
>>>> somewhat resemble the expected load pattern -- in particular, the
>>>> majority of increments create new counters, with some updating
>>>> (adding) to already existing counters. In this test each single
>>>> node exhibits respectable performance -- something on the order of
>>>> 70k (seventy thousand) increments per second.
>>>>
>>>> I then join both of these nodes into a single cluster (using
>>>> SimpleSnitch and SimpleStrategy, nothing fancy yet) and run the
>>>> same test using replication_factor=1. The performance is on the
>>>> order of 120k increments per second -- which seems to be a
>>>> reasonable increase over the single-node performance.
>>>>
>>>> HOWEVER, I then rerun the same test on the two-node cluster using
>>>> replication_factor=2 -- which is the least I'll need in actual
>>>> production for redundancy purposes. And the performance I get is
>>>> absolutely horrible -- much, MUCH worse than even single-node
>>>> performance -- something on the order of less than 25k increments
>>>> per second. In addition to the clients not being able to push
>>>> updates fast enough, I also see a lot of 'messages dropped' entries
>>>> in the Cassandra log under this load.
>>>>
>>>> Could anyone advise what could be causing such a drastic
>>>> performance drop under replication_factor=2? I was expecting
>>>> something on the order of single-node performance, not
>>>> approximately 3x less.
>>>>
>>>> When testing replication_factor=2 on 1.1.6 I can see that CPU usage
>>>> goes through the roof. On 1.0.x I think it looked more like disk
>>>> overload, but I'm not sure (being on a virtual server I apparently
>>>> can't see true iostats).
>>>>
>>>> I do have the Cassandra data on a separate disk; the commit log and
>>>> caches are currently on the same disk as the system. I experimented
>>>> with the commit log flush modes and even with disabling the commit
>>>> log altogether -- but that doesn't seem to have a noticeable impact
>>>> on performance under replication_factor=2.
>>>>
>>>> Any suggestions and hints will be much appreciated :) And please
>>>> let me know if I need to share additional information about the
>>>> configuration I'm running on.
>>>>
>>>> Best regards,
>>>> Sergey
>>>>
>>>> --
>>>> View this message in context:
>>>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/counters-replication-awful-performance-tp7583993.html
>>>> Sent from the cassandra-user@.apache mailing list archive at
>>>> Nabble.com.
>>
>> --
>> View this message in context:
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/counters-replication-awful-performance-tp7583993p7583996.html
>> Sent from the cassandra-user@incubator.apache.org mailing list
>> archive at Nabble.com.
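Edward's point that going from RF=1 to RF=2 on a two-node cluster doubles each node's write load (and Michael's QUORUM-vs-ONE question) can be sketched with a little arithmetic. This is a toy model, not Cassandra code; the function names and the even-spread assumption are mine:

```python
# Toy model: how replication factor changes per-node write load.
# Assumption (mine, not from the thread): client increments are spread
# evenly across nodes and every replica write costs about the same.

def per_node_writes(client_increments, replication_factor, num_nodes):
    """Each client increment fans out into RF replica writes,
    which are spread across the cluster's nodes."""
    return client_increments * replication_factor / num_nodes

def quorum_size(replication_factor):
    """Replica acks a QUORUM operation must wait for: floor(RF/2) + 1."""
    return replication_factor // 2 + 1

# Two-node cluster pushing 120k increments/sec cluster-wide:
print(per_node_writes(120_000, 1, 2))  # 60000.0 writes/sec per node
print(per_node_writes(120_000, 2, 2))  # 120000.0 -- RF=2 doubles it

# With RF=2, QUORUM = 2: every write waits on *both* nodes, so the
# choice between ONE and QUORUM matters a lot in this setup.
print(quorum_size(2))  # 2
```

If you do experiment with replicate_on_write, the 1.1-era cassandra-cli syntax was, as best I recall, along the lines of `update column family Counters with replicate_on_write = false;` (the column family name here is hypothetical). Heed the "very dangerous" warning in the thread: with the attribute off, replicas that miss a counter update can be left with permanently wrong counts.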