Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of tyler@datastax.com designates
 209.85.215.47 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CANN9sNpqj2nudmMak5vLZ4=0eXjHX5mC0s6f83P+xZiJXNO3=A@mail.gmail.com>
References: 
 <CANN9sNpqj2nudmMak5vLZ4=0eXjHX5mC0s6f83P+xZiJXNO3=A@mail.gmail.com>
Date: Wed, 3 Apr 2013 11:30:44 -0500
Message-ID: 
 <CAAam9sske=38tD4h0uaucWz0WxW5V4ejHQH8BO+gubpzRr-FXQ@mail.gmail.com>
Subject: Re: Linear scalability problems
From: Tyler Hobbs <tyler@datastax.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=e89a8f83abd724ef3c04d977609d

--e89a8f83abd724ef3c04d977609d
Content-Type: text/plain; charset=ISO-8859-1

If I had to guess, I would say that your client is the bottleneck, not the
cluster.  Are you inserting data with multiple threads or processes?
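
As a rough way to check this, here is a minimal sketch of a load test that
issues the same inserts from a configurable number of threads and reports
requests per second. The names insert_blob and run_load are hypothetical and
not part of any Cassandra or Thrift API; insert_blob is a placeholder that
only sleeps for 5 ms to simulate one round trip, and it would need to be
replaced with the real insert call. Thrift client objects are generally not
safe to share across threads, so the real version would most likely need one
connection per worker.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def insert_blob(key):
    """Hypothetical stand-in for one insert at QUORUM.

    Replace the sleep with the real client call; the 5 ms delay only
    simulates network and server latency.
    """
    time.sleep(0.005)
    return key


def run_load(num_requests, num_threads):
    """Issue num_requests inserts from num_threads workers; return req/s."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # list() forces every insert to complete before the timer stops.
        completed = list(pool.map(insert_blob, range(num_requests)))
    elapsed = time.perf_counter() - start
    return len(completed) / elapsed


if __name__ == "__main__":
    for threads in (1, 4, 16):
        rate = run_load(200, threads)
        print(threads, "threads:", round(rate), "req/s")
```

If throughput keeps climbing as the thread count goes up, the client was the
limit rather than the cluster; if it stays flat, the cluster side deserves a
closer look.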


On Wed, Apr 3, 2013 at 8:49 AM, Anand Somani <meatforums@gmail.com> wrote:

> Hi,
>
> I am running some tests to scale out our application from a 3-node
> cluster to a 6-node cluster. With the 3-node cluster I was able to handle
> about 41 req/second, so I added 3 more nodes expecting throughput to come
> close to doubling, but instead it only goes up to about 47 req/second! I
> must be doing something wrong and it is not obvious, so I wanted some help
> on which stats I could or should monitor to tell me things like whether a
> node is getting more requests or whether the load distribution is not
> random enough.
>
> Note I am using direct thrift (old code base) and cassandra 1.1.6. The
> data model is for storing blobs (split across columns) and has around 6 CF,
> RF=3 and all operations are at quorum. Also at the end of the run nodetool
> ring reports the same data size.
>
> Thanks
> Anand
>


-- 
Tyler Hobbs
DataStax <http://datastax.com/>

--e89a8f83abd724ef3c04d977609d--