Subject: Re: Counter read requests spread across replicas ?
From: Philippe <watcherfr@gmail.com>
To: user@cassandra.apache.org
Date: Wed, 21 Dec 2011 23:15:08 +0100

Along the same lines as the last experiment I did (the cluster is only being updated by a single-threaded batch process):
All nodes have the same hardware and configuration. Why on earth would one node require disk IO when the other 2 replicas don't?

The primary replica shows some disk activity (iostat shows about 40% utilization):
----total-cpu-usage---- -dsk/total-
usr sys idl wai hiq siq| read  writ
 67  10  19   2   0   3|4244k  364k

whereas the 2nd and 3rd replicas do not.

2nd:
----total-cpu-usage---- -dsk/total-
usr sys idl wai hiq siq| read  writ
 42  13  41   0   0   3|   0     0
 47  15  34   0   0   4|4096B  185k
 49  14  35   0   0   3|   0  8192B
 47  16  33   0   0   4|   0  4096B
 44  13  41   0   0   3| 284k  112k

3rd:
usr sys idl wai hiq siq| read  writ
 11   2  87   1   0   0|   0   136k
  0   0  99   0   0   0|   0     0
  9   1  90   0   0   0|4096B  128k
  2   2  96   0   0   0|   0     0
  0   0  99   0   0   0|   0     0
 11   1  87   0   0   0|   0   128k


Philippe
2011/12/21 Philippe <watcherfr@gmail.com>
Hi Aaron,

> How many rows are you asking for in the multiget_slice, and what thread pools are showing pending tasks?
I am querying in batches of 256 keys max. Each batch may slice between 1 and 5 explicit super columns (I need all the columns in each super column; there are at the very most a couple dozen columns per SC).
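
For reference, each batch is built with Hector roughly as below. This is a minimal sketch, not our exact code: the column family, super column names, and serializers are placeholders (counter values come back as longs).

import java.util.List;

import me.prettyprint.cassandra.serializers.LongSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.SuperRows;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.MultigetSuperSliceQuery;

// One read batch: multiget a slice of 1 to 5 named super columns for up to
// 256 row keys, reading every sub-column of each super column.
public class BatchReader {
    public SuperRows<String, String, String, Long> readBatch(
            Keyspace keyspace, List<String> keys) {
        MultigetSuperSliceQuery<String, String, String, Long> query =
            HFactory.createMultigetSuperSliceQuery(
                keyspace,
                StringSerializer.get(),   // row key
                StringSerializer.get(),   // super column name
                StringSerializer.get(),   // sub-column name
                LongSerializer.get());    // counter value
        query.setColumnFamily("Counters");           // placeholder CF name
        query.setKeys(keys.toArray(new String[0]));  // up to 256 keys per batch
        query.setColumnNames("sc1", "sc2");          // 1 to 5 explicit SCs
        return query.execute().get();
    }
}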

On the first replica, only ReadStage ever shows any pending. All the others have 1 to 10 pending from time to time only. Here's a typical "high pending count" reading on the first replica for the data hotspot:
ReadStage                 13      5238    10374301128         0         0
(tpstats columns: Active, Pending, Completed, Blocked, All time blocked)
I've got a watch running every two seconds, and I see the numbers vary every time, going from that high point to 0 active, 0 pending. The one thing I've noticed is that I hardly ever see the Active count stay up at the current 2s sampling rate.
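(For reference, the watch is just something along the lines of "watch -n 2 nodetool -h <host> tpstats", with <host> a placeholder for each node.)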
On the 2 other replicas, I hardly ever see any pendings on ReadStage, and Active hardly goes up to 1 or 2. But I do see a little PENDING on RequestResponseStage; it goes up into the tens or hundreds from time to time.


If I'm flooding that one replica, shouldn't the ReadStage Active count be at maximum capacity?


I've already thought of CASSANDRA-2980, but I'm running 0.8.7 and 0.8.9.

> Also, what happens when you reduce the number of rows in the request?
I've reduced the requests to batches of 16. I've had to increase the number of threads from 30 to 90 in order to get the same key throughput, because the throughput I measure goes down drastically on a per-thread basis.
What I see:
- CPU utilization is lower on the first replica (why would that be if the batches are smaller?)
- Pending ReadStage on the first replica seems to stay higher for longer, though it still goes down to 0 regularly.
- Lowering to 60 client threads, I see non-zero active MutationStage and ReplicateOnWriteStage more often.
For our use case, the higher the throughput per client thread, the less rework is done in our processing.

Another experiment: I stopped the process that does all the reading and a little of the writing. All that's left is a single-threaded process that sends counter updates as fast as it can, in batches of up to 50 mutations.
First replica: pending counts go up into the low hundreds and back to 0; active goes up to 3 or 5 at most. There is some MutationStage active & pending => the process is indeed faster at updating the counters, which doesn't surprise me given that a counter write requires a read.
Second & third replicas: no ReadStage pendings at all. A little RequestResponseStage, as earlier.
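
For reference, the updater builds each batch with a single Hector Mutator, roughly as below. Again a sketch with placeholder CF, key, and column names; our counters actually sit inside super columns, but the batching shape is the same.

import java.util.List;

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

// One update batch: up to 50 counter increments queued on one Mutator and
// sent in a single execute() round trip.
public class CounterUpdater {
    public void sendBatch(Keyspace keyspace, List<String> rowKeys) {
        Mutator<String> mutator =
            HFactory.createMutator(keyspace, StringSerializer.get());
        for (String key : rowKeys) {                 // at most 50 per batch
            mutator.addCounter(key, "Counters",      // placeholder CF name
                HFactory.createCounterColumn("hits", 1L));  // increment by 1
        }
        mutator.execute();
    }
}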

Cheers
Philippe

Cheers
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 21/12/2011, at 11:57 AM, Philippe wrote:

Hello,
5 nodes running 0.8.7/0.8.9, RF=3, BOP, counter columns inside super columns. Read queries are multiget slices of super columns, inside of which I read every column for processing (20-30 at most), using Hector with default settings.
Watching tpstats on the 3 nodes holding the data that is queried most often, I see the pending count increase only on the "main replica", and I see heavy CPU and network load only on that node. The other nodes seem to be doing very little.

Aren't counter read requests supposed to be round-robin across replicas? I'm confused as to why the nodes don't exhibit the same load.

Thanks

