Subject: Re: Counter read requests spread across replicas ?
From: Philippe <watcherfr@gmail.com>
To: user@cassandra.apache.org
Date: Wed, 21 Dec 2011 23:15:08 +0100

Along the same lines as the last experiment I did (the cluster is only being updated by a single-threaded batch process):
All nodes have the same hardware and configuration. Why on earth would one node require disk IO when the other 2 replicas don't?

The primary replica shows some disk activity (iostat shows about 40% utilization):
----total-cpu-usage---- -dsk/total-
usr sys idl wai hiq siq| read  writ
 67  10  19   2   0   3|4244k  364k

whereas the 2nd and 3rd replicas do not.

2nd:
----total-cpu-usage---- -dsk/total-
usr sys idl wai hiq siq| read  writ
 42  13  41   0   0   3|   0     0
 47  15  34   0   0   4|4096B  185k
 49  14  35   0   0   3|   0  8192B
 47  16  33   0   0   4|   0  4096B
 44  13  41   0   0   3| 284k  112k

3rd:
usr sys idl wai hiq siq| read  writ
 11   2  87   1   0   0|   0   136k
  0   0  99   0   0   0|   0     0
  9   1  90   0   0   0|4096B  128k
  2   2  96   0   0   0|   0     0
  0   0  99   0   0   0|   0     0
 11   1  87   0   0   0|   0   128k


Philippe
2011/12/21 Philippe <watcherfr@gmail.com>
Hi Aaron,

> How many rows are you asking for in the multiget_slice, and what thread pools are showing pending tasks?
I am querying in batches of 256 keys max. Each batch may slice between 1 and 5 explicit super columns (I need all the columns in each super column; there are at the very most a couple dozen columns per SC).
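
For reference, each batch is built with Hector roughly as below. This is a minimal sketch, not our exact code: the column family, super column names, and serializers are placeholders (counter values come back as longs).

import java.util.List;

import me.prettyprint.cassandra.serializers.LongSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.SuperRows;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.MultigetSuperSliceQuery;

// One read batch: multiget a slice of 1 to 5 named super columns for up to
// 256 row keys, reading every sub-column of each super column.
public class BatchReader {
    public SuperRows<String, String, String, Long> readBatch(
            Keyspace keyspace, List<String> keys) {
        MultigetSuperSliceQuery<String, String, String, Long> query =
            HFactory.createMultigetSuperSliceQuery(
                keyspace,
                StringSerializer.get(),   // row key
                StringSerializer.get(),   // super column name
                StringSerializer.get(),   // sub-column name
                LongSerializer.get());    // counter value
        query.setColumnFamily("Counters");           // placeholder CF name
        query.setKeys(keys.toArray(new String[0]));  // up to 256 keys per batch
        query.setColumnNames("sc1", "sc2");          // 1 to 5 explicit SCs
        return query.execute().get();
    }
}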

On the first replica, only ReadStage ever shows any pending. All the others have 1 to 10 pending from time to time only. Here's a typical "high pending count" reading on the first replica for the data hotspot:
ReadStage                 13      5238    10374301128         0         0
(tpstats columns: Active, Pending, Completed, Blocked, All time blocked)
I've got a watch running every two seconds, and I see the numbers vary every time, going from that high point to 0 active, 0 pending. The one thing I've noticed is that I hardly ever see the Active count stay up at the current 2s sampling rate.
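(For reference, the watch is just something along the lines of "watch -n 2 nodetool -h <host> tpstats", with <host> a placeholder for each node.)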
On the 2 other replicas, I hardly ever see any pendings on ReadStage, and Active hardly goes up to 1 or 2. But I do see a little PENDING on RequestResponseStage; it goes up into the tens or hundreds from time to time.


If I'm flooding that one replica, shouldn't the ReadStage Active count be at maximum capacity?


I've already thought of CASSANDRA-2980, but I'm running 0.8.7 and 0.8.9.

> Also, what happens when you reduce the number of rows in the request?
I've reduced the requests to batches of 16. I've had to increase the number of threads from 30 to 90 in order to get the same key throughput, because the throughput I measure goes down drastically on a per-thread basis.
What I see:
- CPU utilization is lower on the first replica (why would that be if the batches are smaller?)
- Pending ReadStage on the first replica seems to stay higher for longer, though it still goes down to 0 regularly.
- Lowering to 60 client threads, I see non-zero active MutationStage and ReplicateOnWriteStage more often.
For our use case, the higher the throughput per client thread, the less rework is done in our processing.

Another experiment: I stopped the process that does all the reading and a little of the writing. All that's left is a single-threaded process that sends counter updates as fast as it can, in batches of up to 50 mutations.
First replica: pending counts go up into the low hundreds and back to 0; active goes up to 3 or 5 at most. There is some MutationStage active & pending => the process is indeed faster at updating the counters, which doesn't surprise me given that a counter write requires a read.
Second & third replicas: no ReadStage pendings at all. A little RequestResponseStage, as earlier.
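
For reference, the updater builds each batch with a single Hector Mutator, roughly as below. Again a sketch with placeholder CF, key, and column names; our counters actually sit inside super columns, but the batching shape is the same.

import java.util.List;

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

// One update batch: up to 50 counter increments queued on one Mutator and
// sent in a single execute() round trip.
public class CounterUpdater {
    public void sendBatch(Keyspace keyspace, List<String> rowKeys) {
        Mutator<String> mutator =
            HFactory.createMutator(keyspace, StringSerializer.get());
        for (String key : rowKeys) {                 // at most 50 per batch
            mutator.addCounter(key, "Counters",      // placeholder CF name
                HFactory.createCounterColumn("hits", 1L));  // increment by 1
        }
        mutator.execute();
    }
}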

Cheers
Philippe

Cheers
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 21/12/2011, at 11:57 AM, Philippe wrote:

Hello,
5 nodes running 0.8.7/0.8.9, RF=3, BOP, counter columns inside super columns. Read queries are multiget slices of super columns, inside of which I read every column for processing (20-30 at most), using Hector with default settings.
Watching tpstats on the 3 nodes holding the data that is queried most often, I see the pending count increase only on the "main replica", and I see heavy CPU and network load only on that node. The other nodes seem to be doing very little.

Aren't counter read requests supposed to be round-robin across replicas? I'm confused as to why the nodes don't exhibit the same load.

Thanks

