From: "Freeman, Tim" <tim.freeman@hp.com>
To: cassandra-user@incubator.apache.org
Date: Fri, 4 Dec 2009 18:49:25 +0000
Subject: RE: Persistently increasing read latency

The speed of compaction isn't the problem.  The problem is that lots of reads and writes cause compaction to fall behind.

You could solve the problem by throttling reads and writes so compaction isn't starved.  (Maybe just the writes.  I'm not sure.)

Different nodes will have different compaction backlogs, so you'd want to do this on a per-node basis after Cassandra has made decisions about whatever replication it's going to do.  For example, Cassandra could observe the number of pending compaction tasks and sleep that many milliseconds before every read and write.
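To make that concrete, here's a rough sketch of the kind of per-node throttle I mean.  The names are made up for illustration (nothing like this exists in Cassandra today); the node would feed in its own count of queued compaction tasks, and each read or write would pause one millisecond per backlogged task:

    // Sketch only: slow reads and writes down in proportion to the
    // compaction backlog.  CompactionBacklog is a stand-in for however
    // the node counts its queued compaction tasks.
    public class CompactionThrottle {

        /** Stand-in for the node's view of its own compaction queue. */
        public interface CompactionBacklog {
            int pendingCompactions();
        }

        private final CompactionBacklog backlog;

        public CompactionThrottle(CompactionBacklog backlog) {
            this.backlog = backlog;
        }

        /** Call before serving each read or write. */
        public void pause() throws InterruptedException {
            int pending = backlog.pendingCompactions();
            if (pending > 0) {
                Thread.sleep(pending);  // one millisecond per backlogged task
            }
        }
    }

A fixed 1 ms per task is just the simplest possible policy; the point is only that the delay grows with the backlog and disappears once compaction has caught up.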
The status quo is that I have to count a load test as passing only if the amount of backlogged compaction work stays less than some bound.  I'd rather not have to peer into Cassandra internals to determine whether it's really working or not.  It's a problem if 16 hour load tests get different results than 1 hour load tests, because in my tests I'm renting a cluster by the hour.

Tim Freeman
Email: tim.freeman@hp.com
Desk in Palo Alto: (650) 857-2581
Home: (408) 774-1298
Cell: (408) 348-7536 (No reception business hours Monday, Tuesday, and Thursday; call my desk instead.)

-----Original Message-----
From: Jonathan Ellis [mailto:jbellis@gmail.com]
Sent: Thursday, December 03, 2009 3:06 PM
To: cassandra-user@incubator.apache.org
Subject: Re: Persistently increasing read latency

Thanks for looking into this.  Doesn't seem like there's much
low-hanging fruit to make compaction faster, but I'll keep that in the
back of my mind.

-Jonathan

On Thu, Dec 3, 2009 at 4:58 PM, Freeman, Tim wrote:
>>So this is working as designed, but the design is poor because it
>>causes confusion.  If you can open a ticket for this that would be
>>great.
>
> Done, see:
>
>   https://issues.apache.org/jira/browse/CASSANDRA-599
>
>>What does iostat -x 10 (for instance) say about the disk activity?
>
> rkB/s is consistently high, and wkB/s varies.  This is a typical entry with wkB/s at the high end of its range:
>
>>avg-cpu:  %user   %nice    %sys %iowait   %idle
>>           1.52    0.00    1.70   27.49   69.28
>>
>>Device:    rrqm/s  wrqm/s    r/s    w/s   rsec/s   wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
>>sda          3.10 3249.25 124.08  29.67 26299.30 26288.11 13149.65 13144.06   342.04    17.75   92.25   5.98  91.92
>>sda1         0.00    0.00   0.00   0.00     0.00     0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
>>sda2         3.10 3249.25 124.08  29.67 26299.30 26288.11 13149.65 13144.06   342.04    17.75   92.25   5.98  91.92
>>sda3         0.00    0.00   0.00   0.00     0.00     0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
>
> and at the low end:
>
>>avg-cpu:  %user   %nice    %sys %iowait   %idle
>>           1.50    0.00    1.77   25.80   70.93
>>
>>Device:    rrqm/s  wrqm/s    r/s    w/s   rsec/s   wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
>>sda          3.40  817.10 128.60  17.70 27828.80  6600.00 13914.40  3300.00   235.33     6.13   56.63   6.21  90.81
>>sda1         0.00    0.00   0.00   0.00     0.00     0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
>>sda2         3.40  817.10 128.60  17.70 27828.80  6600.00 13914.40  3300.00   235.33     6.13   56.63   6.21  90.81
>>sda3         0.00    0.00   0.00   0.00     0.00     0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
>
> Tim Freeman
> Email: tim.freeman@hp.com
> Desk in Palo Alto: (650) 857-2581
> Home: (408) 774-1298
> Cell: (408) 348-7536 (No reception business hours Monday, Tuesday, and Thursday; call my desk instead.)
>
>
> -----Original Message-----
> From: Jonathan Ellis [mailto:jbellis@gmail.com]
> Sent: Thursday, December 03, 2009 2:45 PM
> To: cassandra-user@incubator.apache.org
> Subject: Re: Persistently increasing read latency
>
> On Thu, Dec 3, 2009 at 4:34 PM, Freeman, Tim wrote:
>>>Can you tell if the system is i/o or cpu bound during compaction?
>>
>> It's I/O bound.  It's using ~9% of 1 of 4 cores as I watch it, and all it's doing right now is compactions.
>
> What does iostat -x 10 (for instance) say about the disk activity?
>
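As for watching the "backlogged compaction work" mentioned above without peering too deep into Cassandra internals: one lightweight option is to poll the node over JMX during the load test.  The service URL, object name, and attribute name below are assumptions (they may differ by Cassandra version), so treat this as a sketch rather than a recipe:

    // Sketch: poll a node's pending compaction count over JMX.
    // The port, object name, and attribute name are assumptions; adjust
    // them to whatever your Cassandra version actually exposes.
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class CompactionBacklogWatcher {
        public static void main(String[] args) throws Exception {
            String host = args.length > 0 ? args[0] : "localhost";
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + host + ":8080/jmxrmi");
            JMXConnector jmx = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbs = jmx.getMBeanServerConnection();
                ObjectName compaction =
                        new ObjectName("org.apache.cassandra.db:type=CompactionManager");
                while (true) {
                    Number pending = (Number) mbs.getAttribute(compaction, "PendingTasks");
                    System.out.println("pending compactions: " + pending);
                    Thread.sleep(10000);  // sample every 10 seconds
                }
            } finally {
                jmx.close();
            }
        }
    }

Graphing that number over a 16-hour run shows directly whether the backlog stays bounded or keeps growing.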