Mailing-List: contact cassandra-user-help@incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: cassandra-user@incubator.apache.org
MIME-Version: 1.0
In-Reply-To: <ad2266761002161615o155d81e9jbaefe6fd9da979ac@mail.gmail.com>
References: <468b21171001200244n2521e77esa84964946f0eb20b@mail.gmail.com>
	<005101caaedd$2bb3ce90$831b6bb0$@com> <1NhIrQ-0007Jt-In@mail.eleven.de>
	<cdc5ad201002160606o5a6ecf3fl1ec5d7eb0d2e2e0@mail.gmail.com>
	<ad2266761002160950q3f4c48afv6153e06a5f2e3b2a@mail.gmail.com>
	<ad2266761002160956y6657f7as65c2315ae8a81bf7@mail.gmail.com>
	<cdc5ad201002161001t1c8e5409kd2d308819e009e39@mail.gmail.com>
	<ad2266761002161016v2a1cf4b8r69d3057c4a77d70@mail.gmail.com>
	<1266345093.617624017@192.168.2.229>
 <ad2266761002161615o155d81e9jbaefe6fd9da979ac@mail.gmail.com>
From: Jonathan Ellis <jbellis@gmail.com>
Date: Tue, 16 Feb 2010 19:18:43 -0600
Message-ID: <e06563881002161718w50884d26o27e25484d4dbc059@mail.gmail.com>
Subject: Re: Cassandra benchmark shows OK throughput but high read latency (>
	100ms)?
To: cassandra-user@incubator.apache.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Have you tried increasing KeysCachedFraction?
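
For reference, a rough sketch of where that setting lives, assuming the
stock 0.5 storage-conf.xml layout in which KeysCachedFraction is a single
element inside <Storage>. The 0.05 below is only an illustrative value, not
a tuned recommendation; as far as I recall the shipped default is 0.01.

```xml
<Storage>
  <!-- other settings omitted -->

  <!-- Fraction of keys per SSTable whose locations are kept in memory.
       Illustrative value only (assumption); raise it gradually and
       watch heap usage. -->
  <KeysCachedFraction>0.05</KeysCachedFraction>
</Storage>
```

This caches key locations only, not column values, so the row data itself
still has to come from disk or the OS disk cache.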

On Tue, Feb 16, 2010 at 6:15 PM, Weijun Li <weijunli@gmail.com> wrote:
> Still have high read latency with 50mil records in the 2-node cluster
> (replica 2). I restarted both nodes but read latency is still above 60ms and
> disk i/o saturation is high. Tried compact and repair but they don't help much.
> When I reduced the client threads from 15 to 5 it looks a lot better but
> throughput is kind of low. I changed the flushing threads to 16 instead of the
> default 8, could that cause the disk saturation issue?
>
> For benchmarks with decent throughput and latency, how many client threads do
> they use? Can anyone share their storage-conf.xml from a well-tuned high-volume
> cluster?
>
> -Weijun
>
> On Tue, Feb 16, 2010 at 10:31 AM, Stu Hood <stu.hood@rackspace.com> wrote:
>>
>> > After I ran "nodeprobe compact" on node B its read latency went up to
>> > 150ms.
>> The compaction process can take a while to finish... in 0.5 you need to
>> watch the logs to figure out when it has actually finished, and then you
>> should start seeing the improvement in read latency.
>>
>> > Is there any way to utilize all of the heap space to decrease the read
>> > latency?
>> In 0.5 you can adjust the number of keys that are cached by changing the
>> 'KeysCachedFraction' parameter in your config file. In 0.6 you can
>> additionally cache rows. You don't want to use up all of the memory on your
>> box for those caches though: you'll want to leave at least 50% for your OS's
>> disk cache, which will store the full row content.
>>
>>
>> -----Original Message-----
>> From: "Weijun Li" <weijunli@gmail.com>
>> Sent: Tuesday, February 16, 2010 12:16pm
>> To: cassandra-user@incubator.apache.org
>> Subject: Re: Cassandra benchmark shows OK throughput but high read latency
>> (> 100ms)?
>>
>> Thanks for the DataFileDirectory trick, I'll give it a try.
>>
>> Just noticed the impact of number of data files: node A has 13 data files
>> with read latency of 20ms and node B has 27 files with read latency of
>> 60ms.
>> After I ran "nodeprobe compact" on node B its read latency went up to
>> 150ms.
>> The read latency of node A became as low as 10ms. Is this normal behavior?
>> I'm using random partitioner and the hardware/JVM settings are exactly the
>> same for these two nodes.
>>
>> Another problem is that Java heap usage is always 900MB out of 6GB. Is there
>> any way to utilize all of the heap space to decrease the read latency?
>>
>> -Weijun
>>
>> On Tue, Feb 16, 2010 at 10:01 AM, Brandon Williams <driftx@gmail.com> wrote:
>>
>> > On Tue, Feb 16, 2010 at 11:56 AM, Weijun Li <weijunli@gmail.com> wrote:
>> >
>> >> One more thought about Martin's suggestion: is it possible to put the
>> >> data files into multiple directories that are located on different
>> >> physical disks? This should help to relieve the i/o bottleneck.
>> >>
>> >>
>> > Yes, you can already do this, just add more <DataFileDirectory>
>> > directives pointed at multiple drives.
>> >
>> >
>> >> Has anybody tested the row-caching feature in trunk (shoot for 0.6?)?
>> >
>> >
>> > Row cache and key cache both help tremendously if your read pattern has a
>> > decent repeat rate. Completely random io can only be so fast, however.
>> >
>> > -Brandon
>> >
>>
>>
>
>