From: "Weijun Li" <weijunli@gmail.com>
To: cassandra-user@incubator.apache.org
Subject: RE: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?
Date: Mon, 15 Feb 2010 23:53:46 -0800

It seems that read latency is sensitive to the number of threads (i.e., Thrift clients): after I reduced the number of threads to 15, read latency dropped to ~20ms.

The other problem is: if I keep a mixed write and read workload (e.g., 8 write threads plus 7 read threads) running against the two-node cluster continuously, the read latency goes up gradually (along with the size of the Cassandra data files), and eventually reaches ~40ms (up from ~20ms) even with only 15 threads.
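For reference, the per-thread latency numbers above can be gathered by timing each call individually from a pool of client threads. A minimal sketch (the `client.get` call and `client_factory` are stand-ins for whatever Thrift client wrapper the benchmark uses, not Cassandra's actual API):

```python
import statistics
import threading
import time

def run_reader(client, keys, latencies_ms, n_requests=1000):
    # Time each read individually, as in the benchmark described above.
    for i in range(n_requests):
        start = time.perf_counter()
        client.get(keys[i % len(keys)])  # hypothetical read call
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

def benchmark(client_factory, keys, n_threads=15):
    # One client per thread, mirroring "transport objects are
    # pre-created upon creation of each thread" below.
    latencies_ms = []
    threads = [
        threading.Thread(
            target=run_reader,
            args=(client_factory(), keys, latencies_ms),
        )
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return statistics.mean(latencies_ms), statistics.median(latencies_ms)
```

Comparing the mean against the median from a run like this also shows whether the growing latency is across the board or driven by occasional slow outliers (e.g., during compaction).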
During this process the data file grew from 1.6GB to over 3GB even though I kept writing the same keys/values to Cassandra. It seems that Cassandra keeps appending to SSTable data files and only cleans them up during node cleanup or compaction (please correct me if this is incorrect).

Here are my test settings:

JVM Xmx: 6GB
KeysCachedFraction: 0.3
Memtable: 512MB
Number of records: 1 million (payload is 1000 bytes)

I used JMX and iostat to watch the cluster but can't find any clue to the increasing read latency: JVM memory, GC, CPU usage, tpstats, and I/O saturation all look clean. One exception is that the wait time in iostat spikes once in a while, but it stays small most of the time. Another thing I noticed is that the JVM never uses more than 1GB of memory (out of the 6GB I specified) even though I set KeysCachedFraction to 0.3 and increased the memtable size to 512MB.

Did I miss anything here? How can I diagnose this kind of increasing read latency issue? Is there any performance tuning guide available?

Thanks,
-Weijun

-----Original Message-----
From: Jonathan Ellis [mailto:jbellis@gmail.com]
Sent: Sunday, February 14, 2010 6:22 PM
To: cassandra-user@incubator.apache.org
Subject: Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

are you i/o bound?  what is your on-disk data set size?  what does
iostat tell you?
http://spyced.blogspot.com/2010/01/linux-performance-basics.html

do you have a lot of pending compactions?  (tpstats will tell you)

have you increased KeysCachedFraction?

On Sun, Feb 14, 2010 at 8:18 PM, Weijun Li wrote:
> Hello,
>
> I saw some Cassandra benchmark reports mentioning read latency that is
> less than 50ms or even 30ms, but my benchmark with 0.5 doesn't seem to
> support that. Here's my setup:
>
> Nodes: 2 machines.
> 2 x 2.5GHz Xeon quad core (thus 8 cores), 8GB RAM
>
> ReplicationFactor=2, Partitioner=Random
>
> JVM Xmx: 4GB
>
> Memtable size: 512MB (haven't figured out how to enable the binary
> memtable, so I set both memtable settings to 512MB)
>
> Flushing threads: 2-4
>
> Payload: ~1000 bytes, 3 columns in one CF.
>
> Read/write time measurement: get startTime right before each Java Thrift
> call; transport objects are pre-created upon creation of each thread.
>
> The result shows that total write throughput is around 2000/sec (for the
> 2 nodes in the cluster), which is not bad, but read throughput is just
> around 750/sec. However, for each thread the average read latency is more
> than 100ms. I'm running 100 threads for the test, and each thread randomly
> picks a node for each Thrift call. So the reads/sec of each thread is just
> around 7.5, meaning the duration of each Thrift call is 1000/7.5 = 133ms.
> Without replication the cluster write throughput is around 3300/s and read
> throughput is around 1400/s, so read latency is still around 70ms without
> replication.
>
> Is there anything wrong with my benchmark? How can I achieve a reasonable
> read latency (< 30ms)?
>
> Thanks,
>
> -Weijun
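The 1000/7.5 = 133ms figure in the quoted report is just Little's Law: with a fixed pool of serial client threads, average per-call latency is concurrency divided by aggregate throughput. A quick check of the quoted numbers:

```python
def per_call_latency_ms(n_threads, cluster_reads_per_sec):
    # Each thread issues requests serially, so average per-call latency
    # is concurrency divided by aggregate throughput (Little's Law).
    return n_threads / cluster_reads_per_sec * 1000.0

# Numbers from the benchmark above:
print(per_call_latency_ms(100, 750))   # ~133 ms with replication
print(per_call_latency_ms(100, 1400))  # ~71 ms without replication
```

One implication: with 100 concurrent clients against 2 nodes, even a healthy cluster shows high *per-call* latency once throughput saturates, which is consistent with latency dropping to ~20ms at 15 threads.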