From: "Weijun Li" <weijunli@gmail.com>
To: cassandra-user@incubator.apache.org
Subject: RE: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?
Date: Mon, 15 Feb 2010 23:53:46 -0800

It seems that read latency is sensitive to the number of threads (i.e., Thrift clients): after I reduced the number of threads to 15, read latency dropped to ~20ms.

The other problem is: if I keep a mixed write and read workload (e.g., 8 write threads plus 7 read threads) running against the two-node cluster continuously, the read latency goes up gradually (along with the size of the Cassandra data files), and eventually reaches ~40ms (up from ~20ms) even with only 15 threads.
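For reference, the per-thread latency numbers above can be gathered by timing each call individually from a pool of client threads. A minimal sketch (the `client.get` call and `client_factory` are stand-ins for whatever Thrift client wrapper the benchmark uses, not Cassandra's actual API):

```python
import statistics
import threading
import time

def run_reader(client, keys, latencies_ms, n_requests=1000):
    # Time each read individually, as in the benchmark described above.
    for i in range(n_requests):
        start = time.perf_counter()
        client.get(keys[i % len(keys)])  # hypothetical read call
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

def benchmark(client_factory, keys, n_threads=15):
    # One client per thread, mirroring "transport objects are
    # pre-created upon creation of each thread" below.
    latencies_ms = []
    threads = [
        threading.Thread(
            target=run_reader,
            args=(client_factory(), keys, latencies_ms),
        )
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return statistics.mean(latencies_ms), statistics.median(latencies_ms)
```

Comparing the mean against the median from a run like this also shows whether the growing latency is across the board or driven by occasional slow outliers (e.g., during compaction).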
During this process the data file grew from 1.6GB to over 3GB even though I kept writing the same keys/values to Cassandra. It seems that Cassandra keeps appending to SSTable data files and only cleans them up during node cleanup or compaction (please correct me if this is incorrect).

Here are my test settings:

JVM Xmx: 6GB
KeysCachedFraction: 0.3
Memtable: 512MB
Number of records: 1 million (payload is 1000 bytes)

I used JMX and iostat to watch the cluster but can't find any clue to the increasing read latency: JVM memory, GC, CPU usage, tpstats, and I/O saturation all look clean. One exception is that the wait time in iostat spikes once in a while, but it stays small most of the time. Another thing I noticed is that the JVM never uses more than 1GB of memory (out of the 6GB I specified) even though I set KeysCachedFraction to 0.3 and increased the memtable size to 512MB.

Did I miss anything here? How can I diagnose this kind of increasing read latency issue? Is there any performance tuning guide available?

Thanks,
-Weijun

-----Original Message-----
From: Jonathan Ellis [mailto:jbellis@gmail.com]
Sent: Sunday, February 14, 2010 6:22 PM
To: cassandra-user@incubator.apache.org
Subject: Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

are you i/o bound?  what is your on-disk data set size?  what does
iostat tell you?
http://spyced.blogspot.com/2010/01/linux-performance-basics.html

do you have a lot of pending compactions?  (tpstats will tell you)

have you increased KeysCachedFraction?

On Sun, Feb 14, 2010 at 8:18 PM, Weijun Li wrote:
> Hello,
>
> I saw some Cassandra benchmark reports mentioning read latency that is
> less than 50ms or even 30ms, but my benchmark with 0.5 doesn't seem to
> support that. Here's my setup:
>
> Nodes: 2 machines.
> 2 x 2.5GHz Xeon quad core (thus 8 cores), 8GB RAM
>
> ReplicationFactor=2, Partitioner=Random
>
> JVM Xmx: 4GB
>
> Memtable size: 512MB (haven't figured out how to enable the binary
> memtable, so I set both memtable settings to 512MB)
>
> Flushing threads: 2-4
>
> Payload: ~1000 bytes, 3 columns in one CF.
>
> Read/write time measurement: get startTime right before each Java Thrift
> call; transport objects are pre-created upon creation of each thread.
>
> The result shows that total write throughput is around 2000/sec (for the
> 2 nodes in the cluster), which is not bad, but read throughput is just
> around 750/sec. However, for each thread the average read latency is more
> than 100ms. I'm running 100 threads for the test, and each thread randomly
> picks a node for each Thrift call. So the reads/sec of each thread is just
> around 7.5, meaning the duration of each Thrift call is 1000/7.5 = 133ms.
> Without replication the cluster write throughput is around 3300/s and read
> throughput is around 1400/s, so read latency is still around 70ms without
> replication.
>
> Is there anything wrong with my benchmark? How can I achieve a reasonable
> read latency (< 30ms)?
>
> Thanks,
>
> -Weijun
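The 1000/7.5 = 133ms figure in the quoted report is just Little's Law: with a fixed pool of serial client threads, average per-call latency is concurrency divided by aggregate throughput. A quick check of the quoted numbers:

```python
def per_call_latency_ms(n_threads, cluster_reads_per_sec):
    # Each thread issues requests serially, so average per-call latency
    # is concurrency divided by aggregate throughput (Little's Law).
    return n_threads / cluster_reads_per_sec * 1000.0

# Numbers from the benchmark above:
print(per_call_latency_ms(100, 750))   # ~133 ms with replication
print(per_call_latency_ms(100, 1400))  # ~71 ms without replication
```

One implication: with 100 concurrent clients against 2 nodes, even a healthy cluster shows high *per-call* latency once throughput saturates, which is consistent with latency dropping to ~20ms at 15 threads.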