From: Markus Klems
Date: Sat, 19 Feb 2011 09:53:06 +0100
Subject: Re: Benchmarking Cassandra with YCSB
To: user@cassandra.apache.org

Hi,

we sorted out the performance problems and tuned the cluster. In
particular, we identified the following weak spot in our setup:
ConcurrentReads and ConcurrentWrites were set to their default values,
which were much too low for our setup. Now we get some serious numbers.

Thanks,

Markus

On Tue, Feb 15, 2011 at 9:09 PM, Aaron Morton wrote:
> Initial thoughts are you are overloading the cluster, are there any log lines about dropping messages?
>
> What is the schema, what settings do you have in Cassandra yaml and what are CF stats telling you? E.g. Are you switching Memtables too quickly? What are the write latency numbers?
>
> Also 0.7 is much faster.
>
> Aaron
>
> On 16/02/2011, at 8:59 AM, Thibaut Britz wrote:
>
>> Cassandra is very CPU hungry so you might be hitting a CPU bottleneck.
>> What's your CPU usage during these tests?
>>
>>
>> On Tue, Feb 15, 2011 at 8:45 PM, Markus Klems wrote:
>>> Hi there,
>>>
>>> we are currently benchmarking a Cassandra 0.6.5 cluster with 3
>>> High-Mem Quadruple Extra Large EC2 nodes
>>> (http://aws.amazon.com/ec2/#instance) using Yahoo's YCSB tool
>>> (replication factor is 3, random partitioner). We assigned 32 GB RAM
>>> to the JVM and left 32 GB RAM for the Ubuntu Linux filesystem buffer.
>>> We also raised the per-user process limit to a very large number via
>>> ulimit -u 999999.
>>>
>>> Our goal is to achieve max throughput by increasing YCSB's threadcount
>>> parameter (i.e. the number of parallel benchmarking client threads).
>>> However, this only improves Cassandra throughput for low numbers
>>> of threads. If we move to higher threadcounts, throughput does not
>>> increase and even decreases. Do you have any idea why this is
>>> happening, and possibly suggestions on how to scale throughput to much
>>> higher numbers? Why is throughput hitting a wall, anyway? And where
>>> does the latency/throughput tradeoff come from?
>>>
>>> Here is our YCSB configuration:
>>> recordcount=300000
>>> operationcount=1000000
>>> workload=com.yahoo.ycsb.workloads.CoreWorkload
>>> readallfields=true
>>> readproportion=0.5
>>> updateproportion=0.5
>>> scanproportion=0
>>> insertproportion=0
>>> threadcount=500
>>> target=10000
>>> hosts=EC2-1,EC2-2,EC2-3
>>> requestdistribution=uniform
>>>
>>> These are typical results for threadcount=1:
>>> Loading workload...
>>> Starting test.
>>> 0 sec: 0 operations;
>>> 10 sec: 11733 operations; 1168.28 current ops/sec; [UPDATE
>>> AverageLatency(ms)=0.64] [READ AverageLatency(ms)=1.03]
>>> 20 sec: 24246 operations; 1251.68 current ops/sec; [UPDATE
>>> AverageLatency(ms)=0.48] [READ AverageLatency(ms)=1.11]
>>>
>>> These are typical results for threadcount=10:
>>> 10 sec: 30428 operations; 3029.77 current ops/sec; [UPDATE
>>> AverageLatency(ms)=2.11] [READ AverageLatency(ms)=4.32]
>>> 20 sec: 60838 operations; 3041.91 current ops/sec; [UPDATE
>>> AverageLatency(ms)=2.15] [READ AverageLatency(ms)=4.37]
>>>
>>> These are typical results for threadcount=100:
>>> 10 sec: 29070 operations; 2895.42 current ops/sec; [UPDATE
>>> AverageLatency(ms)=20.53] [READ AverageLatency(ms)=44.91]
>>> 20 sec: 53621 operations; 2455.84 current ops/sec; [UPDATE
>>> AverageLatency(ms)=23.11] [READ AverageLatency(ms)=55.39]
>>>
>>> These are typical results for threadcount=500:
>>> 10 sec: 30655 operations; 3053.59 current ops/sec; [UPDATE
>>> AverageLatency(ms)=72.71] [READ AverageLatency(ms)=187.19]
>>> 20 sec: 68846 operations; 3814.14 current ops/sec; [UPDATE
>>> AverageLatency(ms)=65.36] [READ AverageLatency(ms)=191.75]
>>>
>>> We never measured more than ~6000 ops/sec. Are there ways to tune
>>> Cassandra that we are not aware of? We made some modifications to the
>>> Cassandra 0.6.5 core for experimental reasons, so it's not easy to
>>> switch to 0.7.x or 0.8.x. However, if this might solve the scaling
>>> issues, we might consider porting our modifications to a newer
>>> Cassandra version...
>>>
>>> Thanks,
>>>
>>> Markus Klems
>>>
>>> Karlsruhe Institute of Technology, Germany
>>>
>
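
On the question in the original message of where the latency/throughput
tradeoff comes from: each YCSB client thread issues one request at a time
and waits for the reply, so, ignoring client-side overhead and the target
throttle, Little's law suggests throughput is roughly threadcount divided
by mean latency. The sketch below checks that against the first 10-second
sample of each run quoted above. It assumes the 50/50 read/update mix from
the configuration, so mean latency is taken as the plain average of the two
reported latencies. The function name and data layout are made up for
illustration and are not part of YCSB.

```python
# Rough Little's-law check: throughput ~= concurrent clients / mean latency.
# Numbers are the first 10-second samples quoted in this thread.
# Assumption: 50/50 read/update mix, so mean latency = average of the two.

SAMPLES = {
    # threadcount: (update latency ms, read latency ms, reported ops/sec)
    1: (0.64, 1.03, 1168.28),
    10: (2.11, 4.32, 3029.77),
    100: (20.53, 44.91, 2895.42),
    500: (72.71, 187.19, 3053.59),
}


def predicted_ops_per_sec(threads, update_ms, read_ms):
    """Closed-loop estimate: N synchronous clients / mean latency in seconds."""
    mean_latency_s = (update_ms + read_ms) / 2 / 1000.0
    return threads / mean_latency_s


for threads, (upd, rd, reported) in SAMPLES.items():
    predicted = predicted_ops_per_sec(threads, upd, rd)
    print(f"threads={threads:4d} predicted={predicted:8.0f} reported={reported:8.0f}")
```

If this model holds, the predictions land close to the reported figures for
1, 10, and 100 threads. That would mean that once the cluster saturates,
extra client threads mostly add queueing delay: latency grows roughly in
proportion to threadcount while throughput stays flat. Raising the
server-side concurrency limits, as in the reply at the top of this thread,
is then the kind of change that moves the ceiling.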