From: Dikang Gu
Date: Tue, 8 Nov 2016 11:58:44 -0800
Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
To: dev@cassandra.apache.org

Michael, thanks for the info. That sounds like a very serious performance
regression to me. :(

On Tue, Nov 8, 2016 at 11:39 AM, Michael Kjellman
<mkjellman@internalcircle.com> wrote:

> Yes, we hit this as well. We have an internal patch that I wrote to mostly
> revert the behavior back to ByteBuffers with as small a code change as
> possible. Performance of our build is now even with 2.0.x, and we've also
> forward ported it to 3.x (although the 3.x patch was even more complicated
> due to Bounds, RangeTombstoneBound and ClusteringPrefix, which actually
> increase the number of allocations to somewhere between 11 and 13 per
> indexed block, depending on how I count it -- making it even worse than
> what you're observing in 2.1).
>
> We haven't upstreamed it, as 2.1 is obviously not taking any changes at
> this point and the longer term solution is
> https://issues.apache.org/jira/browse/CASSANDRA-9754 (which also includes
> the changes to go back to ByteBuffers and remove as much of the Composites
> from the storage engine as possible). Also, the solution is a bit of a
> hack -- although it was a blocker for us deploying 2.1 -- so I'm not sure
> how "hacky" it is if it works.
>
> best,
> kjellman
>
> On Nov 8, 2016, at 11:31 AM, Dikang Gu wrote:
>
> This is very expensive:
>
> "MessagingService-Incoming-/2401:db00:21:1029:face:0:9:0" prio=10
> tid=0x00007f2fd57e1800 nid=0x1cc510 runnable [0x00007f2b971b0000]
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.cassandra.db.marshal.IntegerType.compare(IntegerType.java:29)
>     at org.apache.cassandra.db.composites.AbstractSimpleCellNameType.compare(AbstractSimpleCellNameType.java:98)
>     at org.apache.cassandra.db.composites.AbstractSimpleCellNameType.compare(AbstractSimpleCellNameType.java:31)
>     at java.util.TreeMap.put(TreeMap.java:545)
>     at java.util.TreeSet.add(TreeSet.java:255)
>     at org.apache.cassandra.db.filter.NamesQueryFilter$Serializer.deserialize(NamesQueryFilter.java:254)
>     at org.apache.cassandra.db.filter.NamesQueryFilter$Serializer.deserialize(NamesQueryFilter.java:228)
>     at org.apache.cassandra.db.SliceByNamesReadCommandSerializer.deserialize(SliceByNamesReadCommand.java:104)
>     at org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:156)
>     at org.apache.cassandra.db.ReadCommandSerializer.deserialize(ReadCommand.java:132)
>     at org.apache.cassandra.net.MessageIn.read(MessageIn.java:99)
>     at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:195)
>     at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:172)
>     at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:88)
>
> Checked the git history; it comes from this jira:
> https://issues.apache.org/jira/browse/CASSANDRA-5417
>
> Any thoughts?
>
> On Fri, Oct 28, 2016 at 10:32 AM, Paulo Motta <pauloricardomg@gmail.com>
> wrote:
>
> Haven't seen this before, but perhaps it's related to CASSANDRA-10433?
> This is just a wild guess, as it's in a related codepath, but maybe worth
> trying out the available patch to see if it helps anything...
>
> 2016-10-28 15:03 GMT-02:00 Dikang Gu:
>
> We are seeing a huge CPU regression when upgrading one of our 2.0.16
> clusters to 2.1.14 as well. The 2.1.14 node is not able to handle the same
> amount of read traffic as the 2.0.16 node; in fact, it handles less than
> 50% of it.
>
> And in the perf results, the first entry climbs as high as 50% as we turn
> up the read traffic, something that never appeared in 2.0.16.
>
> Any thoughts?
> Thanks
>
> Samples: 952K of event 'cycles', Event count (approx.): 229681774560
> Overhead  Shared Object                    Symbol
>   6.52%   perf-196410.map                  [.] Lorg/apache/cassandra/db/marshal/IntegerType;.compare in Lorg/apache/cassandra/db/composites/AbstractSimpleCellNameType;.compare
>   4.84%   libzip.so                        [.] adler32
>   2.88%   perf-196410.map                  [.] Ljava/nio/HeapByteBuffer;.get in Lorg/apache/cassandra/db/marshal/IntegerType;.compare
>   2.39%   perf-196410.map                  [.] Ljava/nio/Buffer;.checkIndex in Lorg/apache/cassandra/db/marshal/IntegerType;.findMostSignificantByte
>   2.03%   perf-196410.map                  [.] Ljava/math/BigInteger;.compareTo in Lorg/apache/cassandra/db/DecoratedKey;.compareTo
>   1.65%   perf-196410.map                  [.] vtable chunks
>   1.44%   perf-196410.map                  [.] Lorg/apache/cassandra/db/DecoratedKey;.compareTo in Ljava/util/concurrent/ConcurrentSkipListMap;.findNode
>   1.02%   perf-196410.map                  [.] Lorg/apache/cassandra/db/composites/AbstractSimpleCellNameType;.compare
>   1.00%   snappy-1.0.5.2-libsnappyjava.so  [.] 0x0000000000003804
>   0.87%   perf-196410.map                  [.] Ljava/io/DataInputStream;.readFully in Lorg/apache/cassandra/db/AbstractCell$1;.computeNext
>   0.82%   snappy-1.0.5.2-libsnappyjava.so  [.] 0x00000000000036dc
>   0.79%   [kernel]                         [k] copy_user_generic_string
>   0.73%   perf-196410.map                  [.] vtable chunks
>   0.71%   perf-196410.map                  [.] Lorg/apache/cassandra/db/OnDiskAtom$Serializer;.deserializeFromSSTable in Lorg/apache/cassandra/db/AbstractCell$1;.computeNext
>   0.70%   [kernel]                         [k] find_busiest_group
>   0.69%   perf-196410.map                  [.] <80>H3^?
>   0.68%   perf-196410.map                  [.] Lorg/apache/cassandra/db/DecoratedKey;.compareTo
>   0.65%   perf-196410.map                  [.] jbyte_disjoint_arraycopy
>   0.64%   [kernel]                         [k] _raw_spin_lock
>   0.63%   [kernel]                         [k] __schedule
>   0.45%   snappy-1.0.5.2-libsnappyjava.so  [.] 0x00000000000036df
>
> On Fri, Jan 29, 2016 at 2:11 PM, Corry Opdenakker wrote:
>
> @JC, get the pid of your target java process (something like
> "ps -ef | grep -i cassandra").
> Then do a kill -3 on that pid (on unix/linux).
> Check the stdout logfile of the process; it should contain the thread dump.
> If you found it, then great!
> Run that kill -3 in a loop for about 2 or 3 minutes [see the loop sketch
> just after this message].
> Afterwards, copy-paste and load the stdout file into one of the mentioned
> tools.
> If you are not familiar with the Java internals, those thread dumps will
> teach you a lot :)
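The loop Corry describes could be as simple as the following shell sketch
(the process lookup, the dump count of 60 and the 2-second interval are
assumptions; adjust them for your environment):

    # Take ~60 thread dumps, one every 2 seconds, from the Cassandra JVM.
    # kill -3 (SIGQUIT) asks the JVM to print a full thread dump to its stdout log.
    PID=$(pgrep -f CassandraDaemon | head -n 1)   # assumes the default daemon main class
    for i in $(seq 1 60); do
        kill -3 "$PID"
        sleep 2
    done

The dumps land in whatever file captures the JVM's stdout; that file can then
be loaded into one of the thread dump analyzers mentioned above.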
>
> On Friday, January 29, 2016, Jean Carlo <jean.jeancarl48@gmail.com> wrote:
>
> I am having the same issue after upgrading Cassandra from 2.0.10 to
> 2.1.12. I am not good with JVM internals, so I would like to know how to
> do what @CorryOpdenakker proposes with Cassandra.
>
> :)
>
> I checked concurrent_compactors.
>
> Regards,
>
> Jean Carlo
>
> "The best way to predict the future is to invent it"  Alan Kay
>
> On Fri, Jan 29, 2016 at 9:24 PM, Corry Opdenakker wrote:
>
> Hi guys,
> Cassandra is still new for me, but I have a lot of Java tuning experience.
>
> For root cause detection of performance degradations it's always good to
> start by collecting a series of Java thread dumps. At the moment the
> problem occurs, use a loop script to take, for example, 60 thread dumps
> with an interval of 1 or 2 seconds.
> Then load those dumps into the IBM thread dump analyzer, "eclipse mat" or
> any similar tool and see which methods appear to be most active or
> blocking others.
>
> It's really very useful.
>
> The same can be done in a normal situation to compare the difference.
>
> That should give more insights.
>
> Cheers, Corry
>
> On Friday, January 29, 2016, Peddi, Praveen wrote:
>
> Hello,
> We have another update on performance on 2.1.11. compression_chunk_size
> didn't really help much, but we changed concurrent_compactors from the
> default to 64 in 2.1.11 and read latencies improved significantly.
> However, 2.1.11 read latencies are still 1.5x slower than 2.0.9. One thing
> we noticed in the JMX metrics that could affect read latencies is that
> 2.1.11 is running ReadRepairedBackground and ReadRepairedBlocking far more
> frequently than 2.0.9, even though our read_repair_chance is the same on
> both. Could anyone shed some light on why 2.1.11 could be running read
> repair 10 to 50 times more often in spite of the same configuration on
> both clusters?
>
> dclocal_read_repair_chance=0.100000 AND
> read_repair_chance=0.000000 AND
>
> Here is the table of read repair metrics for both clusters.
>
>                                       2.0.9    2.1.11
> ReadRepairedBackground   5MinAvg      0.006    0.1
>                          15MinAvg     0.009    0.153
> ReadRepairedBlocking     5MinAvg      0.002    0.55
>                          15MinAvg     0.007    0.91
>
> Thanks
> Praveen
>
> From: Jeff Jirsa
> Date: Thursday, January 14, 2016 at 2:58 PM
> To: "user@cassandra.apache.org"
> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>
> Sorry I wasn't as explicit as I should have been.
>
> The same buffer size is used by compressed reads as well, but tuned with
> the compression_chunk_size table property. It's likely true that if you
> lower compression_chunk_size, you'll see improved read performance.
>
> This was covered in the AWS re:Invent YouTube link I sent in my original
> reply.
>
> From: "Peddi, Praveen"
> Reply-To: "user@cassandra.apache.org"
> Date: Thursday, January 14, 2016 at 11:36 AM
> To: "user@cassandra.apache.org", Zhiyan Shao
> Cc: "Agrawal, Pratik"
> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>
> Hi,
> We will try with a reduced "rar_buffer_size" of 4KB. However,
> CASSANDRA-10249 <https://issues.apache.org/jira/browse/CASSANDRA-10249>
> says "this only affects users who have 1. disabled compression, 2.
> switched to buffered i/o from mmap'd". None of this is true for us, I
> believe. We use the default disk_access_mode, which should be mmap. We
> also used LZ4Compressor when creating the table.
>
> We will let you know if this property has any effect. We were testing with
> 2.1.11 and this was only fixed in 2.1.12, so we need to play with the
> latest version.
>
> Praveen
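For reference, the compression_chunk_size change Jeff suggests above is a
per-table compression option (chunk_length_kb in 2.1-era CQL). Dropping it to
4 KB might look like the following sketch; the keyspace and table names are
placeholders:

    cqlsh -e "ALTER TABLE my_keyspace.my_table
      WITH compression = {'sstable_compression': 'LZ4Compressor', 'chunk_length_kb': '4'};"

The new chunk size only applies to SSTables written afterwards; existing ones
are rewritten as they compact, or can be rewritten explicitly with
nodetool upgradesstables -a.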
>
> From: Jeff Jirsa
> Date: Thursday, January 14, 2016 at 1:29 PM
> To: Zhiyan Shao, "user@cassandra.apache.org"
> Cc: "Agrawal, Pratik"
> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>
> This may be due to https://issues.apache.org/jira/browse/CASSANDRA-10249 /
> https://issues.apache.org/jira/browse/CASSANDRA-8894 - whether or not this
> is really the case depends on how much of your data is in page cache, and
> whether or not you're using mmap. Since the original question was asked by
> someone using small RAM instances, it's possible.
>
> We mitigate this by dropping compression_chunk_size in order to force a
> smaller buffer on reads, so we don't over-read very small blocks. This has
> other side effects (lower compression ratio, more garbage during
> streaming), but significantly speeds up read workloads for us.
>
> From: Zhiyan Shao
> Date: Thursday, January 14, 2016 at 9:49 AM
> To: "user@cassandra.apache.org"
> Cc: Jeff Jirsa, "Agrawal, Pratik"
> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>
> Praveen, if you search for "Read is slower in 2.1.6 than 2.0.14" in this
> forum, you can find another thread I sent a while ago. The perf test I did
> indicated that reads are slower in 2.1.6 than in 2.0.14, so we stayed with
> 2.0.14.
>
> On Tue, Jan 12, 2016 at 9:35 AM, Peddi, Praveen wrote:
>
> Thanks Jeff for your reply. Sorry for the delayed response. We were
> running some more tests and wanted to wait for the results.
>
> So basically we saw higher CPU with 2.1.11 compared to 2.0.9 (see below)
> for the same exact load test. Memory spikes were also more aggressive on
> 2.1.11.
>
> We wanted to rule out any of our custom settings, so we ended up doing
> some testing with the Cassandra stress test and a default Cassandra
> installation. Here are the results we saw between 2.0.9 and 2.1.11. Both
> are default installations and both use the Cassandra stress test with the
> same params, so this is the closest apples-to-apples comparison we can
> get. As you can see, both read and write latencies are 30 to 50% worse in
> 2.1.11 than in 2.0.9, even though we are using default installations.
>
> Highlights of the test:
> Load: 2x reads and 1x writes
> CPU: 2.0.9 (goes up to 25%) compared to 2.1.11 (goes up to 60%)
> Local read latency: 0.039 ms for 2.0.9 and 0.066 ms for 2.1.11
> Local write latency: 0.033 ms for 2.0.9 vs 0.030 ms for 2.1.11
> One observation is that as the number of threads is increased, 2.1.11
> read latencies get worse compared to 2.0.9 (see the table below for 24
> threads vs 54 threads).
> Not sure if anyone has done this kind of comparison before and what their
> thoughts are. I am thinking for this same reason.
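The exact stress parameters are not given in the thread, but a 2.1-era
cassandra-stress run of this shape (2x reads to 1x writes, 100,000 operations
per run) might look roughly like the sketch below; the tool path, consistency
level and fixed thread count are assumptions:

    # Populate the default stress schema, then run the mixed 2:1 read/write workload.
    tools/bin/cassandra-stress write n=100000 -rate threads=16
    tools/bin/cassandra-stress mixed ratio\(write=1,read=2\) n=100000 cl=QUORUM -rate threads=16

The 16-, 24- and 54-thread rows in the table below would then correspond to
separate runs at different -rate thread counts.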
>
> 2.0.9 Plain
> threadCount  type   total ops  op/s   pk/s   row/s  mean  med  0.95  0.99  0.999  max    time
> 16           READ    66854     7205   7205   7205   1.6   1.3  2.8   3.5   9.6    85.3   9.3
> 16           WRITE   33146     3572   3572   3572   1.3   1    2.6   3.3   7      206.5  9.3
> 16           total  100000    10777  10777  10777   1.5   1.3  2.7   3.4   7.9    206.5  9.3
> 2.1.11 Plain
> 16           READ    67096     6818   6818   6818   1.6   1.5  2.6   3.5   7.9    61.7   9.8
> 16           WRITE   32904     3344   3344   3344   1.4   1.3  2.3   3     6.5    56.7   9.8
> 16           total  100000    10162  10162  10162   1.6   1.4  2.5   3.2   6      61.7   9.8
> 2.0.9 Plain
> 24           READ    66414     8167   8167   8167   2     1.6  3.7   7.5   16.7   208    8.1
> 24           WRITE   33586     4130   4130   4130   1.7   1.3  3.4   5.4   25.6   45.4   8.1
> 24           total  100000    12297  12297  12297   1.9   1.5  3.5   6.2   15.2   208    8.1
> 2.1.11 Plain
> 24           READ    66628     7433   7433   7433   2.2   2.1  3.4   4.3   8.4    38.3   9
> 24           WRITE   33372     3723   3723   3723   2     1.9  3.1   3.8   21.9   37.2   9
> 24           total  100000    11155  11155  11155   2.1   2    3.3   4.1   8.8    38.3   9
> 2.0.9 Plain
> 54           READ    67115    13419  13419  13419   2.8   2.6  4.2   6.4   36.9   82.4   5
> 54           WRITE   32885     6575   6575   6575   2.5   2.3  3.9   5.6   15.9   81.5   5
> 54           total  100000    19993  19993  19993   2.7   2.5  4.1   5.7   13.9   82.4   5
> 2.1.11 Plain
> 54           READ    66780     8951   8951   8951   4.3   3.9  6.8   9.7   49.4   69.9   7.5
> 54           WRITE   33220     4453   4453   4453   3.5   3.2  5.7   8.2   36.8   68     7.5
> 54           total  100000    13404  13404  13404   4     3.7  6.6   9.2   48     69.9   7.5
>
> From: Jeff Jirsa
> Date: Thursday, January 7, 2016 at 1:01 AM
> To: "user@cassandra.apache.org", Peddi Praveen
> Subject: Re: Slow performance after upgrading from 2.0.9 to 2.1.11
>
> Anecdotal evidence typically agrees that 2.1 is faster than 2.0 (our
> experience was anywhere from 20-60%, depending on workload).
>
> However, it's not necessarily true that everything behaves exactly the
> same -- in particular, memtables are different, commitlog segment handling
> is different, and GC params may need to be tuned differently for 2.1 than
> for 2.0.
>
> When the system is busy, what's it actually DOING? Cassandra exposes a TON
> of metrics -- have you plugged any into a reporting system to see what's
> going on? Is your latency due to pegged cpu, iowait/disk queues or gc
> pauses?
>
> My colleagues spent a lot of time validating different AWS EBS configs
> (video from re:Invent at https://www.youtube.com/watch?v=1R-mgOcOSd4); 2.1
> was faster in almost every case, but you're using an instance size I don't
> believe we tried (too little RAM to be viable in production). c3.2xl only
> gives you 15G of RAM -- most "performance" based systems want 2-4x that
> (people running G1 heaps usually start at 16G heaps and leave another
> 16-30G for page cache). You're running fairly small hardware -- it's
> possible that 2.1 isn't "as good" on smaller hardware.
>
> (I do see your domain, presumably you know all of this, but just to be
> sure:)
>
> You're using c3, so presumably you're using EBS -- are you using GP2?
> Which volume sizes? Are they the same between versions? Are you hitting
> your iops limits? Running out of burst tokens? Do you have enhanced
> networking enabled? At load, what part of your system is stressed? Are you
> cpu bound? Are you seeing GC pauses hurt latency? Have you tried changing
> memtable_allocation_type -> offheap_objects (available in 2.1, not in
> 2.0)?
>
> Tuning gc_grace is weird -- do you understand what it does? Are you
> overwriting or deleting a lot of data in your test (that'd be unusual)?
> Are you doing a lot of compaction?
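For anyone wanting to try the off-heap memtable suggestion above, the knob is
a cassandra.yaml setting introduced in 2.1; a minimal excerpt (a node restart
is required for it to take effect):

    # cassandra.yaml (Cassandra 2.1+)
    # Valid values: heap_buffers (default), offheap_buffers, offheap_objects
    memtable_allocation_type: offheap_objects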
>
> From: "Peddi, Praveen"
> Reply-To: "user@cassandra.apache.org"
> Date: Wednesday, January 6, 2016 at 11:41 AM
> To: "user@cassandra.apache.org"
> Subject: Slow performance after upgrading from 2.0.9 to 2.1.11
>
> Hi,
> We have upgraded Cassandra from 2.0.9 to 2.1.11 in our loadtest
> environment with pretty much the same yaml settings in both (we removed
> unused yaml settings and renamed a few others) and we have noticed that
> performance on 2.1.11 is worse compared to 2.0.9. After more investigation
> we found that the performance gets worse as we increase the replication
> factor on 2.1.11, whereas on 2.0.9 performance stays more or less the
> same. Has anything architecturally changed as far as replication is
> concerned in 2.1.11?
>
> All googling only suggested 2.1.11 should be FASTER than 2.0.9, so we are
> obviously doing something different. However, the client code and load
> test are identical in both cases.
>
> Details:
> Nodes: 3 EC2 c3.2xlarge
> R/W Consistency: QUORUM
> Renamed memtable_total_space_in_mb to memtable_heap_space_in_mb and
> removed unused properties from the yaml file.
> We run aggressive compaction with a low gc_grace (15 mins), but this is
> true for both 2.0.9 and 2.1.11.
>
> As you can see, all p50, p90 and p99 latencies stayed within a 10%
> difference on 2.0.9 when we increased RF from 1 to 3, whereas on 2.1.11
> latencies almost doubled (reads especially are much slower than writes).
>
>                                  2.0.9               2.1.11
> # Nodes   RF   # of rows   P50    P90    P99    P50    P90    P99
> READ
> 3         1    450         306    594    747    425    849    1085
> 3         3    450         358    634    877    708    1274   2642
> WRITE
> 3         1    10          26     80     179    37     131    196
> 3         3    10          31     96     184    46     166    468
>
> Any pointers on how to debug performance issues will be appreciated.
>
> Praveen
>
> --
> ----------------------------------
> Bestdata.be
> Optimised ict
> Tel: +32(0)496609576
> Corry@BestData.be
> ----------------------------------

--
Dikang