From: Viktor Jevdokimov <Viktor.Jevdokimov@adform.com>
To: user@cassandra.apache.org
Subject: RE: Read latency skyrockets after 95-98th percentile
Date: Tue, 31 Jul 2012 14:04:43 +0000

What is the data load? Does it fit in RAM?

I bet it's due to GC. Since this is a single node only, the dynamic snitch is out of scope.

Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer
From: Andras Szerdahelyi [mailto:andras.szerdahelyi@ignitionone.com]
Sent: Tuesday, July 31, 2012 15:53
To: user@cassandra.apache.org
Subject: Read latency skyrockets after 95-98th percentile

hey list,

i've been trying to understand why we are seeing rather nasty read latency peaks in as much as 2% of our total read requests ( not sure what the underlying thrift call is, should be get_slice ) that we have been **unable to tie to garbage collection or blocking I/O.**

This is what i mean by "nasty":

Mean: 1.186341, Max: 41.912, stDev: 2.2490502737495333, 75th: 1.0242499999999999, 95th: 2.1103499999999977, 98th: 5.2102600000000034, 99th: 13.24835000000001, 999th: 41.89454000000041

This is measured around the Astyanax / Hector wrappers for running column slice queries and iterating over their results. FWIW they perform very similarly.

Now, the mean is awesome - 1.2ms for a read is really really fast, i think thrift overhead alone probably accounts for 1ms. What i can't live with is pretty much everything that's beyond the 98th percentile. Perhaps my expectations are misaligned to begin with and this is great, but the dataset under test that yielded this result ( see below ) is extremely small - only 1000 items per test, made up of a few small dynamic composite columns - and the read is being done sequentially, from a single-threaded client. I just don't understand what's causing the peaks.

Here's the complete run-down on the setup:

The data is dynamic composite columns with string or integer values. The DynamicComposites themselves are made up of 3 Components each: <String,String,Integer> and <String,Integer,Integer>

The query is getting all columns ( in this particular test, that will be a single column ) of one of the composite types ( <String,String,Integer> or <String,Integer,Integer> ) for a particular string key.
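For anyone reproducing these numbers: the percentile figures quoted above can be computed from raw latency samples with a simple nearest-rank calculation. A minimal sketch (class and method names are illustrative, not from the original test harness, which may use a different percentile definition such as interpolation):

```java
import java.util.Arrays;

// Nearest-rank percentile over a set of latency samples (milliseconds).
// Illustrative sketch; the names here are hypothetical.
public class LatencyStats {

    // p is a fraction in (0, 1], e.g. 0.98 for the 98th percentile.
    static double percentile(double[] samples, double p) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length); // 1-based nearest rank
        return sorted[rank - 1];
    }

    static double mean(double[] samples) {
        double sum = 0;
        for (double s : samples) sum += s;
        return sum / samples.length;
    }

    public static void main(String[] args) {
        // A tiny hypothetical sample; one outlier dominates the tail
        // exactly as in the distribution described above.
        double[] latencies = {0.9, 1.0, 1.1, 1.2, 2.0, 5.5, 13.0, 42.0};
        System.out.printf("mean=%.3f p75=%.3f p99=%.3f%n",
                mean(latencies),
                percentile(latencies, 0.75),
                percentile(latencies, 0.99));
    }
}
```

Note how with a single-threaded sequential test like the one below, even one slow read out of a thousand moves the 999th-percentile figure all the way to the max.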
( How this is achieved is more or less at https://gist.github.com/3207894 )

The production cluster is set up as follows:

30-40 read/sec clients from multiple hosts, NetworkTopologyStrategy, 2 replicas per DC, read consistency: ONE, 2 DCs, 2 nodes per DC

8GB multi-core physical hosts

most settings on default in cassandra.yaml

In an attempt to isolate the problem, i've done the following ( this is where the results above are coming from, but we see roughly the same numbers in prod ):

- created a short unit test around the ( two ) classes that wrap the Astyanax and Hector libraries in our application, in which i'm only testing the read performance

- in setUp() i start Cassandra with

CassandraStoreKeeperTest.cassandra = new CassandraDaemon();
CassandraStoreKeeperTest.cassandra.init(null);
CassandraStoreKeeperTest.cassandra.start();

- same cassandra.yaml as in prod, file at https://gist.github.com/3207894 ( mostly defaults )

- i pass the following flags to the JVM running my unit test: -server -Xmx2G and all the GC flags from cassandra-env.sh, plus the jamm java agent

- i warm up the embedded cassandra with 1k reads and 1k writes

- the test is 1000 single-threaded, sequential reads via Hector and Astyanax ( no significant differences between the two ) via thrift to the embedded single-node Cassandra instance

- i read back the 1000 items saved in the warm-up

The results are

Mean: 1.186341, Max: 41.912, stDev: 2.2490502737495333, 75th: 1.0242499999999999, 95th: 2.1103499999999977, 98th: 5.2102600000000034, 99th: 13.24835000000001, 999th: 41.89454000000041

here's the GC output during the read

[GC [ParNew
Desired survivor size 1081344 bytes, new threshold 1 (max 1)
- age   1:     281480 bytes,     281480 total
: 17322K->308K(19136K), 0.0037375 secs] 28742K->11885K(83008K), 0.0038170 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
[GC [ParNew
Desired survivor size 1081344 bytes, new threshold 1 (max 1)
- age   1:     261360 bytes,     261360 total
: 17332K->386K(19136K), 0.0034069 secs] 28909K->12098K(83008K), 0.0034849 secs] [Times: user=0.00 sys=0.01, real=0.00 secs]
[GC [ParNew
Desired survivor size 1081344 bytes, new threshold 1 (max 1)
- age   1:     241336 bytes,     241336 total
: 17410K->383K(19136K), 0.0035252 secs] 29122K->12209K(83008K), 0.0036062 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]

Doesn't look like GC ( i could be wrong ) - what else could it be?

all suggestions are highly appreciated..

Thanks!

Andras

ps: this is with Cassandra 1.1.0
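The ParNew lines quoted above can be checked mechanically: each one reports a wall-clock pause (the `real=` field) of at most about 10 ms, which indeed cannot account for a 41 ms outlier on its own. A hedged sketch of a scanner for that field, written for exactly the log format shown above (not a general-purpose GC log parser):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extracts the "real=<seconds>" wall-clock pause from GC log text in the
// -XX:+PrintGCDetails format quoted above. Illustrative sketch only.
public class GcPauseScan {

    private static final Pattern REAL = Pattern.compile("real=([0-9.]+) secs");

    static List<Double> realPausesMillis(String log) {
        List<Double> pauses = new ArrayList<>();
        Matcher m = REAL.matcher(log);
        while (m.find()) {
            // Convert seconds to milliseconds for comparison with latencies.
            pauses.add(Double.parseDouble(m.group(1)) * 1000.0);
        }
        return pauses;
    }

    public static void main(String[] args) {
        // One of the ParNew lines from the message above.
        String log = ": 17322K->308K(19136K), 0.0037375 secs] "
                + "28742K->11885K(83008K), 0.0038170 secs] "
                + "[Times: user=0.01 sys=0.00, real=0.01 secs]";
        System.out.println(realPausesMillis(log)); // pause list in ms
    }
}
```

Summing these over the whole run, rather than eyeballing individual collections, is one way to rule GC in or out with more confidence; safepoint time outside GC (`-XX:+PrintGCApplicationStoppedTime`) is worth checking the same way.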
