From: Wei Zhu
Date: Wed, 22 May 2013 12:16:16 -0700 (PDT)
Subject: Re: High performance disk io
To: user@cassandra.apache.org
For us, the biggest killer is repair, and the compaction that follows repair. If you are running vnodes, you need to test performance while a repair is running; a rough sketch of such a test follows below.
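A minimal sketch of that kind of test, assuming nodetool is on the PATH of a 1.2-era node; the host name node1 and keyspace ks1 are hypothetical:

    # Kick off a primary-range repair on one node...
    nodetool -h node1 repair -pr ks1 &
    REPAIR_PID=$!

    # ...and sample read latencies while the repair (and the
    # compactions it triggers) is still running.
    while kill -0 "$REPAIR_PID" 2>/dev/null; do
        nodetool -h node1 cfstats | grep 'Read Latency'
        sleep 10
    done

Running this on each node in turn, under realistic load, shows how much headroom the disks really have.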
----- Original Message -----

From: "Igor" <igor@4friends.od.ua>
To: user@cassandra.apache.org
Sent: Wednesday, May 22, 2013 7:48:34 AM
Subject: Re: High performance disk io

On 05/22/2013 05:41 PM, Christopher Wirt wrote:

Hi Igor,

Yeah, same here: 15 ms for the 99th percentile is our max. Currently we get one or two ms for most CFs. It goes up at peak times, which is what we want to avoid.

Our 99th percentile also goes up at peak times but stays at an acceptable level.

We're using Cass 1.2.4 w/vnodes and our own barebones driver on top of thrift. It needed to be .NET, so Hector and Astyanax were not options.
Astyanax is token-aware, so we avoid extra data hops between Cassandra nodes.
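For readers who have not used it: a token-aware Astyanax pool takes only a few lines of setup. The sketch below assumes a 1.2-era Thrift cluster; it is not Igor's actual code, and the cluster, keyspace, and seed names are hypothetical:

    import com.netflix.astyanax.AstyanaxContext;
    import com.netflix.astyanax.Keyspace;
    import com.netflix.astyanax.connectionpool.NodeDiscoveryType;
    import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl;
    import com.netflix.astyanax.connectionpool.impl.ConnectionPoolType;
    import com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor;
    import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
    import com.netflix.astyanax.thrift.ThriftFamilyFactory;

    public class TokenAwareClient {
        public static void main(String[] args) throws Exception {
            // TOKEN_AWARE routes each request to a replica that owns the
            // row key, skipping the extra coordinator-to-replica hop.
            AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
                .forCluster("TestCluster")              // hypothetical
                .forKeyspace("ks1")                     // hypothetical
                .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
                    .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE)
                    .setConnectionPoolType(ConnectionPoolType.TOKEN_AWARE))
                .withConnectionPoolConfiguration(
                    new ConnectionPoolConfigurationImpl("pool")
                        .setPort(9160)
                        .setMaxConnsPerHost(10)
                        .setSeeds("node1:9160"))        // hypothetical seed
                .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
                .buildKeyspace(ThriftFamilyFactory.getInstance());

            context.start();
            Keyspace keyspace = context.getClient();
            // ... issue reads/writes via keyspace ...
            context.shutdown();
        }
    }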
Do you use SSDs, or multiple SSDs in any kind of configuration or RAID?
No, a single SSD per host.
Thanks,

Chris

From: Igor [mailto:igor@4friends.od.ua]
Sent: 22 May 2013 15:07
To: user@cassandra.apache.org
Subject: Re: High performance disk io

Hello,

What level of read performance do you expect? We have a limit of 15 ms for the 99th percentile, with average read latency near 0.9 ms. For some CFs the 99th percentile is actually 2 ms, for others 10 ms; it depends on the data volume you read in each query.

Tuning read performance involved cleaning up the data model, tuning cassandra.yaml, switching from Hector to Astyanax, and tuning OS parameters.
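As a side note, per-CF latency percentiles like the ones Igor quotes can be read straight from nodetool in 1.2-era tooling; the host, keyspace, and column family names below are hypothetical:

    # Latency distribution for one column family (values in microseconds).
    nodetool -h node1 cfhistograms ks1 ColFamily1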
On 05/22/2013 04:40 PM, Christopher Wirt wrote:

Hello,

We're looking at deploying a new ring where we want the best possible read performance.

We've set up a cluster with 6 nodes, replication factor 3, 32GB of memory, an 8GB heap, and an 800MB key cache, each node holding 40-50GB of data on a 200GB SSD, with a 500GB SATA disk for the OS and commitlog.

Three column families:
ColFamily1: 50% of the load and data
ColFamily2: 35% of the load and data
ColFamily3: 15% of the load and data

At the moment we are still seeing around 20% disk utilisation, and occasionally as high as 40-50% on some nodes at peak times; we are conducting some semi-live testing. CPU looks fine, memory is fine, and the key cache hit rate is about 80% (could be better, so maybe we should be increasing the key cache size?).

Anyway, we're looking into what we can do to improve this.

One conversation we are having at the moment is around the SSD disk setup. We are considering moving to 3 smaller SSD drives and spreading the data across those.

The possibilities are:

- We build a RAID0 of the smaller SSDs and hope that improves performance. Will this actually yield better throughput?

- We mount the SSDs at different directories and define multiple data directories in cassandra.yaml (a sketch of what this looks like follows at the end of this message). Will dropping the layer of RAID controller improve throughput?

- We mount the SSDs at the individual column family directories and keep a single data directory declared in cassandra.yaml. We think this is quite an attractive idea. What are the drawbacks? Would the system column families end up on the main SATA disk?

- We don't change anything and just keep upping our key cache.

- Anything else you can think of.

Ideas and thoughts welcome. Thanks for your time and expertise.

Chris
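For concreteness, the multiple-data-directories option Chris describes would look roughly like this in cassandra.yaml on a 1.2-era node. This is a sketch with hypothetical mount points, not Chris's actual config:

    # One entry per SSD; Cassandra spreads sstables across all of them.
    data_file_directories:
        - /mnt/ssd1/cassandra/data
        - /mnt/ssd2/cassandra/data
        - /mnt/ssd3/cassandra/data

    # Commitlog stays on the SATA disk, keeping its sequential
    # writes off the SSDs.
    commitlog_directory: /var/lib/cassandra/commitlog

    # The 800MB key cache mentioned above.
    key_cache_size_in_mb: 800

Whether this layout beats RAID0 is hard to predict from first principles; the safest way to decide is to benchmark both under the repair-and-compaction load discussed at the top of the thread.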