From: Wei Zhu
Date: Wed, 22 May 2013 12:16:16 -0700 (PDT)
Subject: Re: High performance disk io
To: user@cassandra.apache.org
For us, the biggest killer is repair, and the compaction that follows repair. If you are running vnodes, you need to test performance while a repair is running; a rough sketch of such a test follows below.
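A minimal sketch of that kind of test, assuming nodetool is on the PATH of a 1.2-era node; the host name node1 and keyspace ks1 are hypothetical:

    # Kick off a primary-range repair on one node...
    nodetool -h node1 repair -pr ks1 &
    REPAIR_PID=$!

    # ...and sample read latencies while the repair (and the
    # compactions it triggers) is still running.
    while kill -0 "$REPAIR_PID" 2>/dev/null; do
        nodetool -h node1 cfstats | grep 'Read Latency'
        sleep 10
    done

Running this on each node in turn, under realistic load, shows how much headroom the disks really have.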
----- Original Message -----

From: "Igor" <igor@4friends.od.ua>
To: user@cassandra.apache.org
Sent: Wednesday, May 22, 2013 7:48:34 AM
Subject: Re: High performance disk io

On 05/22/2013 05:41 PM, Christopher Wirt wrote:

Hi Igor,

Yeah, same here: 15 ms for the 99th percentile is our max. Currently we get one or two ms for most CFs. It goes up at peak times, which is what we want to avoid.

Our 99th percentile also goes up at peak times but stays at an acceptable level.

We're using Cass 1.2.4 w/vnodes and our own barebones driver on top of thrift. It needed to be .NET, so Hector and Astyanax were not options.
Astyanax is token-aware, so we avoid extra data hops between Cassandra nodes.
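For readers who have not used it: a token-aware Astyanax pool takes only a few lines of setup. The sketch below assumes a 1.2-era Thrift cluster; it is not Igor's actual code, and the cluster, keyspace, and seed names are hypothetical:

    import com.netflix.astyanax.AstyanaxContext;
    import com.netflix.astyanax.Keyspace;
    import com.netflix.astyanax.connectionpool.NodeDiscoveryType;
    import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl;
    import com.netflix.astyanax.connectionpool.impl.ConnectionPoolType;
    import com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor;
    import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
    import com.netflix.astyanax.thrift.ThriftFamilyFactory;

    public class TokenAwareClient {
        public static void main(String[] args) throws Exception {
            // TOKEN_AWARE routes each request to a replica that owns the
            // row key, skipping the extra coordinator-to-replica hop.
            AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
                .forCluster("TestCluster")              // hypothetical
                .forKeyspace("ks1")                     // hypothetical
                .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
                    .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE)
                    .setConnectionPoolType(ConnectionPoolType.TOKEN_AWARE))
                .withConnectionPoolConfiguration(
                    new ConnectionPoolConfigurationImpl("pool")
                        .setPort(9160)
                        .setMaxConnsPerHost(10)
                        .setSeeds("node1:9160"))        // hypothetical seed
                .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
                .buildKeyspace(ThriftFamilyFactory.getInstance());

            context.start();
            Keyspace keyspace = context.getClient();
            // ... issue reads/writes via keyspace ...
            context.shutdown();
        }
    }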
Do you use SSDs, or multiple SSDs in any kind of configuration or RAID?
No, a single SSD per host.
Thanks,

Chris

From: Igor [mailto:igor@4friends.od.ua]
Sent: 22 May 2013 15:07
To: user@cassandra.apache.org
Subject: Re: High performance disk io

Hello,

What level of read performance do you expect? We have a limit of 15 ms for the 99th percentile, with average read latency near 0.9 ms. For some CFs the 99th percentile is actually 2 ms, for others 10 ms; it depends on the data volume you read in each query.

Tuning read performance involved cleaning up the data model, tuning cassandra.yaml, switching from Hector to Astyanax, and tuning OS parameters.
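As a side note, per-CF latency percentiles like the ones Igor quotes can be read straight from nodetool in 1.2-era tooling; the host, keyspace, and column family names below are hypothetical:

    # Latency distribution for one column family (values in microseconds).
    nodetool -h node1 cfhistograms ks1 ColFamily1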
On 05/22/2013 04:40 PM, Christopher Wirt wrote:

Hello,

We're looking at deploying a new ring where we want the best possible read performance.

We've set up a cluster with 6 nodes, replication factor 3, 32GB of memory, an 8GB heap, and an 800MB key cache, each node holding 40-50GB of data on a 200GB SSD, with a 500GB SATA disk for the OS and commitlog.

Three column families:
ColFamily1: 50% of the load and data
ColFamily2: 35% of the load and data
ColFamily3: 15% of the load and data

At the moment we are still seeing around 20% disk utilisation, and occasionally as high as 40-50% on some nodes at peak times; we are conducting some semi-live testing. CPU looks fine, memory is fine, and the key cache hit rate is about 80% (could be better, so maybe we should be increasing the key cache size?).

Anyway, we're looking into what we can do to improve this.

One conversation we are having at the moment is around the SSD disk setup. We are considering moving to 3 smaller SSD drives and spreading the data across those.

The possibilities are:

- We build a RAID0 of the smaller SSDs and hope that improves performance. Will this actually yield better throughput?

- We mount the SSDs at different directories and define multiple data directories in cassandra.yaml (a sketch of what this looks like follows at the end of this message). Will dropping the layer of RAID controller improve throughput?

- We mount the SSDs at the individual column family directories and keep a single data directory declared in cassandra.yaml. We think this is quite an attractive idea. What are the drawbacks? Would the system column families end up on the main SATA disk?

- We don't change anything and just keep upping our key cache.

- Anything else you can think of.

Ideas and thoughts welcome. Thanks for your time and expertise.

Chris
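For concreteness, the multiple-data-directories option Chris describes would look roughly like this in cassandra.yaml on a 1.2-era node. This is a sketch with hypothetical mount points, not Chris's actual config:

    # One entry per SSD; Cassandra spreads sstables across all of them.
    data_file_directories:
        - /mnt/ssd1/cassandra/data
        - /mnt/ssd2/cassandra/data
        - /mnt/ssd3/cassandra/data

    # Commitlog stays on the SATA disk, keeping its sequential
    # writes off the SSDs.
    commitlog_directory: /var/lib/cassandra/commitlog

    # The 800MB key cache mentioned above.
    key_cache_size_in_mb: 800

Whether this layout beats RAID0 is hard to predict from first principles; the safest way to decide is to benchmark both under the repair-and-compaction load discussed at the top of the thread.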