From: Igor <igor@4friends.od.ua>
Date: Thu, 23 May 2013 16:59:58 +0300
To: user@cassandra.apache.org
Subject: Re: High performance disk io

Hello Christopher,

BTW, are you talking about the 99th percentiles measured on the client side, or about the percentiles from Cassandra's per-CF histograms on the server side?
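(By the Cassandra-side histograms I mean the per-CF numbers nodetool can print; a minimal example, with the keyspace and column family names as placeholders:

    nodetool cfhistograms <keyspace> <column_family>

That prints, among other things, a read latency distribution per CF from which the percentiles can be read off.)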

Thanks!

On 05/22/2013 05:41 PM, Christopher Wirt wrote:

Hi Igor,

Yeah, same here: 15 ms for the 99th percentile is our max. Currently we're getting one or two ms for most CFs. It goes up at peak times, which is what we want to avoid.

We're using Cassandra 1.2.4 with vnodes and our own barebones driver on top of Thrift. It needed to be .NET, so Hector and Astyanax were not options.

Do you use SSDs, or multiple SSDs in any kind of configuration or RAID?

Thanks

Chris

From: Igor [mailto:igor@4friends.od.ua]
Sent: 22 May 2013 15:07
To: user@cassandra.apache.org
Subject: Re: High performance disk io

 

Hello

What level of read performance do you expect? We have a limit of 15 ms for the 99th percentile, with average read latency near 0.9 ms. For some CFs the 99th percentile is actually about 2 ms, for others about 10 ms; it depends on the volume of data you read in each query.

Tuning read performance involved cleaning up the data model, tuning cassandra.yaml, switching from Hector to Astyanax, and tuning OS parameters.
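On the OS side, the parameters most often touched for SSD-backed reads are disk readahead and the I/O scheduler. Purely as an illustration (the device name is a placeholder, and these are not necessarily the exact values we settled on):

    blockdev --setra 8 /dev/<ssd-device>
    echo deadline > /sys/block/<ssd-device>/queue/scheduler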

On 05/22/2013 04:40 PM, Christopher Wirt wrote:

Hello,

 

We’re looking at deploying a new ring where we want the best possible read performance.

 

We've set up a cluster with 6 nodes, replication factor 3, 32 GB of memory, an 8 GB heap, and an 800 MB key cache; each node holds 40-50 GB of data on a 200 GB SSD, with a 500 GB SATA drive for the OS and commitlog.

Three column families:

ColFamily1 50% of the load and data

ColFamily2 35% of the load and data

ColFamily3 15% of the load and data
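For reference, the heap and key cache sizes above map onto settings along these lines (a sketch, not our exact files): heap in conf/cassandra-env.sh and key cache in conf/cassandra.yaml.

    # conf/cassandra-env.sh
    MAX_HEAP_SIZE="8G"

    # conf/cassandra.yaml
    key_cache_size_in_mb: 800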

 

At the moment we are still seeing around 20% disk utilisation, and occasionally as high as 40-50% on some nodes at peak times; we are conducting some semi-live testing.

CPU looks fine, memory is fine, and the key cache hit rate is about 80% (could be better, so maybe we should be increasing the key cache size?).
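For context, a per-node view of that hit rate is available from nodetool, for example:

    nodetool info

which prints a Key Cache line with its current size, capacity and recent hit rate.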

 

Anyway, we’re looking into what we can do to improve this.

 

One conversation we are having at the moment is around the SSD disk setup.

 

We are considering moving to three smaller SSD drives and spreading the data across those.

 

The possibilities are:

-We have a RAID0 of the smaller SSDs and hope that improves performance.

Will this actually yield better throughput?

 

-We mount the SSDs at different directories and define multiple data directories in cassandra.yaml (see the sketch after this list).

Will not having a RAID layer in the path improve throughput?

 

-We mount the SSDs at individual column family directories and keep a single data directory declared in cassandra.yaml (also covered in the sketch after this list).

We think this is quite an attractive idea.

What are the drawbacks? Would the system column families end up on the main SATA drive?

 

-We don't change anything and just keep upping our key cache.

-Anything you guys can think of.
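For the second and third options above, a rough sketch of what we mean; the mount points are placeholders rather than our real paths:

    # cassandra.yaml: one data directory per SSD
    data_file_directories:
        - /mnt/ssd1/cassandra/data
        - /mnt/ssd2/cassandra/data
        - /mnt/ssd3/cassandra/data

For the per-CF variant we would instead keep a single data directory in cassandra.yaml and mount (or symlink) each column family's directory under it onto its own SSD; anything not explicitly moved, including the system column families, would stay on whichever disk holds the parent data directory.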

 

Ideas and thoughts welcome. Thanks for your time and expertise.

 

Chris