Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: local policy)
Message-ID: <511FC141.3050902@yahoo.com>
Date: Sat, 16 Feb 2013 12:26:25 -0500
From: Mike <mtheroux2@yahoo.com>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64;
 rv:17.0) Gecko/20130107 Thunderbird/17.0.2
MIME-Version: 1.0
To: user@cassandra.apache.org
Subject: Re: Size Tiered -> Leveled Compaction
References: <511BB62D.2060807@yahoo.com>
 <CAFyMrKGfiXsyGqLVFzC8bQr8zxd3ip2D3ZvLqKRBs3aEcOZXDA@mail.gmail.com>
 <1360875114.43272.YahooMailNeo@web160903.mail.bf1.yahoo.com>
 <87D968E5-56A0-4DA7-8676-BA90FF376761@yahoo.com>
In-Reply-To: <87D968E5-56A0-4DA7-8676-BA90FF376761@yahoo.com>
Content-Type: multipart/alternative;
 boundary="------------020401070200020104030200"

This is a multi-part message in MIME format.
--------------020401070200020104030200
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Another piece of information that would be useful is advice on how to 
properly set the SSTable size for your use case.  I understand the 
default is 5MB, a lot of examples show the use of 10MB, and I've seen 
cases where people have set it as high as 200MB.
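
To get a rough feel for what those sizes mean on disk, here is a minimal sketch based on the arithmetic Wei gives below (data size divided by SSTable size, times 4 files per SSTable). The function name and the 4-files-per-SSTable constant are only illustrative assumptions taken from that message; the real count depends on the Cassandra version, compression, and whatever is sitting in L0.

```python
import math

# Assumption from Wei's message below: each SSTable creates 4 files on disk.
FILES_PER_SSTABLE = 4


def estimate_files(data_size_gb, sstable_size_mb):
    """Return (sstable_count, file_count) for one column family on one node.

    Rough estimate only: uses decimal units (1G = 1000M) to match the
    200G/5M = 40K figure, and ignores compression and temporary files.
    """
    data_size_mb = data_size_gb * 1000
    sstables = math.ceil(data_size_mb / sstable_size_mb)
    return sstables, sstables * FILES_PER_SSTABLE


# The three SSTable sizes mentioned above, against a 200G data set.
for size_mb in (5, 10, 200):
    sstables, files = estimate_files(200, size_mb)
    print(f"{size_mb:>3} MB -> {sstables:>6} SSTables, {files:>6} files")
```

This only covers the file-count side of the question; it says nothing about how SSTable size affects compaction or read cost.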

Any information is appreciated,
-Mike

On 2/14/2013 4:10 PM, Michael Theroux wrote:
> BTW, when I say "major compaction", I mean running the "nodetool 
> compact" command (which does a major compaction for Size Tiered 
> Compaction).  I didn't see the distribution of SSTables I expected 
> until I ran that command, in the steps I described below.
>
> -Mike
>
> On Feb 14, 2013, at 3:51 PM, Wei Zhu wrote:
>
>> I haven't tried to switch compaction strategy. We started with LCS.
>>
>> For us, after massive data imports (5000 writes/second for 6 days), 
>> the first repair is painful since there is quite a lot of data 
>> inconsistency. For 150G nodes, repair brought in about 30G and 
>> created thousands of pending compactions. It took almost a day to 
>> clear those. Just be prepared: LCS is really slow in 1.1.X. System 
>> performance degrades during that time since reads can hit more 
>> SSTables; we see 20 SSTable lookups for one read. (We tried 
>> everything we could and couldn't speed it up. I think it's single 
>> threaded, and it's not recommended to turn on multithreaded 
>> compaction. We even tried that, and it didn't help.) There is 
>> parallel LCS in 1.2, which is supposed to alleviate the pain. We 
>> haven't upgraded yet; hope it works :)
>>
>> http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2
>>
>>
>> Since our cluster is not write intensive (only 100 writes/second), I 
>> don't see any pending compactions during regular operation.
>>
>> One thing worth mentioning is the size of the SSTable. The default is 
>> 5M, which is kind of small for a 200G (all in one CF) data set, and 
>> we are on SSD.  It's more than 150K files in one directory (200G/5M = 
>> 40K SSTables, and each SSTable creates 4 files on disk).  You might 
>> want to watch that and decide the SSTable size.
>>
>> By the way, there is no concept of Major compaction for LCS. Just for 
>> fun, you can look at a file called $CFName.json in your data 
>> directory and it tells you the SSTable distribution among different 
>> levels.
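
For anyone who wants more than a glance at that file, here is a small sketch that counts SSTables per level. It assumes the manifest is JSON with a top-level "generations" list whose entries each carry a "generation" number and a "members" list of SSTable ids. That layout is an assumption on my part, so check it against your own $CFName.json before relying on it. The sample string is made up, not real cluster output.

```python
import json
from collections import Counter


def sstables_per_level(manifest_text):
    """Count SSTables in each level of a leveled-compaction manifest.

    Assumed layout (verify against your own file):
    {"generations": [{"generation": 0, "members": [ids...]}, ...]}
    """
    manifest = json.loads(manifest_text)
    counts = Counter()
    for entry in manifest.get("generations", []):
        counts[entry["generation"]] += len(entry.get("members", []))
    # Return a plain dict ordered by level number.
    return dict(sorted(counts.items()))


# Hypothetical sample for illustration only.
sample = (
    '{"generations": ['
    '{"generation": 0, "members": [12]}, '
    '{"generation": 1, "members": [3, 4, 5]}'
    "]}"
)
print(sstables_per_level(sample))
```

To use it on a real node, read the manifest file into a string and pass it to sstables_per_level().
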
>>
>> -Wei
>>
>> ------------------------------------------------------------------------
>> *From:* Charles Brophy <cbrophy@zulily.com <mailto:cbrophy@zulily.com>>
>> *To:* user@cassandra.apache.org <mailto:user@cassandra.apache.org>
>> *Sent:* Thursday, February 14, 2013 8:29 AM
>> *Subject:* Re: Size Tiered -> Leveled Compaction
>>
>> I second these questions: we've been looking into changing some of 
>> our CFs to use leveled compaction as well. If anybody here has the 
>> wisdom to answer them it would be of wonderful help.
>>
>> Thanks
>> Charles
>>
>> On Wed, Feb 13, 2013 at 7:50 AM, Mike <mtheroux2@yahoo.com 
>> <mailto:mtheroux2@yahoo.com>> wrote:
>>
>>     Hello,
>>
>>     I'm investigating the transition of some of our column families
>>     from Size Tiered -> Leveled Compaction.  I believe we have some
>>     high-read-load column families that would benefit tremendously.
>>
>>     I've stood up a test DB node to investigate the transition.  I
>>     successfully altered the column family, and I immediately noticed
>>     a large number (1000+) of pending compaction tasks become
>>     available, but no compactions get executed.
>>
>>     I tried running "nodetool upgradesstables" on the column family,
>>     and the compaction tasks don't move.
>>
>>     I also notice no changes to the size and distribution of the
>>     existing SSTables.
>>
>>     I then run a major compaction on the column family.  All pending
>>     compaction tasks get run, and the SSTables have a distribution
>>     that I would expect from LeveledCompaction (lots and lots of 10MB
>>     files).
>>
>>     Couple of questions:
>>
>>     1) Is a major compaction required to transition from size-tiered
>>     to leveled compaction?
>>     2) Are major compactions as much of a concern for
>>     LeveledCompaction as they are for Size Tiered?
>>
>>     All the documentation I found concerning transitioning from Size
>>     Tiered to Leveled compaction discusses the alter table cql command,
>>     but I haven't found too much on what else needs to be done after
>>     the schema change.
>>
>>     I did these tests with Cassandra 1.1.9.
>>
>>     Thanks,
>>     -Mike
>>
>>
>>
>>
>


--------------020401070200020104030200--