Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (nike.apache.org: local policy)
From: "Hiller, Dean" <Dean.Hiller@nrel.gov>
To: "user@cassandra.apache.org" <user@cassandra.apache.org>, Wei Zhu
	<wz1975@yahoo.com>
Date: Fri, 8 Mar 2013 11:50:06 -0700
Subject: Re: Size Tiered -> Leveled Compaction
Thread-Topic: Size Tiered -> Leveled Compaction
Thread-Index: Ac4cLbzBcGeoU+BGRmq3uWcCZgCRGw==
Message-ID: <CD5F80AD.22BAC%Dean.Hiller@nrel.gov>
In-Reply-To: <1362766309.97631.YahooMailNeo@web160904.mail.bf1.yahoo.com>
Accept-Language: en-US
Content-Language: en-US
user-agent: Microsoft-MacOutlook/14.3.1.130117
acceptlanguage: en-US
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0

+1 (I would love to know this info).

Dean

From: Wei Zhu <wz1975@yahoo.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>, Wei Zhu <wz1975@yahoo.com>
Date: Friday, March 8, 2013 11:11 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Size Tiered -> Leveled Compaction

I have the same question.
We started with the default 5M, and compaction after repair takes too long on a 200G node, so we increased the size to 10M, somewhat arbitrarily, since there is not much documentation around it. Our tech ops team still thinks there are too many files in one directory. To meet their guidelines (I don't remember the exact number, but something in the range of 50K files), we would need to increase the size to around 50M. I think that on a modern file system the latency of opening one file is not much affected by the number of files in the directory, but "ls" and other operations suffer.
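A file-count budget like the one above can be turned into a minimum SSTable size with a little arithmetic. This is a minimal sketch, assuming every SSTable is full-sized and owns a fixed number of component files on disk (4 is used as the default, matching the count cited later in this thread; the real number depends on the Cassandra version and options such as compression). The function name is a hypothetical helper, and the 200 GB / 50,000-file inputs are placeholders, since the ops team's exact budget isn't stated.

```python
import math


def min_sstable_size_mb(data_mb: float, max_files: int, files_per_sstable: int = 4) -> int:
    """Return the smallest whole-MB SSTable size that keeps the file count <= max_files.

    Assumes every SSTable is full-sized and owns `files_per_sstable` files on disk.
    """
    # How many SSTables the directory can hold before the file budget is exceeded.
    max_sstables = max_files // files_per_sstable
    if max_sstables == 0:
        raise ValueError("file budget is too small for even one SSTable")
    # Spread the data over at most max_sstables SSTables, rounding the size up.
    return math.ceil(data_mb / max_sstables)


if __name__ == "__main__":
    # Hypothetical inputs: ~200 GB of data (expressed in MB) and a 50,000-file budget.
    print(min_sstable_size_mb(200 * 1024, 50_000))
```

Partially filled SSTables push the real file count higher, so the result is best read as a lower bound rather than a recommendation.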

Anyway, I asked on IRC about the side effects of bigger SSTables. Someone mentioned that during a read, C* reads the whole SSTable from disk in order to access the row, which causes more disk IO than with smaller SSTables. I don't know enough about Cassandra's internals to say whether that is the case. If it is (with a question mark), is the SSTable or the row kept in memory? I hope someone can confirm the theory here; otherwise I will have to dig into the source code.

Another concern: during repair, does it stream the whole SSTable or only part of it when a mismatch is detected? I have seen claims for both; can someone please confirm this too?

The last thing is the effectiveness of parallel LCS in 1.2. With LCS on 1.1.x, compaction takes quite some time to finish after repair. Both CPU and disk utilization are low during the compaction, which means LCS doesn't fully utilize the available resources. It would make life easier if this is addressed in 1.2.

The bottom line is that there are few documents, guidelines, or success stories around LCS, although it sounds beautiful on paper.

Thanks.
-Wei
________________________________
From: Alain RODRIGUEZ <arodrime@gmail.com>
To: user@cassandra.apache.org
Cc: Wei Zhu <wz1975@yahoo.com>
Sent: Friday, March 8, 2013 1:25 AM
Subject: Re: Size Tiered -> Leveled Compaction

I'm still wondering how to choose the SSTable size under LCS. The default is 5MB, people used to configure it to 10MB, and now you configure it at 128MB. What are the benefits and drawbacks of a very small size (let's say 5MB) vs. a big size (like 128MB)?

Alain


2013/3/8 Al Tobey <al@ooyala.com>
We saw exactly the same thing as Wei Zhu: more than 100k SSTables in a directory causing all kinds of issues. We're running 128MiB SSTables with LCS and have disabled compaction throttling. 128MiB was chosen to get file counts under control and reduce the number of files C* has to manage and search. I just looked, and a ~250GiB node is using about 10,000 files, which is quite manageable. This configuration is running smoothly in production under mixed read/write load.

We're on RAID0 across six 15k drives per machine. When we migrated data to this cluster, we were pushing well over 26k inserts/s at CL_QUORUM. With compaction throttling enabled at any rate, it just couldn't keep up. With throttling off, it runs smoothly and does not appear to affect our applications, so we always leave it off, even in EC2. An 8GiB heap is too small for this config on 1.1. YMMV.

-Al Tobey

On Thu, Feb 14, 2013 at 12:51 PM, Wei Zhu <wz1975@yahoo.com> wrote:
I haven't tried to switch compaction strategy. We started with LCS.

For us, after massive data imports (5000 writes/second for 6 days), the first repair was painful since there was quite a lot of data inconsistency. For 150G nodes, repair brought in about 30G and created thousands of pending compactions. It took almost a day to clear them. Just be prepared: LCS is really slow in 1.1.x. System performance degrades during that time since reads can go to more SSTables; we saw 20 SSTable lookups for one read. (We tried everything we could and couldn't speed it up. I think it's single-threaded, and it's not recommended to turn on multithreaded compaction. We even tried that; it didn't help.) There is parallel LCS in 1.2, which is supposed to alleviate the pain. We haven't upgraded yet; hope it works :)

http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2


Our cluster is not write-intensive, only 100 writes/second, so I don't see any pending compactions during regular operation.

One thing worth mentioning is the size of the SSTable. The default is 5M, which is kind of small for a 200G data set (all in one CF), and we are on SSD. It results in more than 150K files in one directory (200G/5M = 40K SSTables, and each SSTable creates 4 files on disk). You might want to watch that when deciding on the SSTable size.
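The file-count arithmetic in the message above can be written as a small helper. This is a minimal sketch, assuming every SSTable is full-sized and owns a fixed number of files on disk (4, as stated in the message); `estimate_files` is a hypothetical helper, not a Cassandra tool, and decimal units are used to match the 200G/5M = 40K figure.

```python
import math


def estimate_files(data_mb: float, sstable_size_mb: float, files_per_sstable: int = 4) -> tuple[int, int]:
    """Estimate (SSTable count, file count) for one column family directory.

    Assumes every SSTable is full-sized and owns `files_per_sstable` files on disk.
    """
    # Round up: a final, partially filled SSTable still costs a full set of files.
    sstables = math.ceil(data_mb / sstable_size_mb)
    return sstables, sstables * files_per_sstable


if __name__ == "__main__":
    # Figures from the message above, in decimal units: 200 GB of data, 5 MB SSTables.
    print(estimate_files(200_000, 5))  # (40000, 160000)
```

Plugging in Al Tobey's figures from earlier in the thread (about 250 GiB, i.e. 256,000 MiB, with 128 MiB SSTables) gives 2,000 SSTables, so his observed ~10,000 files would imply about five files per SSTable. That per-SSTable count is an inference from the arithmetic, not something stated in the thread.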

By the way, there is no concept of major compaction for LCS. Just for fun, you can look at a file called $CFName.json in your data directory; it tells you the SSTable distribution among the different levels.
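The level distribution in that file follows from the leveled layout: SSTables within one level (L1 and up) don't overlap, so a read touches at most one SSTable per level there, while L0 SSTables can overlap and may all need to be consulted. A backlog in L0 after repair is one way a single read can end up checking many SSTables. The rough sketch below assumes each level holds ten times as many full-size SSTables as the one below it (the usual LevelDB-style fan-out) and ignores partially filled levels; both function names are hypothetical helpers, not Cassandra APIs.

```python
def levels_needed(leveled_sstables: int, fanout: int = 10) -> int:
    """Number of levels above L0 needed to hold `leveled_sstables` full-size SSTables.

    Assumes level N can hold fanout**N SSTables (L1 = 10, L2 = 100, ...).
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    levels = 0
    capacity = 0
    while capacity < leveled_sstables:
        levels += 1
        capacity += fanout ** levels
    return levels


def max_sstables_per_read(leveled_sstables: int, l0_sstables: int) -> int:
    """Worst-case number of SSTables one read may touch.

    Every L0 SSTable may overlap the key; above L0, at most one SSTable per level does.
    """
    return l0_sstables + levels_needed(leveled_sstables)


if __name__ == "__main__":
    # 40,000 SSTables (200 GB at 5 MB each) with L0 fully drained.
    print(levels_needed(40_000))             # 5
    print(max_sstables_per_read(40_000, 0))  # 5
```

Under these assumptions, larger SSTables mainly reduce the number of levels (2,000 SSTables fit in 4 levels), while the per-read cost during a repair backlog is dominated by how many SSTables are sitting in L0.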

-Wei

________________________________
From: Charles Brophy <cbrophy@zulily.com>
To: user@cassandra.apache.org
Sent: Thursday, February 14, 2013 8:29 AM
Subject: Re: Size Tiered -> Leveled Compaction

I second these questions: we've been looking into changing some of our CFs to use leveled compaction as well. If anybody here has the wisdom to answer them, it would be a wonderful help.

Thanks
Charles

On Wed, Feb 13, 2013 at 7:50 AM, Mike <mtheroux2@yahoo.com> wrote:
Hello,

I'm investigating transitioning some of our column families from Size Tiered -> Leveled Compaction. I believe we have some high-read-load column families that would benefit tremendously.

I've stood up a test DB node to investigate the transition. I successfully altered the column family, and I immediately noticed a large number (1000+) of pending compaction tasks become available, but no compactions were executed.

I tried running "nodetool upgradesstables" on the column family, but the pending compaction tasks didn't move.

I also noticed no changes to the size and distribution of the existing SSTables.

I then ran a major compaction on the column family. All pending compaction tasks ran, and the SSTables ended up with the distribution I would expect from LeveledCompaction (lots and lots of 10MB files).

Couple of questions:

1) Is a major compaction required to transition from size-tiered to leveled compaction?
2) Are major compactions as much of a concern for LeveledCompaction as they are for Size Tiered?

All the documentation I found concerning the transition from Size Tiered to Leveled compaction discusses the ALTER TABLE CQL command, but I haven't found much on what else needs to be done after the schema change.

I did these tests with Cassandra 1.1.9.

Thanks,
-Mike