cassandra-user mailing list archives

From Erik Forsberg <forsb...@opera.com>
Subject Re: LeveledCompaction, streaming bulkload, and lot's of small sstables
Date Wed, 20 Aug 2014 07:23:21 GMT
On 2014-08-18 19:52, Robert Coli wrote:
> On Mon, Aug 18, 2014 at 6:21 AM, Erik Forsberg <forsberg@opera.com
> <mailto:forsberg@opera.com>> wrote:
> 
>     Is there some configuration knob I can tune to make this happen faster?
>     I'm getting a bit confused by the description for min_sstable_size,
>     bucket_high, bucket_low etc - and I'm not sure if they apply in this
>     case.
> 
> 
> You probably don't want to use multi-threaded compaction, it is removed
> upstream.
> 
> nodetool setcompactionthroughput 0
> 
> Assuming you have enough IO headroom etc.

OK. I disabled multithreaded compaction and raised the compaction
throughput a bit, but I still don't think that's the full story.

What I see is the following case:

1) My Hadoop cluster bulkloads around 1000 sstables into the
Cassandra cluster.

2) Cassandra will start compacting.

With SizeTiered, I would see multiple ongoing compactions on the CF in
question, each taking on 32 sstables and compacting to one, all of them
running at the same time.

With Leveled, I see only one compaction, taking on 32 sstables and
compacting them to one. When that finishes, it starts another one. So
it's essentially a serial process, and it takes much longer than it
does with SizeTiered. While this compaction is ongoing, read
performance is not very good.
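
As a rough back-of-envelope sketch of why the serial behaviour hurts, the
snippet below counts how many sequential rounds a first pass over the
bulkloaded sstables would need. It is a toy model, not how Cassandra
schedules compactions: the function name and the concurrency of 4 are
made up for illustration, and it assumes every compaction takes the same
time and ignores any later levels or tiers.

```python
import math


def first_pass_rounds(sstables: int, per_compaction: int, concurrent: int) -> int:
    """Sequential rounds needed to run every input sstable through one compaction.

    Toy model: all compactions take equal time and `concurrent` of them
    can run side by side.
    """
    compactions = math.ceil(sstables / per_compaction)
    return math.ceil(compactions / concurrent)


# Numbers from this message: ~1000 bulkloaded sstables, 32 per compaction.
serial = first_pass_rounds(1000, 32, concurrent=1)
# Hypothetical concurrency of 4, standing in for several compactions at once.
parallel = first_pass_rounds(1000, 32, concurrent=4)
print(serial, parallel)
```

Under these assumptions the one-at-a-time case needs four times as many
rounds as the hypothetical four-way case, which matches the direction of
the difference described above, though not necessarily its size.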

http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2
mentions that LCS is parallelized in Cassandra 1.2, but maybe that patch
doesn't cover my use case (although I realize that my use case is maybe
a bit weird).

So my question is: is this something I can tune? I'm running 1.2.18
now, but am strongly considering an upgrade to 2.0.X.

Regards,
\EF


