cassandra-user mailing list archives

From Thorsten von Eicken
Subject Re: how to increase compaction rate?
Date Wed, 14 Mar 2012 03:32:15 GMT
On 3/13/2012 4:13 PM, Viktor Jevdokimov wrote:
> What we did to speed up this process and return all exhausted nodes to
> a normal state faster:
> We created 6 temporary standalone virtual Cassandra nodes, each with 2
> CPU cores and 8GB RAM.
> We completely stopped compaction for the CF on a production node.
> The leveled sstables from this production node were divided into 6
> ranges and copied onto the 6 temporary empty nodes.
> On each node we ran a major compaction to compact just 1/6 of the data,
> about 10-14GB. It took 1-2 hours to compact them into 1GB of data.
> Then all 6 sstables were copied onto one of the 6 nodes for a major
> compaction, finally giving the expected 3GB sstable.
> Then we stop the production node, delete the files that were copied,
> return the compacted sstable (it may need renaming), and the node is
> back to normal.
> By using separate nodes we spared the original production nodes from
> compacting the exhausted CF forever and blocking compactions for other
> CFs. With 6 separate nodes we compacted 2 production nodes a day, so
> maybe it took the same time overall, but the production nodes were free
> for regular compactions of other CFs.
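The "divided into 6 ranges" step could be sketched roughly as below. The
quoted message does not say how the ranges were chosen, so this is only an
assumption: the sstables are taken to be listed in key order and are split
into contiguous groups of roughly equal total size. The function name and
the file names in the usage note are hypothetical.

```python
from typing import List, Tuple


def split_into_ranges(files: List[Tuple[str, int]], parts: int) -> List[List[str]]:
    """Split an ordered list of (name, size_bytes) pairs into `parts`
    contiguous groups of roughly equal total size.

    Assumption (not from the original message): `files` is already sorted
    by the first key each sstable covers, so contiguous groups correspond
    to contiguous key ranges.
    """
    if parts <= 0:
        raise ValueError("parts must be positive")
    total = sum(size for _, size in files)
    groups: List[List[str]] = [[] for _ in range(parts)]
    running = 0
    for name, size in files:
        # Assign each file by where the midpoint of its bytes falls
        # in the cumulative total, which keeps the groups contiguous.
        midpoint = running + size / 2
        index = min(parts - 1, int(midpoint * parts / total)) if total else 0
        groups[index].append(name)
        running += size
    return groups
```

For example, twelve equally sized files named f0 to f11 split into 6 parts
would give six groups of two files each, in the original order.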
Yikes, that's quite the ordeal, but I totally get why you had to go
there. Cassandra seems to work well within some use-case bounds and
lacks the sophistication to handle others well. I've been wondering
about the way I use it, which is to hold the last N days of logs and
corresponding index. This means that every day I make a zillion inserts
and a corresponding zillion deletes for the data inserted N days ago.
Given the way compaction works, this is horrible. The data is essentially
immutable until it's deleted, yet it's copied a whole bunch of times. In
addition, it takes forever for the deletion tombstones to "meet" the
original data in a compaction and actually compact it away. I've also
run into the same zillions-of-files problem with leveled compaction
that you did. I ended up with over 30k SSTables for ~1TB of data. At
that point compaction just ceases to make progress. And starting
Cassandra takes more than 30 minutes just to open all the SSTables, and
once that is done 12GB of memory is in use. Better algorithms and some
tools will be needed for all this to "just work". But then, we're also
just at V1.0.8...
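To put the figures above in per-SSTable terms, here is a rough
back-of-the-envelope calculation. It assumes decimal units (1TB = 10^12
bytes, 1GB = 10^9 bytes) and treats "over 30k" and "more than 30 minutes"
as exactly 30,000 and 30 minutes, so the results are only approximate.

```python
# Figures from the message above, rounded (assumptions noted in the text).
SSTABLE_COUNT = 30_000
DATA_BYTES = 1e12          # ~1TB of data
STARTUP_SECONDS = 30 * 60  # more than 30 minutes to open all SSTables
MEMORY_BYTES = 12e9        # 12GB in use after startup


def per_sstable(total: float, count: int = SSTABLE_COUNT) -> float:
    """Return the average share of `total` attributable to one SSTable."""
    return total / count


avg_size_mb = per_sstable(DATA_BYTES) / 1e6         # average SSTable size in MB
avg_open_ms = per_sstable(STARTUP_SECONDS) * 1000   # average open time in ms
avg_memory_kb = per_sstable(MEMORY_BYTES) / 1e3     # average memory in KB

print(f"{avg_size_mb:.1f} MB, {avg_open_ms:.0f} ms, {avg_memory_kb:.0f} KB per SSTable")
```

Under those assumptions each SSTable averages about 33 MB on disk, takes
roughly 60 ms to open, and accounts for around 400 KB of memory, which
suggests the pain comes from the sheer file count rather than from any
single file.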