cassandra-user mailing list archives

From Eric tamme <eta...@gmail.com>
Subject Re: simple question about merged SSTable sizes
Date Wed, 22 Jun 2011 16:50:23 GMT
On Wed, Jun 22, 2011 at 12:35 PM, Jonathan Colby
<jonathan.colby@gmail.com> wrote:
>
> The way compaction works, "x" same-sized files are merged into a new SSTable.  This
> repeats itself and the SSTables get bigger and bigger.
>
> So what is the upper limit?  If you are not deleting stuff fast enough, wouldn't
> the SSTable sizes grow indefinitely?
>
> I ask because we have some rather large SSTable files (80-100 GB) and I'm starting to
> worry about future compactions.
>
> Second, compacting such large files is an IO killer.  What can be tuned other than
> compaction_threshold to help optimize this and prevent the files from getting too big?
>
> Thanks!


Compaction is an iterative process.  It first compacts the uncompacted
SSTables, removing tombstones and so on, by taking multiple files and
merging them into one SSTable.  This repeats until you have
"compaction_threshold=X" similarly sized SSTables, at which point
those get re-compacted (merged) together.  The number and size of the
SSTables produced by flushes is tuned by the memtable flush limits:
max size, number of records, or time.  Contrary to what you might
believe, having fewer, larger SSTables reduces IO compared to
compacting many small SSTables.  Also, the merge operation on
previously compacted SSTables is relatively fast.

As far as I know, Cassandra will keep compacting SSTables into ever
larger SSTables, with no upper limit on size.  The tunables are when
to flush a memtable to an SSTable, and the number of similarly sized
SSTables that must be present to trigger a compaction.
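
To make the growth pattern concrete, here is a rough simulation of the
behaviour described above.  It is only a sketch under stated
assumptions, not Cassandra's actual code: every flush is assumed to
produce an SSTable of the same size, nothing is deleted or overwritten
(so a merged file is the sum of its inputs), and "similarly sized" is
approximated by how many times a file has been merged.  The function
name and the 64 MB flush size are made up for illustration.

```python
from collections import defaultdict


def simulate_size_tiered(num_flushes, flush_size_mb=64, threshold=4):
    """Return the sorted SSTable sizes (MB) on disk after num_flushes flushes.

    Hypothetical model: tiers[n] holds SSTables that have been merged n
    times; once a tier holds `threshold` files they are merged into one
    file in the next tier. No data is ever dropped (an assumption).
    """
    tiers = defaultdict(list)
    for _ in range(num_flushes):
        # Each memtable flush writes one new, small SSTable.
        tiers[0].append(flush_size_mb)
        level = 0
        # A merge can fill the next tier, so keep cascading upward.
        while len(tiers[level]) >= threshold:
            merged_size = sum(tiers[level])
            tiers[level] = []
            level += 1
            tiers[level].append(merged_size)
    return sorted(size for sizes in tiers.values() for size in sizes)


if __name__ == "__main__":
    for flushes in (4, 16, 64, 256):
        sizes = simulate_size_tiered(flushes)
        print(flushes, "flushes -> largest SSTable:", max(sizes), "MB")
```

In this model the largest file grows by a factor of `threshold` each
time a tier fills up, and nothing caps it.  Raising the threshold or
the flush size changes how often merges happen, not whether the
biggest file keeps growing.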

-Eric
