incubator-cassandra-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Tuning a column family for archival
Date Thu, 11 Aug 2011 13:14:01 GMT
On Thu, Aug 11, 2011 at 12:07 AM, aaron morton <aaron@thelastpickle.com> wrote:

> There's not much to do other than turn off the caches (which you have done)
> and leave it alone.
>
> If you want to poke around perhaps look at the compaction settings (from
> CLI help):
>
> - max_compaction_threshold: The maximum number of SSTables allowed before a
> minor compaction is forced. Default is 32, setting to 0 disables minor
> compactions.
>
> Decreasing this will cause minor compactions to start more frequently and
> be less intensive. The min_compaction_threshold and max_compaction_threshold
> boundaries are the number of tables Cassandra attempts to merge together at
> once.
>
> - min_compaction_threshold: The minimum number of SSTables needed
> to start a minor compaction. Default is 4, setting to 0 disables minor
> compactions.
>
> Increasing this will cause minor compactions to start less frequently and
> be more intensive. The min_compaction_threshold and max_compaction_threshold
> boundaries are the number of tables Cassandra attempts to merge together at
> once.
>
> You *could* disable compaction and then manually compact at the best time.
> If you are not doing many updates I'd wait and see.
>
> You could repair different CFs at different times. This would help with
> reducing the amount of data that is used to build the Merkle trees, but
> there is a bug about streaming the differences that means extra data is
> streamed (can't remember the bug number now).
>
> I'd wait to see if there is an issue first.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 11 Aug 2011, at 13:18, Jason Baker wrote:
>
> > I have a column family that I'm using to archive records.  They're mostly
> kept around for historical purposes.  Aside from that, they're mostly
> considered deleted.  It's probably going to be very rare that anyone reads
> from this table *ever*.  I don't really even write to it that much.
> >
> > Does anyone have advice for me as far as how (or if) I should tune this
> table with that in mind?  My concern is less speeding up access to this
> table than it is making sure that it doesn't impact the performance of any
> other column families in any way.
> >
> > Here's the data from nodetool cfstats (although this table was just
> created a few days ago):
> >
> >               Column Family: ArchivedLinks
> >               SSTable count: 1
> >               Space used (live): 29580801
> >               Space used (total): 97838786
> >               Number of Keys (estimate): 93184
> >               Memtable Columns Count: 7497
> >               Memtable Data Size: 3223587
> >               Memtable Switch Count: 11
> >               Read Count: 0
> >               Read Latency: NaN ms.
> >               Write Count: 139091
> >               Write Latency: 0.007 ms.
> >               Pending Tasks: 0
> >               Key cache: disabled
> >               Row cache: disabled
> >               Compacted row minimum size: 259
> >               Compacted row maximum size: 372
> >               Compacted row mean size: 311
>
>
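
The min/max threshold rules quoted above from the CLI help can be modeled roughly as below. This is a hypothetical sketch, not Cassandra's implementation: real size-tiered compaction first groups SSTables into buckets of similar size, and the function name `plan_minor_compaction` is made up for illustration. It only shows how the two thresholds gate a minor compaction and cap how many SSTables are merged at once.

```python
# Hypothetical model of the min/max compaction thresholds; NOT Cassandra's code.
# Assumes `bucket` already holds SSTables of roughly similar size.

def plan_minor_compaction(bucket, min_threshold=4, max_threshold=32):
    """Return the SSTables to merge from one bucket, or [] if no minor
    compaction should run. Defaults mirror the CLI help (4 and 32)."""
    # Per the CLI help, setting either threshold to 0 disables minor compactions.
    if min_threshold == 0 or max_threshold == 0:
        return []
    # Too few SSTables: wait until at least min_threshold have accumulated.
    if len(bucket) < min_threshold:
        return []
    # Merge at most max_threshold SSTables in a single pass.
    return list(bucket[:max_threshold])


# Usage: 40 similar-sized SSTables with the default thresholds.
sstables = [f"sstable-{i}" for i in range(40)]
print(len(plan_minor_compaction(sstables)))                   # 32: capped at max_threshold
print(len(plan_minor_compaction(sstables[:3])))               # 0: below min_threshold
print(len(plan_minor_compaction(sstables, min_threshold=0)))  # 0: minor compaction disabled
```

Under this model, an archive CF with a single SSTable (as in the cfstats above) would not trigger any minor compaction at all, which is consistent with the "leave it alone" advice.
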
In many regards Cassandra automatically does the correct thing. Other than
the cost of keeping the table's bloom filters in RAM (which grows with the
number of keys), if you never read or write those SSTables and you are not
reusing the row keys, the OS will page those tables out and they will not
take up any cache space.
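
To get a feel for that bloom filter cost, here is a back-of-the-envelope sketch using the textbook sizing formula for an optimally sized bloom filter, m = -n ln(p) / (ln 2)^2 bits for n keys at false-positive rate p. The 1% false-positive rate is an assumption for illustration only; the actual rate and filter layout used by this Cassandra version may differ, and `bloom_filter_bytes` is a hypothetical helper name.

```python
import math


def bloom_filter_bytes(num_keys, false_positive_rate):
    """Approximate size in bytes of an optimally sized bloom filter.

    Uses the standard result m = -n * ln(p) / (ln 2)^2 bits. This is a
    generic estimate, not Cassandra's exact allocation.
    """
    bits = -num_keys * math.log(false_positive_rate) / (math.log(2) ** 2)
    return math.ceil(bits / 8)


# Key estimate taken from the cfstats output above; the 1% false-positive
# rate is an ASSUMPTION, not necessarily what Cassandra uses here.
size = bloom_filter_bytes(93184, 0.01)
print(f"{size} bytes (~{size / 1024:.0f} KiB)")
```

With these assumptions the estimate comes out at roughly 110 KB for ~93k keys, which suggests the RAM overhead of an idle archive CF of this size is small.
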

Compaction is going to change a lot soon. I know one of the tickets in the
works will give SSTables a maximum size; compaction should then do something
like a one-to-one rewrite of these tables, which should not be very
intensive.
