cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Tuning a column family for archival
Date Thu, 11 Aug 2011 04:07:14 GMT
There's not much to do other than turn off the caches (which you have done) and leave it alone.


If you want to poke around perhaps look at the compaction settings (from CLI help):

- max_compaction_threshold: The maximum number of SSTables allowed before a
minor compaction is forced. Default is 32, setting to 0 disables minor
compactions.

Decreasing this will cause minor compactions to start more frequently and
be less intensive. The min_compaction_threshold and max_compaction_threshold
boundaries are the number of tables Cassandra attempts to merge together at
once.

- min_compaction_threshold: The minimum number of SSTables needed
to start a minor compaction. Default is 4, setting to 0 disables minor
compactions.

Increasing this will cause minor compactions to start less frequently and
be more intensive. The min_compaction_threshold and max_compaction_threshold
boundaries are the number of tables Cassandra attempts to merge together at
once. 
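
As a rough sketch, the same two thresholds can also be changed on a running node with nodetool, assuming your version has the `setcompactionthreshold` command (check `nodetool help`). The keyspace name `MyKeyspace` and the values 4 and 16 are placeholders for illustration, not recommendations. The script only prints the command unless you run it with `RUN=` set to empty.

```shell
# Placeholders: adjust host and keyspace for your cluster.
HOST=localhost
KEYSPACE=MyKeyspace      # hypothetical keyspace name
CF=ArchivedLinks

# Dry run by default: commands are echoed. Invoke with RUN= (empty) to execute them.
RUN="${RUN-echo}"

# Set min and max compaction thresholds for one column family.
set_thresholds() {
    $RUN nodetool -h "$HOST" setcompactionthreshold "$KEYSPACE" "$CF" "$1" "$2"
}

# Illustrative values only: keep min at the default 4, lower max from 32 to 16.
set_thresholds 4 16
```

A change made this way may not persist across restarts on every version, so the schema setting via the CLI is the safer place for a permanent change.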

You *could* disable compaction and then manually compact at the best time. If you are not
doing many updates I'd wait and see. 
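
If you did go the manual route, it might look something like the sketch below. It assumes that setting both thresholds to 0 disables minor compaction, as the CLI help above says, and that `nodetool compact` is available to trigger a major compaction on the column family. `MyKeyspace` is a placeholder. Commands are only printed unless run with `RUN=` set to empty.

```shell
# Placeholders: adjust host and keyspace for your cluster.
HOST=localhost
KEYSPACE=MyKeyspace      # hypothetical keyspace name
CF=ArchivedLinks

# Dry run by default: commands are echoed. Invoke with RUN= (empty) to execute them.
RUN="${RUN-echo}"

# Turn off minor compactions for the column family (thresholds of 0).
disable_minor_compaction() {
    $RUN nodetool -h "$HOST" setcompactionthreshold "$KEYSPACE" "$CF" 0 0
}

# Trigger a major compaction by hand, e.g. from cron during a quiet period.
compact_now() {
    $RUN nodetool -h "$HOST" compact "$KEYSPACE" "$CF"
}

disable_minor_compaction
compact_now
```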

You could repair different CFs at different times. This would help reduce the amount
of data that is used to build the Merkle trees, but there is a bug in streaming the differences
that means extra data is streamed (can't remember the bug number now).
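
For what it's worth, repairing one CF at a time could be sketched as below, assuming your nodetool accepts a keyspace and column family name after `repair`. `MyKeyspace` is a placeholder, and how you schedule the separate runs is up to you. Commands are only printed unless run with `RUN=` set to empty.

```shell
# Placeholders: adjust host and keyspace for your cluster.
HOST=localhost
KEYSPACE=MyKeyspace      # hypothetical keyspace name

# Dry run by default: commands are echoed. Invoke with RUN= (empty) to execute them.
RUN="${RUN-echo}"

# Repair a single column family rather than the whole keyspace.
repair_cf() {
    $RUN nodetool -h "$HOST" repair "$KEYSPACE" "$1"
}

# Run this for the archive CF at one time and for busier CFs at another.
repair_cf ArchivedLinks
```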

I'd wait to see if there is an issue first. 

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 11 Aug 2011, at 13:18, Jason Baker wrote:

> I have a column family that I'm using to archive records.  They're mostly kept around
> for historical purposes.  Aside from that, they're mostly considered deleted.  It's probably
> going to be very rare that anyone reads from this table *ever*.  I don't really even write
> to it that much.
> 
> Does anyone have advice for me as far as how (or if) I should tune this table with that
> in mind?  My concern is less speeding up access to this table than it is making sure that
> it doesn't impact the performance of any other column families in any way.
> 
> Here's the data from nodetool cfstat (although this table was just created a few days
> ago):
> 
> 		Column Family: ArchivedLinks
> 		SSTable count: 1
> 		Space used (live): 29580801
> 		Space used (total): 97838786
> 		Number of Keys (estimate): 93184
> 		Memtable Columns Count: 7497
> 		Memtable Data Size: 3223587
> 		Memtable Switch Count: 11
> 		Read Count: 0
> 		Read Latency: NaN ms.
> 		Write Count: 139091
> 		Write Latency: 0.007 ms.
> 		Pending Tasks: 0
> 		Key cache: disabled
> 		Row cache: disabled
> 		Compacted row minimum size: 259
> 		Compacted row maximum size: 372
> 		Compacted row mean size: 311

