cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Estimation of memtable size are wrong
Date Sun, 25 Mar 2012 22:36:02 GMT
> 1. Is it not possible to run them more often? There should be some limit: run the live/serialized
> calculation at least once per hour. It takes just a few seconds.
The live ratio is updated every time the operation count (since startup) for the CF doubles.
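
To make the doubling rule concrete, here is a minimal Python sketch. It is an illustration
only, not Cassandra's code: the class name DoublingTrigger and its methods are made up, and
it assumes the first calculation happens on the very first operation.

```python
class DoublingTrigger:
    """Fires whenever a running operation count has doubled since the last firing.

    Hypothetical helper for illustration; not part of Cassandra.
    """

    def __init__(self, first_threshold=1):
        self.ops = 0
        self.next_threshold = first_threshold

    def record(self, n=1):
        """Add n operations; return True if a recalculation should run now."""
        self.ops += n
        if self.ops >= self.next_threshold:
            # The next recalculation waits until the count has doubled again.
            self.next_threshold = self.ops * 2
            return True
        return False


trigger = DoublingTrigger()
# Operation counts (out of the first 1000) at which a recalculation would fire.
fired_at = [i for i in range(1, 1001) if trigger.record()]
print(fired_at)
```

Under this rule the gap between recalculations grows with the operation count, so a CF that
has already seen many operations goes a long time between updates.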


> 2. Why not use data from FlushWriter to update the estimations? The flusher knows the number of
> ops and the serialized size after the sstable is written to disk. These values should be used for
> updating the memtable live/serialized ratio.
The problem is tracking the live memory usage. Ops count and serialised bytes are tracked
by the memtable. Note that serialised bytes is the throughput in bytes, not the amount that
will be written to disk.
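
A toy Python model of the difference between throughput bytes and flushed bytes, as a sketch
only. ToyMemtable is a made-up class; it counts key plus value length and ignores
Cassandra's real serialisation format, per-row overhead and compression.

```python
class ToyMemtable:
    """Hypothetical stand-in for a memtable; for illustration only."""

    def __init__(self):
        self.columns = {}          # latest value per key
        self.throughput_bytes = 0  # every write counts, including overwrites
        self.ops = 0

    def write(self, key: bytes, value: bytes):
        self.columns[key] = value  # an overwrite replaces the old value
        self.throughput_bytes += len(key) + len(value)
        self.ops += 1

    def flushed_bytes(self):
        """Bytes that would reach disk: only the surviving value for each key."""
        return sum(len(k) + len(v) for k, v in self.columns.items())


mt = ToyMemtable()
for _ in range(1000):
    mt.write(b"row", b"x" * 100)   # the same column overwritten 1000 times

print(mt.ops, mt.throughput_bytes, mt.flushed_bytes())
```

Here 1000 overwrites of one column add up to 103000 throughput bytes, while only 103 bytes
would be left to flush.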

> INFO [OptionalTasks:1] 2012-03-23 09:33:51,796 ColumnFamilyStore.java (line 704) Enqueuing flush of Memtable-ipbans@481336682(1317041/105363280 serialized/live bytes, 16755 ops)
> ** It should be noted here that the live/serialized size is ESTIMATED!! **
Serialised is the serialised byte throughput for the memtable, including overwrites.

The ratio here is a strange 105363280 (100.48 MB) / 1317041 (1.26 MB) = 80. The live ratio
is capped at 64.
Can you see any log messages about the live ratio for this CF?
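
The arithmetic can be checked against the log line itself. This is a small Python sketch;
the regular expression and the function name memtable_stats are my own and assume the
"(serialized/live serialized/live bytes, N ops)" layout shown in the log above.

```python
import re

# Message part of the "Enqueuing flush" log line quoted above.
LOG_LINE = ("Enqueuing flush of Memtable-ipbans@481336682"
            "(1317041/105363280 serialized/live bytes, 16755 ops)")

# Assumed layout: (<serialized>/<live> serialized/live bytes, <ops> ops)
MEMTABLE_RE = re.compile(r"\((\d+)/(\d+) serialized/live bytes, (\d+) ops\)")


def memtable_stats(line):
    """Extract serialized bytes, live bytes and ops, plus the implied live ratio."""
    m = MEMTABLE_RE.search(line)
    if m is None:
        raise ValueError("no memtable stats found in line")
    serialized, live, ops = (int(g) for g in m.groups())
    return {"serialized": serialized, "live": live, "ops": ops,
            "ratio": live / serialized}


stats = memtable_stats(LOG_LINE)
MIB = 1024 * 1024
print(f"serialized: {stats['serialized'] / MIB:.2f} MB")
print(f"live:       {stats['live'] / MIB:.2f} MB")
print(f"ratio:      {stats['ratio']:.1f}")
```

The implied ratio comes out at exactly 80, which is above the cap of 64.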

> INFO [FlushWriter:314] 2012-03-23 09:33:51,799 Memtable.java (line 283) Completed flushing /var/lib/cassandra/data/whois/ipbans-hc-16775-Data.db (1355 bytes)
The small file may be the result of a lot of overwrites and something odd happening with the
live ratio. Is compression on?

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 23/03/2012, at 9:44 PM, Radim Kolar wrote:

> I wonder why memtable estimations are so bad.
> 
> 1. Is it not possible to run them more often? There should be some limit: run the live/serialized
> calculation at least once per hour. It takes just a few seconds.
> 2. Why not use data from FlushWriter to update the estimations? The flusher knows the number of
> ops and the serialized size after the sstable is written to disk. These values should be used for
> updating the memtable live/serialized ratio.
> 
> INFO [OptionalTasks:1] 2012-03-23 09:33:51,765 MeteredFlusher.java (line 62) flushing high-traffic column family CFS(Keyspace='whois', ColumnFamily='ipbans') (estimated 105363280 bytes)
> INFO [OptionalTasks:1] 2012-03-23 09:33:51,796 ColumnFamilyStore.java (line 704) Enqueuing flush of Memtable-ipbans@481336682(1317041/105363280 serialized/live bytes, 16755 ops)
> ** It should be noted here that the live/serialized size is ESTIMATED!! **
> INFO [FlushWriter:314] 2012-03-23 09:33:51,796 Memtable.java (line 246) Writing Memtable-ipbans@481336682(1317041/105363280 serialized/live bytes, 16755 ops)
> INFO [FlushWriter:314] 2012-03-23 09:33:51,799 Memtable.java (line 283) Completed flushing /var/lib/cassandra/data/whois/ipbans-hc-16775-Data.db (1355 bytes)
> 

