cassandra-user mailing list archives

From Radim Kolar <>
Subject Estimation of memtable size are wrong
Date Fri, 23 Mar 2012 08:44:58 GMT
I wonder why the memtable estimations are so bad.

1. Is it not possible to run them more often? There should be some lower 
bound: run the live/serialized calculation at least once per hour. It 
takes only a few seconds.
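A minimal sketch of such a periodic recalculation, assuming a hypothetical callback that walks the memtable and measures the real live/serialized ratio (all class and method names here are illustrative, not Cassandra's actual internals):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical periodic task: re-measure the live/serialized ratio
// at most once per hour, as suggested above.
public class RatioRecalculator {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // Hypothetical source that walks the memtable and returns
    // measured live bytes divided by serialized bytes.
    public interface RatioSource {
        double computeLiveRatio();
    }

    private volatile double liveRatio = 10.0; // initial guess

    // Synchronous recalculation, also usable on demand.
    public void recalculate(RatioSource source) {
        liveRatio = source.computeLiveRatio();
    }

    // Schedule the recalculation once per hour.
    public void start(RatioSource source) {
        scheduler.scheduleAtFixedRate(
                () -> recalculate(source), 1, 1, TimeUnit.HOURS);
    }

    public double liveRatio() {
        return liveRatio;
    }

    public void stop() {
        scheduler.shutdown();
    }
}
```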
2. Why not use data from FlushWriter to update the estimations? The 
flusher knows the number of ops and the serialized size after the 
sstable is written to disk. These values should be used to update the 
memtable live/serialized estimates:
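The feedback loop could be sketched like this: after each flush, the actual on-disk serialized size and op count refine a bytes-per-op estimate. The `onFlushCompleted` hook and the exponentially weighted smoothing are my own illustrative choices, not anything Cassandra does:

```java
// Sketch: refine a serialized-bytes-per-op estimate from actual
// flush results. The flush writer knows both numbers once the
// sstable is on disk; an exponentially weighted average smooths
// per-flush noise. All names here are hypothetical.
public class FlushFeedbackEstimator {
    private static final double ALPHA = 0.5; // weight of newest sample
    private double bytesPerOp = 100.0;       // initial guess

    // Hypothetical hook invoked after a flush completes.
    public void onFlushCompleted(long ops, long serializedBytesOnDisk) {
        if (ops <= 0) return;                // ignore empty flushes
        double measured = (double) serializedBytesOnDisk / ops;
        bytesPerOp = ALPHA * measured + (1 - ALPHA) * bytesPerOp;
    }

    // Estimated serialized size for a memtable holding `ops` operations.
    public long estimateSerializedSize(long ops) {
        return (long) (ops * bytesPerOp);
    }
}
```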

  INFO [OptionalTasks:1] 2012-03-23 09:33:51,765 
(line 62) flushing high-traffic column family CFS(Keyspace='whois', 
ColumnFamily='ipbans') (estimated 105363280 bytes)
  INFO [OptionalTasks:1] 2012-03-23 09:33:51,796 
(line 704) Enqueuing flush of 
Memtable-ipbans@481336682(1317041/105363280 serialized/live bytes, 16755 
ops)
  ** Note: the live/serialized size here is ESTIMATED! **
  INFO [FlushWriter:314] 2012-03-23 09:33:51,796 (line 
246) Writing Memtable-ipbans@481336682(1317041/105363280 serialized/live 
bytes, 16755 ops)
  INFO [FlushWriter:314] 2012-03-23 09:33:51,799 (line 
283) Completed flushing 
/var/lib/cassandra/data/whois/ipbans-hc-16775-Data.db (1355 bytes)
