incubator-cassandra-user mailing list archives

From Julie <julie.su...@nextcentury.com>
Subject How does cfstats calculate Row Size?
Date Thu, 12 Aug 2010 16:08:46 GMT
I am chasing down a row size discrepancy and am confused.

I populated a single-node Cassandra cluster with 10,000 rows of data, using 
numeric keys 1-10,000, where each row is a little over 100 kB and contains 
a single column.

When I run cfstats on the node immediately after writing the data, it 
reports that Compacted row minimum size equals Compacted row maximum size, 
both a little over 100,000 bytes.  This is what I expect.

I then run an application that randomly reads rows and adds a timestamp column 
to each row it reads.  The timestamp column's name and value add only a few 
bytes to the row.
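To make the write pattern concrete, here is a minimal in-memory sketch of the workload, assuming a plain dict stands in for the Standard1 column family (row key to {column name: column value}). It uses no Cassandra client API; the column names `data` and `timestamp`, the exact 100,000-byte value size, and the helper functions `populate` and `read_and_stamp` are all hypothetical. The point it illustrates is that the large column is written once and later writes only add a small column.

```python
import random

# Hypothetical in-memory stand-in for the Standard1 column family:
# row key -> {column name: column value}. This is not a Cassandra client API.
NUM_ROWS = 10_000
BIG_VALUE_SIZE = 100_000  # assumed size of the single large column, in bytes


def populate(num_rows=NUM_ROWS, value_size=BIG_VALUE_SIZE):
    """Write keys "1".."num_rows", each row holding one large column."""
    big_value = b"x" * value_size  # one shared bytes object keeps the model small in RAM
    return {str(k): {b"data": big_value} for k in range(1, num_rows + 1)}


def read_and_stamp(cf, rng, reads):
    """Randomly read rows and add a small timestamp column to each row read.

    Only the new column is written; the large column is neither deleted nor
    rewritten.
    """
    keys = list(cf)
    for i in range(reads):
        key = rng.choice(keys)
        row = cf[key]  # the read
        row[b"timestamp"] = str(i).encode()  # hypothetical timestamp value
    return cf


cf = populate()
read_and_stamp(cf, random.Random(42), reads=5_000)
stamped = sum(1 for row in cf.values() if b"timestamp" in row)
print(len(cf), stamped)
```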

But after my reading app has run for a few hours, cfstats reports a very odd 
compacted row minimum size (and compacted row mean size):

[root@ec2-server1 ~]# /mnt/server/apache-cassandra-0.6.2/bin/nodetool -h ec2-server1 -p 8080 cfstats
Keyspace: Keyspace1
	Read Count: 670434
	Read Latency: 36.22349047035205 ms.
	Write Count: 1519933
	Write Latency: 0.02940705741634664 ms.
	Pending Tasks: 0
		Column Family: Standard1
		SSTable count: 6
		Space used (live): 11130225642
		Space used (total): 11130225642
		Memtable Columns Count: 1435
		Memtable Data Size: 40344907
		Memtable Switch Count: 1329
		Read Count: 670434
		Read Latency: 41.768 ms.
		Write Count: 1519933
		Write Latency: 0.025 ms.
		Pending Tasks: 0
		Key cache capacity: 200000
		Key cache size: 200000
		Key cache hit rate: 0.48049934471509675
		Row cache: disabled
		Compacted row minimum size: 238
		Compacted row maximum size: 100323
		Compacted row mean size: 67548
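The mean hints at how many small rows cfstats is measuring. If every measured row were about 100 kB, the mean would sit near the maximum; instead it is roughly two-thirds of it. As a back-of-the-envelope sketch, assuming (purely for illustration) that the measured sizes fall into just two groups, 238 bytes and 100,323 bytes, the share of small rows works out to about a third:

```python
# Values reported by cfstats above, in bytes.
row_min = 238
row_max = 100_323
row_mean = 67_548

# Assumption: every measured row size is either row_min or row_max.
# Then row_mean = f * row_min + (1 - f) * row_max; solving for f gives:
small_fraction = (row_max - row_mean) / (row_max - row_min)
print(f"{small_fraction:.3f}")  # prints 0.327
```

Real size distributions are unlikely to be exactly two-valued, so this only bounds the picture loosely.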

I thought I had a bug in my code, so I wrote another app that reads every row 
in the database, keys 1-10,000.  After reading each row I compute its size 
(the lengths of all column names and column values in the row, plus the 
length of the key string), and the result matches what I expect: every 
single key in this table has a size of just over 100,000 bytes.  (I know 
each row carries some overhead columns, but I assume these can only make 
the row larger, not smaller.)
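The client-side size check described above can be sketched as follows, assuming rows arrive as a dict of column name to column value (bytes) and keys are strings. The helpers `row_size` and `undersized_keys`, the 100,000-byte threshold, and the sample rows are hypothetical; no Cassandra client call is implied.

```python
def row_size(key, columns):
    """Return len(key) plus the lengths of every column name and value, in bytes.

    This ignores any per-row or per-column storage overhead, which, as noted
    above, would presumably only add to the total.
    """
    size = len(key.encode("utf-8"))
    for name, value in columns.items():
        size += len(name) + len(value)
    return size


def undersized_keys(cf, threshold=100_000):
    """Return the keys whose computed size falls below threshold bytes."""
    return [key for key, columns in cf.items() if row_size(key, columns) < threshold]


# Hypothetical rows shaped like the ones described above.
cf = {
    "1": {b"data": b"x" * 100_000},
    "42": {b"data": b"x" * 100_000, b"timestamp": b"1281628126"},
}
print(row_size("42", cf["42"]))  # prints 100025
print(undersized_keys(cf))       # prints []
```

An empty result from `undersized_keys` over all 10,000 keys corresponds to the "every row is just over 100,000 bytes" observation.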

So where is cfstats getting the row sizes it reports?

When I add the timestamp column to each row, I am neither deleting nor 
rewriting the large column already in that row.

Thanks for your help!
Julie


