cassandra-user mailing list archives

From Daniel Doubleday <daniel.double...@gmx.net>
Subject Re: Database grows 10X bigger after running nodetool repair
Date Wed, 25 May 2011 16:16:03 GMT
We are having problems with repair too. 

It sounds like yours is the same problem. See this thread from today:
http://permalink.gmane.org/gmane.comp.db.cassandra.user/16619

On May 25, 2011, at 4:52 PM, Dominic Williams wrote:

> Hi,
> 
> I've got a strange problem where the database on a node has inflated 10X after running repair. This is not the result of receiving missed data.
> 
> I didn't run repair within my usual 10-day cycle, so I followed the recommended practice:
> http://wiki.apache.org/cassandra/Operations#Dealing_with_the_consequences_of_nodetool_repair_not_running_within_GCGraceSeconds
> 
> The sequence of events was like this:
> 
> 1) set GCGraceSeconds to some huge value
> 2) perform a rolling upgrade from 0.7.4 to 0.7.6-2
> 3) run nodetool repair on the first node in the cluster at ~10pm. It has a ~30G database
> 4) at 2:30am, decide to leave it running all night; wake up at 9am to find it still running
> 5) late-morning investigation shows that the db size has increased to 370G. The snapshot folder accounts for only 30G
> 6) node starts to run out of disk space http://pastebin.com/Sm0B7nfR
> 7) decide to bail! Reset GCGraceSeconds to 864000 and restart the node to stop repair
> 8) as the node restarts it deletes a bunch of tmp files, reducing db size from 370G to 270G
> 9) node is now constantly performing minor compactions; du rises slightly, then falls by a greater amount after each minor compaction deletes an sstable
> 10) disk usage is gradually coming down. Currently at 254G (3pm)
> 11) performance of the node is obviously not great!
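The sizes in the timeline above can be cross-checked with a little arithmetic. This minimal Python sketch uses only the rounded figures quoted in the steps (treated as GB, as reported by du). It shows that 864000 seconds is exactly the 10-day repair cycle mentioned earlier, and that the peak size minus the snapshot is roughly 11x the pre-repair database; the variable names are illustrative.

```python
# Rounded figures copied from the numbered steps above (sizes in GB, as reported by du).
GC_GRACE_SECONDS = 864_000      # value GCGraceSeconds was reset to before restarting
SECONDS_PER_DAY = 86_400

baseline_gb = 30                # size of the database before repair started
peak_gb = 370                   # size found during the late-morning investigation
snapshot_gb = 30                # portion of the peak held by the snapshot folder
after_restart_gb = 270          # size after the restart deleted tmp files

gc_grace_days = GC_GRACE_SECONDS / SECONDS_PER_DAY
growth_excluding_snapshot = (peak_gb - snapshot_gb) / baseline_gb
tmp_files_gb = peak_gb - after_restart_gb

print(f"GCGraceSeconds = {gc_grace_days:.0f} days")
print(f"growth excluding snapshot: {growth_excluding_snapshot:.1f}x")
print(f"reclaimed by deleting tmp files: {tmp_files_gb} GB")
```

This prints 10 days, 11.3x and 100 GB respectively.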
> 
> Investigation of the database shows that the main problem occurred in a single column family, UserFights. This contains millions of fight records from our MMO, exactly the same number as the MonsterFights cf. However, the comparative sizes are:
> 		Column Family: MonsterFights
> 		SSTable count: 38
> 		Space used (live): 13867454647
> 		Space used (total): 13867454647 (13G)
> 		Memtable Columns Count: 516
> 		Memtable Data Size: 598770
> 		Memtable Switch Count: 4
> 		Read Count: 514
> 		Read Latency: 157.649 ms.
> 		Write Count: 4059
> 		Write Latency: 0.025 ms.
> 		Pending Tasks: 0
> 		Key cache capacity: 200000
> 		Key cache size: 183004
> 		Key cache hit rate: 0.0023566218452145135
> 		Row cache: disabled
> 		Compacted row minimum size: 771
> 		Compacted row maximum size: 943127
> 		Compacted row mean size: 3208
> 
> 		Column Family: UserFights
> 		SSTable count: 549
> 		Space used (live): 185355019679
> 		Space used (total): 219489031691 (219G)
> 		Memtable Columns Count: 483
> 		Memtable Data Size: 560569
> 		Memtable Switch Count: 8
> 		Read Count: 2159
> 		Read Latency: 2589.150 ms.
> 		Write Count: 4080
> 		Write Latency: 0.018 ms.
> 		Pending Tasks: 0
> 		Key cache capacity: 200000
> 		Key cache size: 200000
> 		Key cache hit rate: 0.03357770764288416
> 		Row cache: disabled
> 		Compacted row minimum size: 925
> 		Compacted row maximum size: 12108970
> 		Compacted row mean size: 503069
> 
> These stats were taken at 3pm. At 1pm UserFights was using 224G total, so the overall size is gradually coming down.
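The gap between the two column families can be quantified by parsing the cfstats text. The sketch below is a hypothetical helper (`parse_cfstats` is not part of nodetool) that assumes the `Name: value` layout shown above and keeps only the leading number of each value, skipping non-numeric ones such as `disabled`. The sample is a trimmed copy of the two blocks above. Reading "total minus live" as SSTables that are obsolete but not yet deleted from disk is an assumption about these fields.

```python
import re

def parse_cfstats(text: str) -> dict[str, dict[str, float]]:
    """Group `key: value` lines under the preceding 'Column Family:' line.

    Only the leading number of each value is kept, so '13867454647 (13G)'
    becomes 13867454647.0; values without a leading number are skipped.
    """
    stats: dict[str, dict[str, float]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip().lstrip(">").strip()  # tolerate quoted-email prefixes
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Column Family":
            current = stats.setdefault(value, {})
            continue
        match = re.match(r"-?\d+(\.\d+)?", value)
        if current is not None and match:
            current[key] = float(match.group(0))
    return stats

# Trimmed copy of the MonsterFights and UserFights blocks above.
SAMPLE = """
Column Family: MonsterFights
SSTable count: 38
Space used (live): 13867454647
Space used (total): 13867454647 (13G)
Row cache: disabled
Column Family: UserFights
SSTable count: 549
Space used (live): 185355019679
Space used (total): 219489031691 (219G)
Row cache: disabled
"""

stats = parse_cfstats(SAMPLE)
m, u = stats["MonsterFights"], stats["UserFights"]
print(f"total size ratio: {u['Space used (total)'] / m['Space used (total)']:.1f}x")
print(f"SSTable count ratio: {u['SSTable count'] / m['SSTable count']:.1f}x")
print(f"total minus live: {(u['Space used (total)'] - u['Space used (live)']) / 1e9:.1f} GB")
```

For these numbers UserFights is about 15.8x larger than MonsterFights on disk, spread over 14.4x as many SSTables, with about 34.1 GB counted in total but not in live space.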
> 
> Another observation: the following appears in the logs during the minor compactions:
> Compacting large row 536c69636b5061756c (121235810 bytes) incrementally
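The row key in that log line is printed as hex. Assuming UserFights rows are keyed by a UTF-8 string such as a user name (an assumption; binary keys like longs or UUIDs would need a different decoder), the hypothetical helper below turns it back into readable text so the offending user can be looked up.

```python
def decode_row_key(hex_key: str, encoding: str = "utf-8") -> str:
    """Convert a hex-encoded row key from the log back into text.

    Assumes the key was originally an encoded string.
    """
    return bytes.fromhex(hex_key).decode(encoding)

print(decode_row_key("536c69636b5061756c"))  # prints "SlickPaul"
```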
> 
> The largest number of fights any user has performed on our MMO that I can find is just short of 10,000. Each fight record is smaller than 1K... so it looks like these rows have somehow grown more than 10X.
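The "more than 10X" estimate follows from comparing an upper bound on a legitimate row with the size in the compaction log line. This sketch assumes 1K means 1024 bytes and ignores per-column serialization overhead (column names and timestamps), so the bound is only approximate.

```python
MAX_FIGHTS_PER_USER = 10_000        # most fights found for any single user
MAX_RECORD_BYTES = 1024             # "each fight record is smaller than 1K" (assumed 1024)
LOGGED_ROW_BYTES = 121_235_810      # from the "Compacting large row" log line

expected_upper_bound = MAX_FIGHTS_PER_USER * MAX_RECORD_BYTES
inflation = LOGGED_ROW_BYTES / expected_upper_bound

print(f"expected at most {expected_upper_bound / 1e6:.1f} MB, "
      f"logged {LOGGED_ROW_BYTES / 1e6:.1f} MB ({inflation:.1f}x)")
```

This prints an expected maximum of 10.2 MB against 121.2 MB logged, about 11.8x.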
> 
> The size of UserFights on another replica node, which actually owns a slightly larger proportion of the ring, is:
> 
> 		Column Family: UserFights
> 		SSTable count: 14
> 		Space used (live): 17844982744
> 		Space used (total): 17936528583 (18G)
> 		Memtable Columns Count: 767
> 		Memtable Data Size: 891153
> 		Memtable Switch Count: 6
> 		Read Count: 2298
> 		Read Latency: 61.020 ms.
> 		Write Count: 4261
> 		Write Latency: 0.104 ms.
> 		Pending Tasks: 0
> 		Key cache capacity: 200000
> 		Key cache size: 55172
> 		Key cache hit rate: 0.8079570484581498
> 		Row cache: disabled
> 		Compacted row minimum size: 925
> 		Compacted row maximum size: 12108970
> 		Compacted row mean size: 846477
> ...
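Since the healthy replica owns slightly more of the ring, its UserFights figures give a rough baseline for what the repaired node should hold. This sketch simply divides the byte counts and SSTable counts copied from the two UserFights blocks above; the dictionary layout is illustrative.

```python
# Figures copied from the two UserFights cfstats blocks above.
inflated = {"live": 185_355_019_679, "total": 219_489_031_691, "sstables": 549}
replica = {"live": 17_844_982_744, "total": 17_936_528_583, "sstables": 14}

for field in ("live", "total", "sstables"):
    ratio = inflated[field] / replica[field]
    print(f"{field:>8}: {ratio:.1f}x larger on the repaired node")
```

The repaired node holds about 10.4x the live space, 12.2x the total space and 39.2x as many SSTables as the healthy replica.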
> 
> All ideas and suggestions greatly appreciated as always!
> 
> Dominic
> ria101.wordpress.com

