I'm not comparing it with PostgreSQL's size; I'm just making some estimates. This table / column family stores XML documents with an average raw size of 2 MB each and a total size of about 5 GB. However, the space Cassandra occupies on disk is 70 GB (after gc_grace was set to 0 and a major compaction was run).
Maybe there is some tool to analyze it? It would be great if I could somehow export each row of a column family into a separate file - so I could see their count and sizes. Is there any such tool? Or maybe you have some better thoughts...
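One possible way to get per-row counts and sizes is to dump the column family to JSON first and then split that dump into one file per row. The sketch below is only an illustration under assumptions: it supposes the dump has already been parsed into a list of rows shaped like {"key": ..., "columns": [[name, value, timestamp], ...]}, which may not match what a given Cassandra version's export tool actually produces. The function names export_rows and summarize are hypothetical, and the sizes it reports are those of the exported JSON text, not the raw stored bytes.

```python
import json
import os
import tempfile


def export_rows(rows, out_dir):
    """Write each row's columns to its own file; return {row_key: bytes_written}."""
    os.makedirs(out_dir, exist_ok=True)
    sizes = {}
    for index, row in enumerate(rows):
        # Assumed row shape (not confirmed for any particular Cassandra version):
        # {"key": "<row key>", "columns": [[name, value, timestamp], ...]}
        # Files are named by position, since row keys may not be safe file names.
        path = os.path.join(out_dir, "row_%06d.json" % index)
        data = json.dumps(row["columns"]).encode("utf-8")
        with open(path, "wb") as handle:
            handle.write(data)
        sizes[row["key"]] = len(data)
    return sizes


def summarize(sizes):
    """Return row count, total size and average size of the exported rows."""
    total = sum(sizes.values())
    count = len(sizes)
    average = total / count if count else 0
    return {"rows": count, "total_bytes": total, "avg_bytes": average}
```

With a dump loaded via json.load, calling export_rows(rows, "rows_out") and then summarize on its result would give the row count and the exported sizes, which can then be compared against the expected 2 MB per document.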
2012/9/3 Peter Schuller <email@example.com>
> I think that was clear from your post. I don't see a problem with your
> process. Setting gc grace to 0 and forcing compaction should indeed
> return you to the smallest possible on-disk size. (But may be unsafe as
> documented; can cause deleted data to pop back up, etc.)