incubator-cassandra-user mailing list archives

From Alexis Rodríguez <>
Subject [howto measure disk usage]
Date Fri, 13 May 2011 15:15:04 GMT

I'm trying to measure Cassandra's disk usage after inserting some columns,
in order to plan disk sizes and configurations for future deployments.

My approach is very straightforward:

clean_data (stop_cassandra && rm -rf /var/lib/cassandra/*)
measure_disk_usage (nodetool flush && du -ch /var/lib/cassandra)
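
Spelled out, the full procedure is something like the sketch below. The init
script path, data directory, and host are assumptions for a default
single-node install, so adjust them to your setup:

#!/bin/sh
# wipe any previous data so the measurement starts from zero
/etc/init.d/cassandra stop
rm -rf /var/lib/cassandra/*
/etc/init.d/cassandra start
# ... run the insert workload here ...
# flush memtables so everything counted is actually on disk
nodetool -h localhost flush
du -ch /var/lib/cassandra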

There are two types of inserts (a concrete example follows the list):

   - Into a simple column: key, column name, and a random string of size
   100 as the value
   - Into a super-column: key, super-column name, column name, and a
   random string of size 100 as the value
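
To make the two cases concrete, this is roughly what each insert looks like
through cassandra-cli. The keyspace and column family names (Keyspace1,
Standard1, Super1) are placeholders, not the ones I actually used, and
'xxxx' stands for the random 100-character value:

connect localhost/9160;
use Keyspace1;
set Standard1['row1']['col1'] = 'xxxx';
set Super1['row1']['sc1']['col1'] = 'xxxx';

(save that as a file and run it with something like: cassandra-cli -f inserts.txt)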

But surprisingly, when I insert 100 million columns as simple columns it
uses more disk than the same amount as super-columns. How can that be?

For simple columns it is 41984 MB and for super-columns 29696 MB; the
difference is more than noticeable!

Somebody told me yesterday that super-columns don't have a per-column
timestamp, but in my case, even if all the data were under the same
super-column key, that would not explain the difference!
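
As a quick sanity check (assuming the 8-byte per-column timestamp is the
only thing super-columns would save):

echo '100000000 * 8 / 1024 / 1024' | bc    # ~762 MB of timestamps
echo '41984 - 29696' | bc                  # 12288 MB actually saved

Even 100 million timestamps account for well under 1 GB, nowhere near the
12 GB gap.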

ps: sorry, English is not my first language
