cassandra-user mailing list archives

From Yi Yang <>
Subject Re: Cassandra adding 500K + Super Column Family
Date Tue, 16 Aug 2011 23:32:11 GMT
Sounds like a similar case to mine.   The files are definitely extremely big; a 10x space
overhead is a likely case if you are just putting values into them.

I'm currently testing CASSANDRA-674 and hoping the improved SSTable format can solve the
space-overhead problem.   Please follow my e-mails today; I'll continue working on it.

If your values are integers and floats, with column names of about 4 characters, then
extrapolating from my case it will cost you 1~2 TB of disk space.
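For a rough sense of where an estimate in this range comes from, here is a hedged back-of-envelope sketch. The per-column byte figure and the replication/compaction multipliers are assumptions for illustration, not measured values from this thread:

```python
# Back-of-envelope estimate for the workload discussed in this thread:
# 500K entities x ~6000 columns each.  The ~60-byte per-column figure
# (name, value, timestamp, length fields, super-column framing) is an
# assumed serialized size for pre-CASSANDRA-674 SSTables, not a measurement.
entities = 500_000
columns_per_entity = 6_000
bytes_per_column = 60          # assumption: serialized size incl. overhead
replication_factor = 3         # assumption: typical production setting
compaction_headroom = 2        # assumption: live + obsolete SSTables on disk

total_columns = entities * columns_per_entity          # 3 billion cells
on_disk_bytes = (total_columns * bytes_per_column
                 * replication_factor * compaction_headroom)
tib = on_disk_bytes / 1024**4

print(f"{total_columns:,} columns -> roughly {tib:.2f} TiB cluster-wide")
```

With these assumptions the total lands near 1 TiB; pushing the per-column overhead or headroom higher easily reaches the 2 TB end of the range, which is why the estimate above is quoted as a span rather than a single number.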


On Aug 16, 2011, at 4:20 PM, aaron morton wrote:

> Are you planning to create 500,000 Super Column Families, or 500,000 rows in a single
> Super Column Family?
> The former is somewhat crazy. Cassandra schemas typically have up to a few tens of
> Column Families. Each Column Family involves a certain amount of memory overhead; this is
> now automatically managed in Cassandra 0.8 (see
> If I understand correctly, you have 500K entities with 6K columns each. A simple first
> approach to modelling this would be to use a Standard CF with a row for each entity. However,
> the best model is the one that serves your read requests best.
> Also, for background, the sub columns in a super column are not indexed; see
> . You would probably run into this problem if you had 6000 sub columns in a super column.

> Hope that helps. 
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> On 17/08/2011, at 12:53 AM, Renato Bacelar da Silveira wrote:
>> I am wondering about a certain volume situation.
>> I currently load a Keyspace with a certain amount of SCFs.
>> Each SCF (Super Column Family) represents an entity.
>> Each Entity may have up to 6000 values.
>> I am planning to have 500,000 Entities (SCFs) with
>> 6000 columns each (within Super Columns; the number of Super Columns
>> is unknown), and I was wondering how much in the way of resources
>> something like this would require.
>> I am struggling even to have 10,000 SCFs with 30 columns (within SuperColumns):
>> I get very large files and reach a 4 GB heap-space limit very quickly on
>> a single node. I trigger garbage collection where needed.
>> Is there some secret to loading 500,000 Super Column Families?
>> Regards.
>> -- 
>> Renato da Silveira
>> Senior Developer
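Aaron's row-per-entity suggestion above can be sketched with plain Python dicts standing in for the on-disk layout. This models the data shape only, not any Cassandra client API, and all names here are hypothetical:

```python
# Model the suggestion from this thread: one Standard CF row per entity,
# instead of one Super Column Family per entity.  Plain dicts stand in for
# the storage layout; this is a layout sketch, not the Cassandra API.

# Super-column style the question describes:
# entity -> super column -> sub column -> value
scf_style = {
    "entity_42": {"metrics": {"temp": 21.5, "hum": 0.4}},
}

# Standard CF with composite column names ("supercol:subcol"), a common
# replacement for super columns: one row per entity, flat sorted columns.
standard_cf = {
    "entity_42": {"metrics:temp": 21.5, "metrics:hum": 0.4},
}

def slice_prefix(row, prefix):
    """Reading all sub columns of one former super column becomes a
    prefix slice, which Cassandra serves cheaply since columns are
    stored sorted by name."""
    return {k: v for k, v in row.items() if k.startswith(prefix + ":")}

print(slice_prefix(standard_cf["entity_42"], "metrics"))
```

The point of the flat layout is that 500,000 entities become 500,000 rows in one Column Family, avoiding both the per-CF memory overhead and the unindexed-sub-column problem mentioned above.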
