incubator-cassandra-user mailing list archives

From Robert Važan <>
Subject Re: Minimum row size / minimum data point size
Date Fri, 04 Oct 2013 07:54:21 GMT
That spreadsheet doesn't take compression into account, which is very 
important in my case. Uncompressed, my data is going to require a 
petabyte of storage according to the spreadsheet. I am pretty sure I 
won't get that much storage to play with.
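For scale, the spreadsheet's figures imply roughly a kilobyte of on-disk footprint per data point, versus the ~1 byte the data itself compresses to (a back-of-the-envelope check using the numbers quoted above):

```python
# Rough check of the figures above; ~1 PB for one trillion points
# is the spreadsheet's estimate, not a measured number.
data_points = 10**12             # one trillion data points
uncompressed_bytes = 10**15      # ~1 PB per the spreadsheet
per_point = uncompressed_bytes / data_points
print(per_point)                 # ~1000 bytes of storage per point
print(data_points / 10**12)      # ideal size at 1 byte/point: ~1 TB
```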

The spreadsheet also shows that Cassandra wastes an unbelievable amount of 
space on compaction. My experiments with LevelDB, however, show that it is 
possible for a write-optimized database to use negligible compaction 
space. I am not sure how LevelDB does it; I guess it splits the larger 
sstables into smaller chunks and merges them incrementally.
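That guess could be sketched like this (a toy model of leveled-style compaction, not LevelDB's actual code): each level holds many small, non-overlapping chunks, and a compaction merges one upper chunk into only the lower chunks whose key ranges overlap it, so the transient space is bounded by chunk size rather than sstable size:

```python
# Toy model of incremental (leveled-style) compaction.
# A "chunk" is a sorted list of (key, value) pairs; a level is a
# list of chunks with non-overlapping key ranges.

def overlaps(chunk, lo, hi):
    """True if the chunk's key range intersects [lo, hi]."""
    return bool(chunk) and chunk[0][0] <= hi and chunk[-1][0] >= lo

def compact_one(upper_chunk, lower_level, max_chunk=4):
    """Merge one small upper chunk into only the overlapping lower chunks."""
    lo, hi = upper_chunk[0][0], upper_chunk[-1][0]
    touched = [c for c in lower_level if overlaps(c, lo, hi)]
    untouched = [c for c in lower_level if not overlaps(c, lo, hi)]
    # Merge only the touched chunks; newer (upper) values win.
    merged = {}
    for c in touched:
        merged.update(c)
    merged.update(upper_chunk)
    items = sorted(merged.items())
    # Re-split into small chunks so future merges stay cheap.
    new_chunks = [items[i:i + max_chunk] for i in range(0, len(items), max_chunk)]
    return sorted(untouched + new_chunks)

level1 = [[(1, 'a'), (2, 'b')], [(10, 'x'), (20, 'y')]]
print(compact_one([(1, 'A'), (3, 'c')], level1))
# → [[(1, 'A'), (2, 'b'), (3, 'c')], [(10, 'x'), (20, 'y')]]
```

The key property is that only the overlapping chunks are rewritten, so the extra disk space needed during a merge is a few chunks, not a full table's worth.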

Anyway, does anybody know how densely I can store the data with 
Cassandra when compression is enabled? Would I have to implement some 
smart adaptive grouping to fit lots of records in one row, or is there a 
simpler solution?
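The grouping I have in mind would look something like this (a sketch only; the partition scheme and the one-hour bucket width are arbitrary assumptions for illustration): key each wide row by (series_id, time_bucket), so many points share one row and the per-row overhead is amortized:

```python
# Sketch: pack many points per row by bucketing each series on time.
# The 1-hour bucket width is a made-up parameter for illustration.

BUCKET_SECONDS = 3600  # hypothetical fixed bucket width

def row_key(series_id, unix_ts):
    """Map a data point to the wide row (partition) that holds it."""
    return (series_id, unix_ts - unix_ts % BUCKET_SECONDS)

# Points from the same hour land in the same wide row:
print(row_key("sensor-1", 1380870861))
print(row_key("sensor-1", 1380871000))
```

The catch, as below, is that fixed-width buckets handle highly variable density badly: dense series overflow a bucket while sparse series leave it nearly empty, which is what an adaptive scheme would have to fix.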

On 4. 10. 2013 at 1:56, Andrey Ilinykh wrote:
> It may help.
> On Thu, Oct 3, 2013 at 1:31 PM, Robert Važan <> wrote:
>     I need to store one trillion data points. The data is highly
>     compressible down to 1 byte per data point using simple custom
>     compression combined with standard dictionary compression. What's
>     the most space-efficient way to store the data in Cassandra? How
>     much per-row overhead is there if I store one data point per row?
>     The data is particularly hard to group. It's a large number of
>     time series with highly variable density. That makes it hard to
>     pack subsets of the data into meaningful column families / wide
>     rows. Is there a table layout scheme that would let me
>     approach 1 byte per data point without forcing me to implement
>     a complex abstraction layer at the application level?
