cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: cassandra disk usage
Date Mon, 30 Aug 2010 13:10:10 GMT
Column names are stored per cell, so every column repeats its name on disk.
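
A rough sketch of what per-cell storage can cost. The fixed field sizes below are an assumption about the column serialization format of that era (a 2-byte name length, a 1-byte deleted flag, an 8-byte timestamp, a 4-byte value length), not something stated in this thread. The 12-byte name and 2-byte value are hypothetical figures chosen only to resemble the dataset described below. Super column headers, row keys, and index data are ignored.

```python
# Assumed fixed per-column costs in bytes; these are not confirmed by this
# thread and may differ between Cassandra versions.
NAME_LEN_PREFIX = 2   # length prefix written before the column name
DELETED_FLAG = 1      # tombstone marker
TIMESTAMP = 8         # write timestamp
VALUE_LEN_PREFIX = 4  # length prefix written before the value


def column_size(name_len, value_len):
    """Estimated serialized size of one column, in bytes."""
    fixed = NAME_LEN_PREFIX + DELETED_FLAG + TIMESTAMP + VALUE_LEN_PREFIX
    return fixed + name_len + value_len


def expansion(name_len, value_len):
    """Ratio of estimated on-disk bytes to name-plus-value bytes."""
    return column_size(name_len, value_len) / (name_len + value_len)


if __name__ == "__main__":
    # Hypothetical cell: a 12-byte type name holding a 2-digit value.
    print(column_size(12, 2))          # 29
    print(round(expansion(12, 2), 2))  # 2.07
```

Under these assumptions a small value is dwarfed by its own name and fixed fields, which would account for part of the gap, though not necessarily all of it.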

(moving to user@)

On Mon, Aug 30, 2010 at 6:58 AM, Terje Marthinussen
<tmarthinussen@gmail.com> wrote:
> Hi,
>
> I was just looking at an SSTable file after loading a dataset. The data load
> has no updates of data, but:
> - Columns can in some rare cases be added to existing super columns
> - SuperColumns will be added to the same key (but not overwriting existing
> data). I batch these, but it is quite likely that there will be 2-3 updates
> to a key.
>
> This is a randomly selected SSTable file from a much bigger dataset.
>
> The data is stored as date(super)/type(column)/value
> Date is a simple "20100811" type string.
> Value is a small integer, two digits on average.
>
> If I run a simple "strings" on the SSTable and look for the data:
> value: 692 KB of data
> type: 4.01 MB of data
> date: 4.6 MB of data
>
> In total: 9.4 MB
>
> The size of the .db file, however, is 36.4 MB...
>
> The expansion from the column headers is bad enough, but I can somehow
> accept that.
> The almost 4x expansion on top of that is a bit harder to justify...
>
> Does anyone already know where this expansion comes from? Or do I need to
> take a careful look at the source (probably useful anyway :))
>
> Terje
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
