From: Jonathan Ellis
Date: Mon, 30 Aug 2010 08:10:10 -0500
Subject: Re: cassandra disk usage
To: user@cassandra.apache.org

Column names are stored per cell.

(moving to user@)

On Mon, Aug 30, 2010 at 6:58 AM, Terje Marthinussen wrote:
> Hi,
>
> Was just looking at an SSTable file after loading a dataset. The data load
> has no updates of data, but:
> - Columns can in some rare cases be added to existing super columns
> - SuperColumns will be added to the same key (but not overwriting existing
> data). I batch these, but it is quite likely that there will be 2-3 updates
> to a key.
>
> This is a randomly selected SSTable file from a much bigger dataset.
>
> The data is stored as date(super)/type(column)/value
> Date is a simple "20100811" type string.
> Value is a small integer, 2 digits on average
>
> If I run a simple strings on the SSTable and look for the data:
> value: 692 KB of data
> type: 4.01 MB of data
> date: 4.6 MB of data
>
> In total: 9.4 MB
>
> The size of the .db file, however, is 36.4 MB...
>
> The expansion from the column headers is bad enough, but I can somehow
> accept that.
> The almost 4x expansion on top of that is a bit harder to justify...
>
> Does anyone already know where this expansion comes from? Or do I need to
> take a careful look at the source (probably useful anyway :))
>
> Terje
>

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
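
To illustrate the point that column names are stored per cell, here is a rough sketch of how fixed per-column fields can dwarf a 2-digit value. The field widths below (2-byte name length, 1-byte flag, 8-byte timestamp, 4-byte value length) are an assumption about the on-disk layout, not something stated in this thread, and the column name and value are hypothetical.

```python
# Rough estimate of the bytes one column occupies on disk.
# ASSUMPTION: these field widths are illustrative; they are not taken
# from the thread and should be checked against the actual source.
NAME_LEN_BYTES = 2   # length prefix for the column name (assumed)
FLAG_BYTES = 1       # deletion marker (assumed)
TIMESTAMP_BYTES = 8  # write timestamp (assumed)
VALUE_LEN_BYTES = 4  # length prefix for the value (assumed)


def column_size(name: bytes, value: bytes) -> int:
    """Bytes for one column: fixed fields plus the name and the value."""
    fixed = NAME_LEN_BYTES + FLAG_BYTES + TIMESTAMP_BYTES + VALUE_LEN_BYTES
    return fixed + len(name) + len(value)


def expansion(name: bytes, value: bytes) -> float:
    """Ratio of stored bytes to the bytes `strings` would show (name + value)."""
    return column_size(name, value) / (len(name) + len(value))


# Hypothetical column: a short type name and a 2-digit value.
name = b"clicks"
value = b"42"

print(column_size(name, value))  # 15 fixed bytes + 6 + 2 = 23
print(expansion(name, value))
```

Under these assumed widths, every cell pays the fixed bytes again along with its own copy of the name, so short values are the worst case. Super column headers, row keys and index data would add more on top and are not modelled here.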