Date: Mon, 30 Aug 2010 11:21:48 -0500 (CDT)
Subject: RE: cassandra disk usage
From: "Stu Hood"
To: dev@cassandra.apache.org
Message-ID: <1283185308.306416713@192.168.2.231>

Also, see: https://issues.apache.org/jira/browse/CASSANDRA-1207

-----Original Message-----
From: "Terje Marthinussen"
Sent: Monday, August 30, 2010 6:58am
To: dev@cassandra.apache.org
Subject: cassandra disk usage

Hi,

Was just looking at an SSTable file after loading a dataset. The data load
has no updates of data but:
- Columns can in some rare cases be added to existing super columns
- SuperColumns will be added to the same key (but not overwriting existing
data). I batch these, but it is quite likely that there will be 2-3 updates
to a key.

This is a randomly selected SSTable file from a much bigger dataset.

The data is stored as date(super)/type(column)/value
Date is a simple "20100811" type string.
Value is a small integer, 2 digits on average

If I run a simple strings on the SSTable and look for the data:
value: 692Kbyte of data
type: 4.01MByte of data
date: 4.6MB of data

In total: 9.4MByte

The size of the .db file however, is 36.4MB...

The expansion from the column headers is bad enough, but I can somehow
accept that.
The almost 4x expansion on top of that is a bit harder to justify...

Does anyone already know where this expansion comes from? Or do I need to
take a careful look at the source (probably useful anyway :))

Terje
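
For reference, the arithmetic behind the "almost 4x" figure can be checked with a short sketch. It only restates the sizes quoted in the message above; the variable names are made up for illustration, and treating "692Kbyte" as 0.692 MB (decimal units) is an assumption, since the message does not say which units strings output was measured in.

```python
# Sizes quoted in the message, in megabytes.
# Assumption: "692Kbyte" is taken as 0.692 MB (decimal units).
payload_mb = {"value": 0.692, "type": 4.01, "date": 4.6}
sstable_mb = 36.4          # size of the .db file, as reported
reported_total_mb = 9.4    # total stated in the message


def expansion(file_mb, data_mb):
    """Return how many times larger the file is than the data found in it."""
    return file_mb / data_mb


# Sum of the three components, to compare against the stated total.
summed_total_mb = sum(payload_mb.values())

print(f"summed components:           {summed_total_mb:.2f} MB")
print(f"expansion vs stated total:   {expansion(sstable_mb, reported_total_mb):.2f}x")
print(f"expansion vs summed total:   {expansion(sstable_mb, summed_total_mb):.2f}x")
print(f"bytes beyond the found data: {sstable_mb - reported_total_mb:.1f} MB")
```

Under that unit assumption the three components sum to slightly less than the stated 9.4MByte, presumably due to rounding in the original figures; either way the ratio comes out just under 4.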