From: Yiming Sun
Date: Wed, 28 Mar 2012 16:40:47 -0400
Subject: data size difference between supercolumn and regular column
To: user@cassandra.apache.org

Hi,

We are trying to estimate how much storage we need for a production Cassandra cluster. While doing the calculation, I noticed a dramatic difference in the disk space used by Cassandra data files.

Our previous setup is a single-node Cassandra 0.8.x with no replication. The data is stored using supercolumns, and the data files total about 534GB on disk.

A few weeks ago, I put together a 3-node cluster running Cassandra 1.0 with a replication factor of 2, with the data flattened out and stored using regular columns. The aggregate data file size is only 488GB (which would be 244GB without replication).

This is a dramatic reduction in storage needs, and certainly good news for how much storage we need to provision. However, because the reduction is so large, I would like to make sure it is correct before submitting the estimate, and also get a sense of why there is such a difference. I know Cassandra 1.0 does data compression, but does the schema change from supercolumns to regular columns also help reduce storage usage?

Thanks.

-- Y.
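
P.S. For reference, the arithmetic behind the comparison can be sketched as below. This is only a rough check, assuming both setups hold the same data set and that the 488GB total is exactly two full copies of it (replication factor 2). The function and constant names are made up for illustration and are not part of any Cassandra tool.

```python
# Figures quoted in the message; everything else here is illustrative.
OLD_SINGLE_NODE_GB = 534      # Cassandra 0.8.x, supercolumns, no replication
NEW_CLUSTER_TOTAL_GB = 488    # Cassandra 1.0, regular columns, summed over 3 nodes
REPLICATION_FACTOR = 2


def unreplicated_size_gb(total_gb: float, replication_factor: int) -> float:
    """Size of one logical copy, assuming each row is stored exactly RF times."""
    return total_gb / replication_factor


def reduction_fraction(old_gb: float, new_gb: float) -> float:
    """Fraction of the old footprint that was saved."""
    return (old_gb - new_gb) / old_gb


new_single_copy_gb = unreplicated_size_gb(NEW_CLUSTER_TOTAL_GB, REPLICATION_FACTOR)
reduction = reduction_fraction(OLD_SINGLE_NODE_GB, new_single_copy_gb)

print(f"one copy on the new cluster: {new_single_copy_gb:.0f} GB")  # 244 GB
print(f"reduction vs. old setup: {reduction:.1%}")                  # 54.3%
```

So one copy of the data on the new cluster is a bit less than half the size of the old data files. The sketch says nothing about how much of that comes from compression and how much from the schema change.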