Subject: Re: Cassandra disk space utilization
From: Mason Hale
To: user@cassandra.apache.org
Date: Wed, 7 Jul 2010 11:11:13 -0500

Hi Julie --

Keep in mind that there is additional data storage overhead, including timestamps and column names. Because the schema can vary from row to row, the column names are stored with each row, in addition to the data. Disk-space efficiency is not a primary design goal for Cassandra.
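To make the overhead point concrete, here is a rough estimator. It is a hypothetical sketch: the per-column field sizes (2-byte name length, 1-byte deletion flag, 8-byte timestamp, 4-byte value length) are assumptions for illustration, not a statement of Cassandra's actual on-disk format, and row keys, indexes and Bloom filters are ignored.

```python
# Assumed per-column layout (hypothetical, for illustration only):
# 2-byte name length + name bytes + 1-byte deletion flag
# + 8-byte timestamp + 4-byte value length + value bytes.
NAME_LEN_BYTES = 2
DELETE_FLAG_BYTES = 1
TIMESTAMP_BYTES = 8
VALUE_LEN_BYTES = 4


def column_overhead(name_len: int) -> int:
    """Bytes stored per column beyond the value itself, under the assumed layout."""
    return NAME_LEN_BYTES + name_len + DELETE_FLAG_BYTES + TIMESTAMP_BYTES + VALUE_LEN_BYTES


def row_on_disk(num_columns: int, name_len: int, value_len: int) -> int:
    """Approximate serialized size of a row's columns (row key and index data ignored)."""
    return num_columns * (column_overhead(name_len) + value_len)


if __name__ == "__main__":
    # The same 100,000 bytes of payload, split two hypothetical ways.
    for cols, value_len in [(1, 100_000), (1_000, 100)]:
        size = row_on_disk(cols, name_len=10, value_len=value_len)
        payload = cols * value_len
        print(f"{cols:>5} columns: {size} bytes on disk, "
              f"{(size - payload) / payload:.2%} overhead")
```

Under these assumptions a single 100,000-byte column adds only 25 bytes, while 1,000 columns of 100 bytes with 10-byte names add about 25%, so the overhead depends mainly on how many columns each row has and how long their names are.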

Mason



On Wed, Jul 7, 2010 at 10:56 AM, Julie <julie.sugar@nextcentury.com> wrote:
Hi guys,
I have what may be a dumb question, but I am confused by how much disk space is
being used by my Cassandra nodes. I have 10 nodes in my cluster with a
replication factor of 3. After I write 1,000,000 rows to the database (100 kB
each), I see that they have been distributed very evenly, about 100,000 rows
per node, but because of the replication factor of 3, each node contains about
300,000 rows. This is all good. Since my rows are 100 kB each, I expect each
node to store about 30 GB of data; however, that is not what I am seeing.
Instead, I am seeing some nodes that do not experience any compaction
exceptions but report their space used as MUCH more. Here's one using 106 GB
of disk. My disks are only 160 GB, so this is at the bleeding edge, and I
thought my node would be able to store more data.
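The per-node expectation above is straightforward arithmetic. A minimal sketch, assuming replicas are spread evenly across nodes and that "100 kB" means 100,000 bytes:

```python
def expected_bytes_per_node(total_rows: int, row_bytes: int,
                            replication_factor: int, nodes: int) -> float:
    """Raw payload each node should hold if replicas are spread evenly."""
    # Every row is stored replication_factor times across the cluster.
    replicas_per_node = total_rows * replication_factor / nodes
    return replicas_per_node * row_bytes


if __name__ == "__main__":
    # 1,000,000 rows x RF 3 / 10 nodes = 300,000 rows of 100,000 bytes each.
    per_node = expected_bytes_per_node(1_000_000, 100_000, 3, 10)
    print(f"{per_node / 1e9:.0f} GB per node")
```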

I only use a single column family, so here is the cfstats output from one of my
nodes (server5):

        Column Family: Standard1
        SSTable count: 12
        Space used (live): 113946099884
        Space used (total): 113946099884
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 451
        Read Count: 31786
        Read Latency: 161.429 ms.
        Write Count: 300633
        Write Latency: 0.124 ms.
        Pending Tasks: 0
        Key cache: disabled
        Row cache capacity: 3000
        Row cache size: 3000
        Row cache hit rate: 0.38331340841880074
        Compacted row minimum size: 100220
        Compacted row maximum size: 100225
        Compacted row mean size: 100224
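The cfstats figures can be set side by side with the 30 GB expectation. This is a minimal sketch, assuming that Write Count approximates the number of rows on this node (one write per row), which cfstats does not state directly. It also shows that the "106 GB" figure is the live byte count expressed in binary gigabytes (GiB).

```python
GIB = 1024 ** 3  # binary gigabyte (GiB)


def compare_with_cfstats(live_bytes: int, write_count: int, mean_row_bytes: int):
    """Return (live GiB, live GB, estimated payload GB, live/payload ratio)."""
    # Assumption: each write inserted one distinct row, so write_count
    # roughly equals the number of rows held by this node.
    estimated_payload = write_count * mean_row_bytes
    return (live_bytes / GIB,
            live_bytes / 1e9,
            estimated_payload / 1e9,
            live_bytes / estimated_payload)


if __name__ == "__main__":
    # Figures copied from the server5 cfstats output above.
    gib, gb, payload_gb, ratio = compare_with_cfstats(113_946_099_884, 300_633, 100_224)
    print(f"live: {gib:.1f} GiB ({gb:.1f} GB); rows x mean row size: "
          f"{payload_gb:.1f} GB; ratio {ratio:.2f}x")
```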

Note that I wrote these 1M rows of data yesterday, and the system has had 24
hours to digest it. There are no exceptions in the system.log file. Here's
the tail end of it:

...
INFO [SSTABLE-CLEANUP-TIMER] 2010-07-06 16:13:43,162 SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra/data/Keyspace1/Standard1-430-Data.db
INFO [SSTABLE-CLEANUP-TIMER] 2010-07-06 16:13:43,269 SSTableDeletingReference.java (line 104) Deleted /var/lib/cassandra/data/Keyspace1/Standard1-445-Data.db
INFO [COMPACTION-POOL:1] 2010-07-06 16:35:21,718 CompactionManager.java (line 246) Compacting []
INFO [Timer-1] 2010-07-06 17:01:01,907 Gossiper.java (line 179) InetAddress /10.248.107.19 is now dead.
INFO [GMFD:1] 2010-07-06 17:01:42,039 Gossiper.java (line 568) InetAddress /10.248.107.19 is now UP
INFO [COMPACTION-POOL:1] 2010-07-06 17:35:21,306 CompactionManager.java (line 246) Compacting []
INFO [COMPACTION-POOL:1] 2010-07-06 18:35:20,802 CompactionManager.java (line 246) Compacting []
INFO [COMPACTION-POOL:1] 2010-07-06 19:35:20,389 CompactionManager.java (line 246) Compacting []
INFO [COMPACTION-POOL:1] 2010-07-06 20:35:19,934 CompactionManager.java (line 246) Compacting []
INFO [COMPACTION-POOL:1] 2010-07-06 21:35:19,582 CompactionManager.java (line 246) Compacting []
INFO [COMPACTION-POOL:1] 2010-07-06 22:35:19,233 CompactionManager.java (line 246) Compacting []
INFO [COMPACTION-POOL:1] 2010-07-06 23:35:18,593 CompactionManager.java (line 246) Compacting []
INFO [COMPACTION-POOL:1] 2010-07-07 00:35:18,076 CompactionManager.java (line 246) Compacting []
INFO [COMPACTION-POOL:1] 2010-07-07 01:35:17,673 CompactionManager.java (line 246) Compacting []
INFO [COMPACTION-POOL:1] 2010-07-07 02:35:17,172 CompactionManager.java (line 246) Compacting []
INFO [COMPACTION-POOL:1] 2010-07-07 03:35:16,784 CompactionManager.java (line 246) Compacting []
INFO [COMPACTION-POOL:1] 2010-07-07 04:35:16,383 CompactionManager.java (line 246) Compacting []
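To check a full system.log for the pattern in this tail (no exceptions, hourly "Compacting []" runs, deleted SSTables), a small script helps. This is a hypothetical sketch: the regular expression is inferred from the lines quoted above, not from Cassandra's logging configuration, and the exception check is only a crude substring match.

```python
import re

# Hypothetical pattern inferred from the quoted lines; assumes one entry per line,
# as written to system.log (the mail client wrapped the entries above).
LOG_LINE = re.compile(
    r"^\s*(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>[\d:,]+)\s+"
    r"(?P<source>\S+) \(line \d+\) (?P<message>.*)$"
)


def summarize_log(lines):
    """Count exception mentions, empty compaction runs and deleted SSTables."""
    counts = {"exceptions": 0, "empty_compactions": 0, "deleted_sstables": 0}
    for line in lines:
        # Crude check: ERROR entries and stack traces usually mention "Exception".
        if "Exception" in line:
            counts["exceptions"] += 1
        match = LOG_LINE.match(line)
        if match is None:
            continue
        message = match.group("message")
        if message == "Compacting []":
            counts["empty_compactions"] += 1
        elif message.startswith("Deleted "):
            counts["deleted_sstables"] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "INFO [SSTABLE-CLEANUP-TIMER] 2010-07-06 16:13:43,162 SSTableDeletingReference.java"
        " (line 104) Deleted /var/lib/cassandra/data/Keyspace1/Standard1-430-Data.db",
        "INFO [COMPACTION-POOL:1] 2010-07-06 16:35:21,718 CompactionManager.java"
        " (line 246) Compacting []",
    ]
    print(summarize_log(sample))
```

For a real file, passing an open file object (for example, summarize_log(open(path))) works the same way, since the pattern tolerates the trailing newline on each line.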

Thank you for your help!!
Julie


