From: Dave Brosius
Date: Wed, 11 Apr 2012 09:00:59 -0400
To: user@cassandra.apache.org
Subject: Re: Why so many SSTables?

It's easy to spend other people's money, but handling 1 TB of data with a 1.5 GB heap? Memory is cheap, and just a little more will solve many problems.


On 04/11/2012 08:43 AM, Romain HARDOUIN wrote:

Thank you for your answers.

I originally posted this question because we encountered an OOM exception on 2 nodes during a repair session.
Memory analysis shows a hotspot: an ArrayList of SSTableBoundedScanner that contains as many objects as there are SSTables on disk (7747 objects at the time).
This ArrayList consumes 47% of the heap space (786 MB).
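
The figures above imply a rough per-scanner cost and a total heap size. A minimal sketch of that arithmetic, assuming the 786 MB is spread evenly across the 7747 scanners and that 1 MB means 1024 KB (neither is stated in the message):

```python
# Figures quoted in the message above.
scanner_count = 7747      # SSTableBoundedScanner objects in the ArrayList
list_heap_mb = 786        # heap retained by the ArrayList
heap_fraction = 0.47      # share of the total heap taken by the ArrayList

# Assumption: the retained heap is spread evenly across the scanners.
per_scanner_kb = list_heap_mb * 1024 / scanner_count

# Total heap implied by "47% of the heap is 786 MB".
implied_heap_mb = list_heap_mb / heap_fraction

print(f"~{per_scanner_kb:.0f} KB per scanner")
print(f"~{implied_heap_mb:.0f} MB total heap")
```

This works out to roughly 104 KB per scanner and a heap of about 1.6 GB.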

We want each node to handle 1 TB, so we must dramatically reduce the number of SSTables.

Thus, is there any drawback if we set sstable_size_in_mb to 200 MB?
Otherwise, should we go back to tiered compaction?
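
To gauge what a larger sstable_size_in_mb would buy, here is a rough estimate of the SSTable count per node at the 1 TB target. It assumes every SSTable reaches exactly the configured size and that the scanner memory seen during repair scales linearly with the SSTable count. Both are simplifications, and the function name is only illustrative, not part of Cassandra.

```python
import math

def estimated_sstables(data_mb: int, sstable_size_mb: int) -> int:
    """Estimate how many SSTables hold data_mb if each is sstable_size_mb."""
    return math.ceil(data_mb / sstable_size_mb)

target_mb = 1024 * 1024          # 1 TB per node, expressed in MB
per_scanner_mb = 786 / 7747      # assumed even split of the 786 MB hotspot

for size_mb in (5, 200):         # default value vs. proposed value
    count = estimated_sstables(target_mb, size_mb)
    scanners_mb = count * per_scanner_mb
    print(f"{size_mb} MB -> {count} SSTables, ~{scanners_mb:.0f} MB of scanners")
```

Under these assumptions, 1 TB at the 5 MB default is on the order of 200,000 SSTables, while 200 MB brings it down to about 5,000, fewer than the 7747 that triggered the OOM.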

Regards,

Romain


Maki Watanabe <watanabe.maki@gmail.com> wrote on 11/04/2012 04:21:47:

> You can configure the SSTable size with the sstable_size_in_mb parameter for LCS.
> The default value is 5 MB.
> You should also check that you don't have many pending compaction tasks,
> using nodetool tpstats and compactionstats.
> If you have enough IO throughput, you can increase
> compaction_throughput_mb_per_sec
> in cassandra.yaml to reduce pending compactions.
>
> maki
>
> 2012/4/10 Romain HARDOUIN <romain.hardouin@urssaf.fr>:
> >
> > Hi,
> >
> > We are surprised by the number of files generated by Cassandra.
> > Our cluster consists of 9 nodes and each node handles about 35 GB.
> > We're using Cassandra 1.0.6 with LeveledCompactionStrategy.
> > We have 30 CF.
> >
> > We've got roughly 45,000 files under the keyspace directory on each node:
> > ls -l /var/lib/cassandra/data/OurKeyspace/ | wc -l
> > 44372
> >
> > The biggest CF is spread over 38,000 files:
> > ls -l Documents* | wc -l
> > 37870
> >
> > ls -l Documents*-Data.db | wc -l
> > 7586
> >
> > Many SSTables are about 4 MB:
> >
> > 19 MB -> 1 SSTable
> > 12 MB -> 2 SSTables
> > 11 MB -> 2 SSTables
> > 9.2 MB -> 1 SSTable
> > 7.0 MB to 7.9 MB -> 6 SSTables
> > 6.0 MB to 6.4 MB -> 6 SSTables
> > 5.0 MB to 5.4 MB -> 4 SSTables
> > 4.0 MB to 4.7 MB -> 7139 SSTables
> > 3.0 MB to 3.9 MB -> 258 SSTables
> > 2.0 MB to 2.9 MB -> 35 SSTables
> > 1.0 MB to 1.9 MB -> 13 SSTables
> > 87 KB to  994 KB -> 87 SSTables
> > 0 KB -> 32 SSTables
> >
> > FYI here is CF information:
> >
> > ColumnFamily: Documents
> >   Key Validation Class: org.apache.cassandra.db.marshal.BytesType
> >   Default column value validator: org.apache.cassandra.db.marshal.BytesType
> >   Columns sorted by: org.apache.cassandra.db.marshal.BytesType
> >   Row cache size / save period in seconds / keys to save : 0.0/0/all
> >   Row Cache Provider: org.apache.cassandra.cache.SerializingCacheProvider
> >   Key cache size / save period in seconds: 200000.0/14400
> >   GC grace seconds: 1728000
> >   Compaction min/max thresholds: 4/32
> >   Read repair chance: 1.0
> >   Replicate on write: true
> >   Column Metadata:
> >     Column Name: refUUID (72656655554944)
> >       Validation Class: org.apache.cassandra.db.marshal.BytesType
> >       Index Name: refUUID_idx
> >       Index Type: KEYS
> >   Compaction Strategy: org.apache.cassandra.db.compaction.LeveledCompactionStrategy
> >   Compression Options:
> >     sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
> >
> > Is it a bug? If not, how can we tune Cassandra to avoid this?
> >
> > Regards,
> >
> > Romain
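
The size distribution quoted in the original question can be cross-checked against the file counts it reports. A minimal sketch using only the numbers from that message (the bucket labels are shortened for readability):

```python
# SSTable counts per size bucket, copied from the original question.
buckets = {
    "19 MB": 1, "12 MB": 2, "11 MB": 2, "9.2 MB": 1,
    "7.0-7.9 MB": 6, "6.0-6.4 MB": 6, "5.0-5.4 MB": 4,
    "4.0-4.7 MB": 7139, "3.0-3.9 MB": 258, "2.0-2.9 MB": 35,
    "1.0-1.9 MB": 13, "87-994 KB": 87, "0 KB": 32,
}

total = sum(buckets.values())                # compare with the Data.db count
share_4mb = buckets["4.0-4.7 MB"] / total    # fraction in the 4 MB bucket
files_per_sstable = 37870 / total            # all Documents* files per SSTable

print(total)
print(f"{share_4mb:.1%}")
print(f"{files_per_sstable:.2f}")
```

The total equals the 7586 Data.db files reported, about 94% of the SSTables fall in the 4.0 to 4.7 MB bucket, and each SSTable accounts for roughly five files on disk.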
