cassandra-user mailing list archives

From Sylvain Lebresne <sylv...@datastax.com>
Subject Re: Why so many SSTables?
Date Wed, 11 Apr 2012 12:59:34 GMT
On Wed, Apr 11, 2012 at 2:43 PM, Romain HARDOUIN
<romain.hardouin@urssaf.fr> wrote:
>
> Thank you for your answers.
>
> I originally posted this question because we encountered an OOM exception on 2
> nodes during a repair session.
> Memory analysis shows a hotspot: an ArrayList of SSTableBoundedScanner
> objects which contains as many entries as there are SSTables on disk (7747
> objects at the time).
> This ArrayList consumes 47% of the heap space (786 MB).

That's 101KB per element!! I know Java object representation is not
concise but that feels like more than is reasonable. Are you sure of
those numbers?
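For what it's worth, the per-element figure is consistent if the heap analyzer reports decimal megabytes; a quick sanity check (variable names are mine, not from any heap dump):

```python
# Sanity-check the per-scanner heap cost quoted above.
heap_bytes = 786 * 10**6      # 786 MB retained by the ArrayList (assuming decimal MB)
n_scanners = 7747             # SSTableBoundedScanner instances, one per SSTable
per_object_kb = heap_bytes / n_scanners / 1000
print(round(per_object_kb))   # → 101
```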

In any case, we should improve that so as to not create all those
SSTableBoundedScanners upfront. Would you mind opening a ticket on
https://issues.apache.org/jira/browse/CASSANDRA with as much info as
you have on this?
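As a back-of-envelope illustration of the numbers quoted below (a rough sketch only; it counts Data.db components and ignores the fact that each SSTable also has Index/Filter/Statistics files on disk):

```python
TB = 10**12
MB = 10**6

def sstable_count(node_bytes, sstable_size_mb):
    """Rough number of LCS SSTables needed to hold node_bytes."""
    return node_bytes // (sstable_size_mb * MB)

# Default 5 MB target vs. the proposed 200 MB target, at 1 TB per node
print(sstable_count(1 * TB, 5))    # → 200000
print(sstable_count(1 * TB, 200))  # → 5000
```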

> We want each node to handle 1 TB, so we must dramatically reduce the number
> of SSTables.
>
> Thus, is there any drawback if we set sstable_size_in_mb to 200 MB?
> Otherwise, should we go back to tiered compaction?
>
> Regards,
>
> Romain
>
>
> Maki Watanabe <watanabe.maki@gmail.com> wrote on 11/04/2012 04:21:47:
>
>
>> You can configure the sstable size with the sstable_size_in_mb parameter for LCS.
>> The default value is 5 MB.
>> You should also check that you don't have many pending compaction tasks
>> with nodetool tpstats and compactionstats.
>> If you have enough IO throughput, you can increase
>> compaction_throughput_mb_per_sec
>> in cassandra.yaml to reduce pending compactions.
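For instance, the relevant cassandra.yaml line might look like this (the value 32 is only an illustration; the 1.0.x default is 16):

```yaml
# cassandra.yaml -- throttle for all compaction combined, in MB/s (0 disables throttling)
compaction_throughput_mb_per_sec: 32
```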
>>
>> maki
>>
>> 2012/4/10 Romain HARDOUIN <romain.hardouin@urssaf.fr>:
>> >
>> > Hi,
>> >
>> > We are surprised by the number of files generated by Cassandra.
>> > Our cluster consists of 9 nodes and each node handles about 35 GB.
>> > We're using Cassandra 1.0.6 with LeveledCompactionStrategy.
>> > We have 30 CF.
>> >
>> > We've got roughly 45,000 files under the keyspace directory on each
>> > node:
>> > ls -l /var/lib/cassandra/data/OurKeyspace/ | wc -l
>> > 44372
>> >
>> > The biggest CF is spread over 38,000 files:
>> > ls -l Documents* | wc -l
>> > 37870
>> >
>> > ls -l Documents*-Data.db | wc -l
>> > 7586
>> >
>> > Many SSTables are about 4 MB:
>> >
>> > 19 MB -> 1 SSTable
>> > 12 MB -> 2 SSTables
>> > 11 MB -> 2 SSTables
>> > 9.2 MB -> 1 SSTable
>> > 7.0 MB to 7.9 MB -> 6 SSTables
>> > 6.0 MB to 6.4 MB -> 6 SSTables
>> > 5.0 MB to 5.4 MB -> 4 SSTables
>> > 4.0 MB to 4.7 MB -> 7139 SSTables
>> > 3.0 MB to 3.9 MB -> 258 SSTables
>> > 2.0 MB to 2.9 MB -> 35 SSTables
>> > 1.0 MB to 1.9 MB -> 13 SSTables
>> > 87 KB to  994 KB -> 87 SSTables
>> > 0 KB -> 32 SSTables
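Incidentally, the histogram is internally consistent with the earlier ls count; summing the buckets (counts copied straight from the list above):

```python
# Bucket counts from the size histogram, largest sizes first
buckets = [1, 2, 2, 1, 6, 6, 4, 7139, 258, 35, 13, 87, 32]
total = sum(buckets)
print(total)  # → 7586, matching `ls Documents*-Data.db | wc -l`
```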
>> >
>> > FYI here is CF information:
>> >
>> > ColumnFamily: Documents
>> >   Key Validation Class: org.apache.cassandra.db.marshal.BytesType
>> >   Default column value validator:
>> > org.apache.cassandra.db.marshal.BytesType
>> >   Columns sorted by: org.apache.cassandra.db.marshal.BytesType
>> >   Row cache size / save period in seconds / keys to save : 0.0/0/all
>> >   Row Cache Provider:
>> > org.apache.cassandra.cache.SerializingCacheProvider
>> >   Key cache size / save period in seconds: 200000.0/14400
>> >   GC grace seconds: 1728000
>> >   Compaction min/max thresholds: 4/32
>> >   Read repair chance: 1.0
>> >   Replicate on write: true
>> >   Column Metadata:
>> >     Column Name: refUUID (72656655554944)
>> >       Validation Class: org.apache.cassandra.db.marshal.BytesType
>> >       Index Name: refUUID_idx
>> >       Index Type: KEYS
>> >   Compaction Strategy:
>> > org.apache.cassandra.db.compaction.LeveledCompactionStrategy
>> >   Compression Options:
>> >     sstable_compression:
>> > org.apache.cassandra.io.compress.SnappyCompressor
>> >
>> > Is it a bug? If not, how can we tune Cassandra to avoid this?
>> >
>> > Regards,
>> >
>> > Romain
