incubator-cassandra-user mailing list archives

From Watanabe Maki <watanabe.m...@gmail.com>
Subject Re: Why so many SSTables?
Date Wed, 11 Apr 2012 23:58:40 GMT
If you increase sstable_size_in_mb to 200MB, each compaction will need more IO. For
example, when a memtable is flushed and LCS has to compact it with 10 overlapping
L1 sstables, that single compaction will read almost 2GB and write almost 2GB.
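
To make the trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It is only an estimate under stated assumptions, not how Cassandra itself accounts for compaction. It assumes that every overlapping L1 sstable is full-sized, that the compaction output is about as large as its input, and that 1 TB means 1,000,000 MB. The function names are made up for illustration.

```python
import math


def compaction_io_mb(sstable_size_mb, overlapping_sstables=10, flushed_mb=0):
    """Estimate (read_mb, write_mb) for one compaction into L1.

    Assumes every overlapping L1 sstable is full-sized and that the output
    is about as large as the input (nothing is dropped or merged away).
    """
    read_mb = overlapping_sstables * sstable_size_mb + flushed_mb
    write_mb = read_mb
    return read_mb, write_mb


def sstable_count(data_mb, sstable_size_mb):
    """Approximate number of sstables needed to hold data_mb of data."""
    return math.ceil(data_mb / sstable_size_mb)


ONE_TB_MB = 1_000_000  # 1 TB per node in decimal MB (assumption)

for size_mb in (5, 200):
    read_mb, write_mb = compaction_io_mb(size_mb)
    count = sstable_count(ONE_TB_MB, size_mb)
    print(f"sstable_size_in_mb={size_mb}: about {read_mb} MB read and "
          f"{write_mb} MB written per compaction, about {count} sstables per TB")
```

Under these assumptions, 5MB sstables cost about 50 MB of read and write per compaction but need roughly 200,000 sstables for 1 TB, while 200MB sstables cost about 2 GB per compaction and need roughly 5,000 sstables.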

From iPhone


On 2012/04/11, at 21:43, Romain HARDOUIN <romain.hardouin@urssaf.fr> wrote:

> 
> Thank you for your answers. 
> 
> I originally posted this question because we encountered an OOM exception on 2 nodes during a repair session. 
> Memory analysis shows a hotspot: an ArrayList of SSTableBoundedScanner which contains as many objects as there are SSTables on disk (7747 objects at the time). 
> This ArrayList consumes 47% of the heap space (786 MB). 
> 
> We want each node to handle 1 TB, so we must dramatically reduce the number of SSTables. 
> 
> Thus, is there any drawback if we set sstable_size_in_mb to 200MB? 
> Otherwise, should we go back to Tiered Compaction? 
> 
> Regards, 
> 
> Romain
> 
> 
> Maki Watanabe <watanabe.maki@gmail.com> wrote on 11/04/2012 04:21:47:
> 
> > You can configure sstable size by sstable_size_in_mb parameter for LCS.
> > The default value is 5MB.
> > You should also check that you don't have many pending compaction tasks,
> > using nodetool tpstats and nodetool compactionstats.
> > If you have enough IO throughput, you can increase
> > compaction_throughput_mb_per_sec
> > in cassandra.yaml to reduce pending compactions.
> > 
> > maki
> > 
> > 2012/4/10 Romain HARDOUIN <romain.hardouin@urssaf.fr>:
> > >
> > > Hi,
> > >
> > > We are surprised by the number of files generated by Cassandra.
> > > Our cluster consists of 9 nodes and each node handles about 35 GB.
> > > We're using Cassandra 1.0.6 with LeveledCompactionStrategy.
> > > We have 30 CF.
> > >
> > > We've got roughly 45,000 files under the keyspace directory on each node:
> > > ls -l /var/lib/cassandra/data/OurKeyspace/ | wc -l
> > > 44372
> > >
> > > The biggest CF is spread over 38,000 files:
> > > ls -l Documents* | wc -l
> > > 37870
> > >
> > > ls -l Documents*-Data.db | wc -l
> > > 7586
> > >
> > > Many SSTables are about 4 MB:
> > >
> > > 19 MB -> 1 SSTable
> > > 12 MB -> 2 SSTables
> > > 11 MB -> 2 SSTables
> > > 9.2 MB -> 1 SSTable
> > > 7.0 MB to 7.9 MB -> 6 SSTables
> > > 6.0 MB to 6.4 MB -> 6 SSTables
> > > 5.0 MB to 5.4 MB -> 4 SSTables
> > > 4.0 MB to 4.7 MB -> 7139 SSTables
> > > 3.0 MB to 3.9 MB -> 258 SSTables
> > > 2.0 MB to 2.9 MB -> 35 SSTables
> > > 1.0 MB to 1.9 MB -> 13 SSTables
> > > 87 KB to  994 KB -> 87 SSTables
> > > 0 KB -> 32 SSTables
> > >
> > > FYI here is CF information:
> > >
> > > ColumnFamily: Documents
> > >   Key Validation Class: org.apache.cassandra.db.marshal.BytesType
> > >   Default column value validator: org.apache.cassandra.db.marshal.BytesType
> > >   Columns sorted by: org.apache.cassandra.db.marshal.BytesType
> > >   Row cache size / save period in seconds / keys to save : 0.0/0/all
> > >   Row Cache Provider: org.apache.cassandra.cache.SerializingCacheProvider
> > >   Key cache size / save period in seconds: 200000.0/14400
> > >   GC grace seconds: 1728000
> > >   Compaction min/max thresholds: 4/32
> > >   Read repair chance: 1.0
> > >   Replicate on write: true
> > >   Column Metadata:
> > >     Column Name: refUUID (72656655554944)
> > >       Validation Class: org.apache.cassandra.db.marshal.BytesType
> > >       Index Name: refUUID_idx
> > >       Index Type: KEYS
> > >   Compaction Strategy:
> > > org.apache.cassandra.db.compaction.LeveledCompactionStrategy
> > >   Compression Options:
> > >     sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
> > >
> > > Is it a bug? If not, how can we tune Cassandra to avoid this?
> > >
> > > Regards,
> > >
> > > Romain
