It's easy to spend other people's money, but handling 1 TB of data with
a 1.5 GB heap? Memory is cheap, and just a little more will solve many
problems.
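
For a rough sense of scale, here is a back-of-the-envelope sketch that uses
only the figures quoted below (786 MB held by 7747 scanners, a 1 TB per node
target). It assumes the scanner memory grows linearly with the number of
SSTables and that every SSTable is about sstable_size_in_mb on disk; neither
assumption is confirmed in this thread, and the function names are made up
for illustration.

```python
# Rough estimate only. Assumes scanner heap scales linearly with the
# number of SSTables, which is a simplification, not a measured fact.

SCANNER_HEAP_MB = 786        # heap held by the ArrayList (reported below)
SCANNER_COUNT = 7747         # SSTableBoundedScanner objects at the time
TARGET_DATA_MB = 1_000_000   # 1 TB per node, in decimal megabytes


def mb_per_scanner():
    """Average heap cost of one scanner, derived from the reported figures."""
    return SCANNER_HEAP_MB / SCANNER_COUNT


def estimated_sstables(sstable_size_mb, data_mb=TARGET_DATA_MB):
    """Number of SSTables needed to hold data_mb (ceiling division)."""
    return -(-data_mb // sstable_size_mb)


def estimated_scanner_heap_mb(sstable_size_mb, data_mb=TARGET_DATA_MB):
    """Projected scanner heap if one scanner is opened per SSTable."""
    return estimated_sstables(sstable_size_mb, data_mb) * mb_per_scanner()


if __name__ == "__main__":
    for size_mb in (5, 200):
        print(size_mb,
              estimated_sstables(size_mb),
              round(estimated_scanner_heap_mb(size_mb)))
```

Under those assumptions, 1 TB at the 5 MB default means about 200,000
SSTables and roughly 20 GB of scanner heap, while 200 MB SSTables bring it
down to about 5,000 SSTables and roughly 500 MB.
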
On 04/11/2012 08:43 AM, Romain HARDOUIN wrote:
>
> Thank you for your answers.
>
> I originally posted this question because we encountered an OOM exception
> on 2 nodes during a repair session.
> Memory analysis shows a hotspot: an ArrayList of
> SSTableBoundedScanner which contains as many objects as there are
> SSTables on disk (7747 objects at the time).
> This ArrayList consumes 47% of the heap space (786 MB).
>
> We want each node to handle 1 TB, so we must dramatically reduce the
> number of SSTables.
>
> Thus, is there any drawback if we set sstable_size_in_mb to 200 MB?
> Otherwise, should we go back to Tiered Compaction?
>
> Regards,
>
> Romain
>
>
> Maki Watanabe <watanabe.maki@gmail.com> wrote on 11/04/2012 04:21:47:
>
> > You can configure the SSTable size with the sstable_size_in_mb
> > parameter for LCS. The default value is 5 MB.
> > You should also check that you don't have many pending compaction
> > tasks, using nodetool tpstats and compactionstats.
> > If you have enough IO throughput, you can increase
> > compaction_throughput_mb_per_sec
> > in cassandra.yaml to reduce pending compactions.
> >
> > maki
> >
> > 2012/4/10 Romain HARDOUIN <romain.hardouin@urssaf.fr>:
> > >
> > > Hi,
> > >
> > > We are surprised by the number of files generated by Cassandra.
> > > Our cluster consists of 9 nodes and each node handles about 35 GB.
> > > We're using Cassandra 1.0.6 with LeveledCompactionStrategy.
> > > We have 30 CFs.
> > >
> > > We've got roughly 45,000 files under the keyspace directory on each node:
> > > ls -l /var/lib/cassandra/data/OurKeyspace/ | wc -l
> > > 44372
> > >
> > > The biggest CF is spread over 38,000 files:
> > > ls -l Documents* | wc -l
> > > 37870
> > >
> > > ls -l Documents*-Data.db | wc -l
> > > 7586
> > >
> > > Many SSTables are about 4 MB:
> > >
> > > 19 MB -> 1 SSTable
> > > 12 MB -> 2 SSTables
> > > 11 MB -> 2 SSTables
> > > 9.2 MB -> 1 SSTable
> > > 7.0 MB to 7.9 MB -> 6 SSTables
> > > 6.0 MB to 6.4 MB -> 6 SSTables
> > > 5.0 MB to 5.4 MB -> 4 SSTables
> > > 4.0 MB to 4.7 MB -> 7139 SSTables
> > > 3.0 MB to 3.9 MB -> 258 SSTables
> > > 2.0 MB to 2.9 MB -> 35 SSTables
> > > 1.0 MB to 1.9 MB -> 13 SSTables
> > > 87 KB to 994 KB -> 87 SSTables
> > > 0 KB -> 32 SSTables
> > >
> > > FYI here is CF information:
> > >
> > > ColumnFamily: Documents
> > > Key Validation Class: org.apache.cassandra.db.marshal.BytesType
> > > Default column value validator: org.apache.cassandra.db.marshal.BytesType
> > > Columns sorted by: org.apache.cassandra.db.marshal.BytesType
> > > Row cache size / save period in seconds / keys to save : 0.0/0/all
> > > Row Cache Provider: org.apache.cassandra.cache.SerializingCacheProvider
> > > Key cache size / save period in seconds: 200000.0/14400
> > > GC grace seconds: 1728000
> > > Compaction min/max thresholds: 4/32
> > > Read repair chance: 1.0
> > > Replicate on write: true
> > > Column Metadata:
> > > Column Name: refUUID (72656655554944)
> > > Validation Class: org.apache.cassandra.db.marshal.BytesType
> > > Index Name: refUUID_idx
> > > Index Type: KEYS
> > > Compaction Strategy:
> > > org.apache.cassandra.db.compaction.LeveledCompactionStrategy
> > > Compression Options:
> > > sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
> > >
> > > Is it a bug? If not, how can we tune Cassandra to avoid this?
> > >
> > > Regards,
> > >
> > > Romain
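
The size breakdown in the original message can be reproduced with a small
script rather than by hand. This is a minimal sketch, assuming the data files
follow the Documents*-Data.db naming used in the ls commands above; the
function name and the one-megabyte buckets are my own choices, not anything
Cassandra provides.

```python
import os
import sys
from collections import Counter


def size_histogram(data_dir, prefix="Documents", suffix="-Data.db"):
    """Count data files per whole-mebibyte size bucket (0 means under 1 MiB).

    Like the shell glob Documents*-Data.db, the prefix match also picks up
    any other files whose names start with the same prefix.
    """
    buckets = Counter()
    for name in os.listdir(data_dir):
        if name.startswith(prefix) and name.endswith(suffix):
            size = os.path.getsize(os.path.join(data_dir, name))
            buckets[size // (1024 * 1024)] += 1
    return buckets


if __name__ == "__main__":
    # Default path is the one quoted in the thread; pass another as argv[1].
    default_dir = "/var/lib/cassandra/data/OurKeyspace"
    data_dir = sys.argv[1] if len(sys.argv) > 1 else default_dir
    if os.path.isdir(data_dir):
        hist = size_histogram(data_dir)
        for mb in sorted(hist, reverse=True):
            print(f"{mb} MB to <{mb + 1} MB -> {hist[mb]} SSTables")
```

Running it before and after changing sstable_size_in_mb should show whether
the bulk of the files has moved out of the 4 MB bucket.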