cassandra-user mailing list archives

From Maki Watanabe <watanabe.m...@gmail.com>
Subject Re: Why so many SSTables?
Date Wed, 11 Apr 2012 02:21:47 GMT
You can configure the SSTable size for LCS with the sstable_size_in_mb
parameter. The default value is 5 MB.
You should also check that you don't have many pending compaction tasks,
using nodetool tpstats and nodetool compactionstats.
If you have enough I/O throughput, you can increase
compaction_throughput_mb_per_sec in cassandra.yaml to reduce pending
compactions.
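
To see how the SSTable size setting relates to the number of files on disk,
a rough back-of-the-envelope sketch in Python may help. It rests on two
assumptions: each SSTable is stored as about five component files (inferred
from the figures quoted below, 37870 files for 7586 -Data.db files), and
every SSTable fills up to roughly the configured size. The function names
are hypothetical and not part of Cassandra.

```python
import math


def estimate_sstable_count(data_mb, sstable_size_mb):
    """Approximate SSTable count, assuming each one holds about sstable_size_mb."""
    return math.ceil(data_mb / sstable_size_mb)


def estimate_file_count(data_mb, sstable_size_mb, files_per_sstable=5):
    """Approximate on-disk file count.

    files_per_sstable=5 is an assumption taken from the ratio of total files
    to -Data.db files in the original report.
    """
    return estimate_sstable_count(data_mb, sstable_size_mb) * files_per_sstable


if __name__ == "__main__":
    data_mb = 35 * 1024  # about 35 GB per node, as reported
    for size_mb in (5, 10, 50):
        sstables = estimate_sstable_count(data_mb, size_mb)
        files = estimate_file_count(data_mb, size_mb)
        print(f"sstable_size_in_mb={size_mb}: ~{sstables} SSTables, ~{files} files")
```

Under these assumptions, 35 GB at 5 MB per SSTable works out to about 7168
SSTables, the same order of magnitude as the counts reported below, so a
larger sstable_size_in_mb should reduce the file count roughly in proportion.
The estimate ignores compression, the split across column families, and
SSTables that are not yet full.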

maki

2012/4/10 Romain HARDOUIN <romain.hardouin@urssaf.fr>:
>
> Hi,
>
> We are surprised by the number of files generated by Cassandra.
> Our cluster consists of 9 nodes and each node handles about 35 GB.
> We're using Cassandra 1.0.6 with LeveledCompactionStrategy.
> We have 30 CF.
>
> We've got roughly 45,000 files under the keyspace directory on each node:
> ls -l /var/lib/cassandra/data/OurKeyspace/ | wc -l
> 44372
>
> The biggest CF is spread over 38,000 files:
> ls -l Documents* | wc -l
> 37870
>
> ls -l Documents*-Data.db | wc -l
> 7586
>
> Many SSTables are about 4 MB:
>
> 19 MB -> 1 SSTable
> 12 MB -> 2 SSTables
> 11 MB -> 2 SSTables
> 9.2 MB -> 1 SSTable
> 7.0 MB to 7.9 MB -> 6 SSTables
> 6.0 MB to 6.4 MB -> 6 SSTables
> 5.0 MB to 5.4 MB -> 4 SSTables
> 4.0 MB to 4.7 MB -> 7139 SSTables
> 3.0 MB to 3.9 MB -> 258 SSTables
> 2.0 MB to 2.9 MB -> 35 SSTables
> 1.0 MB to 1.9 MB -> 13 SSTables
> 87 KB to  994 KB -> 87 SSTables
> 0 KB -> 32 SSTables
>
> FYI here is CF information:
>
> ColumnFamily: Documents
>   Key Validation Class: org.apache.cassandra.db.marshal.BytesType
>   Default column value validator: org.apache.cassandra.db.marshal.BytesType
>   Columns sorted by: org.apache.cassandra.db.marshal.BytesType
>   Row cache size / save period in seconds / keys to save : 0.0/0/all
>   Row Cache Provider: org.apache.cassandra.cache.SerializingCacheProvider
>   Key cache size / save period in seconds: 200000.0/14400
>   GC grace seconds: 1728000
>   Compaction min/max thresholds: 4/32
>   Read repair chance: 1.0
>   Replicate on write: true
>   Column Metadata:
>     Column Name: refUUID (72656655554944)
>       Validation Class: org.apache.cassandra.db.marshal.BytesType
>       Index Name: refUUID_idx
>       Index Type: KEYS
>   Compaction Strategy:
> org.apache.cassandra.db.compaction.LeveledCompactionStrategy
>   Compression Options:
>     sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
>
> Is it a bug? If not, how can we tune Cassandra to avoid this?
>
> Regards,
>
> Romain
