cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Why so many SSTables?
Date Tue, 10 Apr 2012 16:14:23 GMT
LCS (LeveledCompactionStrategy) deliberately keeps each sstable under
5 MB, to minimize the extra work of compacting data that doesn't
really overlap across different levels.
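
As a rough sanity check on the numbers quoted below, here is a
back-of-the-envelope sketch. It assumes every sstable sits exactly at
the 5 MB target and treats the roughly 35 GB per node as one pool of
data, so it is only an order-of-magnitude estimate. The function names
are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: how many sstables a node would hold if every
# sstable were exactly at the target size.
#   $1 = data size in GB, $2 = target sstable size in MB
estimate_sstables() {
    echo $(( $1 * 1024 / $2 ))
}

# Hypothetical helper: files per sstable, rounded to the nearest integer.
#   $1 = total files for the CF, $2 = number of *-Data.db files
files_per_sstable() {
    echo $(( ($1 + $2 / 2) / $2 ))
}

estimate_sstables 35 5        # prints 7168
files_per_sstable 37870 7586  # prints 5
```

The first figure is in the same range as the 7586 Data.db files
reported for the Documents CF. The second reflects that each sstable
is stored as several component files on disk, not just a Data.db file,
which accounts for the much larger total file count.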

On Tue, Apr 10, 2012 at 9:24 AM, Romain HARDOUIN
<romain.hardouin@urssaf.fr> wrote:
>
> Hi,
>
> We are surprised by the number of files generated by Cassandra.
> Our cluster consists of 9 nodes and each node handles about 35 GB.
> We're using Cassandra 1.0.6 with LeveledCompactionStrategy.
> We have 30 CFs.
>
> We've got roughly 45,000 files under the keyspace directory on each node:
> ls -l /var/lib/cassandra/data/OurKeyspace/ | wc -l
> 44372
>
> The biggest CF is spread over 38,000 files:
> ls -l Documents* | wc -l
> 37870
>
> ls -l Documents*-Data.db | wc -l
> 7586
>
> Many SSTables are about 4 MB:
>
> 19 MB -> 1 SSTable
> 12 MB -> 2 SSTables
> 11 MB -> 2 SSTables
> 9.2 MB -> 1 SSTable
> 7.0 MB to 7.9 MB -> 6 SSTables
> 6.0 MB to 6.4 MB -> 6 SSTables
> 5.0 MB to 5.4 MB -> 4 SSTables
> 4.0 MB to 4.7 MB -> 7139 SSTables
> 3.0 MB to 3.9 MB -> 258 SSTables
> 2.0 MB to 2.9 MB -> 35 SSTables
> 1.0 MB to 1.9 MB -> 13 SSTables
> 87 KB to  994 KB -> 87 SSTables
> 0 KB -> 32 SSTables
>
> FYI here is CF information:
>
> ColumnFamily: Documents
>   Key Validation Class: org.apache.cassandra.db.marshal.BytesType
>   Default column value validator: org.apache.cassandra.db.marshal.BytesType
>   Columns sorted by: org.apache.cassandra.db.marshal.BytesType
>   Row cache size / save period in seconds / keys to save : 0.0/0/all
>   Row Cache Provider: org.apache.cassandra.cache.SerializingCacheProvider
>   Key cache size / save period in seconds: 200000.0/14400
>   GC grace seconds: 1728000
>   Compaction min/max thresholds: 4/32
>   Read repair chance: 1.0
>   Replicate on write: true
>   Column Metadata:
>     Column Name: refUUID (72656655554944)
>       Validation Class: org.apache.cassandra.db.marshal.BytesType
>       Index Name: refUUID_idx
>       Index Type: KEYS
>   Compaction Strategy: org.apache.cassandra.db.compaction.LeveledCompactionStrategy
>   Compression Options:
>     sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
>
> Is it a bug? If not, how can we tune Cassandra to avoid this?
>
> Regards,
>
> Romain
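
The `ls ... | wc -l` counts above mix all the files of a CF together.
A minimal sketch for breaking the count down by file suffix follows.
It assumes file names end in `-<Component>`, as in the
`Documents*-Data.db` pattern above. The function name
`count_by_component` is hypothetical, and the `Documents` prefix will
also match any other files in the directory whose names start the same
way.

```shell
#!/bin/sh
# Hypothetical helper: count the files of one CF per component suffix.
#   $1 = keyspace data directory, $2 = CF name prefix
count_by_component() {
    for f in "$1"/"$2"*; do
        [ -e "$f" ] || continue   # skip the literal pattern if nothing matches
        echo "${f##*-}"           # keep the text after the last '-'
    done | sort | uniq -c | awk '{ print $2, $1 }'
}
```

For example, `count_by_component /var/lib/cassandra/data/OurKeyspace
Documents` prints one line per suffix, each followed by its file count.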



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
